dgx1 ~/llama.cpp/build/bin> ./llama-bench --hf-repo unsloth/gemma-4-E4B-it-GGUF
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 122502 MiB):
  Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes, VRAM: 122502 MiB
Downloading gemma-4-E4B-it-Q4_K_M.gguf ───────────────────────────── 100%
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma4 E4B Q4_K - Medium       |   4.62 GiB |     7.52 B | CUDA       |  99 |           pp512 |     3633.84 ± 201.59 |
| gemma4 E4B Q4_K - Medium       |   4.62 GiB |     7.52 B | CUDA       |  99 |           tg128 |         59.42 ± 0.96 |

build: ff5ef8278 (8763)