zanzibar ~/llama.cpp/build/bin> ./llama-bench --hf-repo unsloth/gemma-4-E4B-it-GGUF
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 11906 MiB):
  Device 0: NVIDIA RTX A2000 12GB, compute capability 8.6, VMM: yes, VRAM: 11906 MiB
Downloading gemma-4-E4B-it-Q4_K_M.gguf ───────────────────────────── 100%
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma4 E4B Q4_K - Medium       |   4.62 GiB |     7.52 B | CUDA       |  99 |           pp512 |      1831.78 ± 13.79 |
| gemma4 E4B Q4_K - Medium       |   4.62 GiB |     7.52 B | CUDA       |  99 |           tg128 |         51.92 ± 0.39 |

build: 82764d8f4 (8770)