utumno ~/llama.cpp/build/bin>./llama-bench --hf-repo unsloth/gemma-4-E4B-it-GGUF
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.026 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 103079.22 MB

| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma4 E4B Q4_K - Medium       |   4.62 GiB |     7.52 B | MTL,BLAS   |      16 |           pp512 |      1172.93 ± 0.36 |
| gemma4 E4B Q4_K - Medium       |   4.62 GiB |     7.52 B | MTL,BLAS   |      16 |           tg128 |         69.73 ± 0.19 |

build: ff5ef8278 (8763)
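A quick way to read the two result rows: llama-bench reports throughput in tokens per second, where pp512 is a prompt-processing test over 512 tokens and tg128 is a text-generation test producing 128 tokens. The Python sketch below converts each mean t/s into an approximate wall time per test and milliseconds per token. It assumes the rows are copied verbatim from the table above and that the token count is the number at the end of the test name; the helper names (parse_row, n_tokens, summarize) are hypothetical and not part of llama.cpp.

```python
import re

# The two result rows, copied verbatim from the llama-bench table.
ROWS = [
    "| gemma4 E4B Q4_K - Medium | 4.62 GiB | 7.52 B | MTL,BLAS | 16 | pp512 | 1172.93 ± 0.36 |",
    "| gemma4 E4B Q4_K - Medium | 4.62 GiB | 7.52 B | MTL,BLAS | 16 | tg128 | 69.73 ± 0.19 |",
]


def parse_row(row):
    """Split one markdown table row into (test name, mean t/s, stddev t/s)."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    # Columns: model, size, params, backend, threads, test, t/s
    test, tps = cells[5], cells[6]
    mean, std = (float(x) for x in tps.split("±"))
    return test, mean, std


def n_tokens(test):
    """Token count taken from the test name, e.g. 'pp512' -> 512 (assumed naming)."""
    m = re.fullmatch(r"(pp|tg)(\d+)", test)
    if m is None:
        raise ValueError(f"unrecognized test name: {test}")
    return int(m.group(2))


def summarize(rows):
    """Approximate per-test wall time and per-token latency from mean t/s."""
    out = {}
    for row in rows:
        test, mean, _ = parse_row(row)
        tokens = n_tokens(test)
        out[test] = {
            "tokens": tokens,
            "seconds": tokens / mean,       # approx. time for the whole test
            "ms_per_token": 1000.0 / mean,  # average latency per token
        }
    return out


if __name__ == "__main__":
    for test, s in summarize(ROWS).items():
        print(f"{test}: {s['tokens']} tokens in ~{s['seconds']:.3f} s "
              f"({s['ms_per_token']:.2f} ms/token)")
```

Because t/s is an average over repetitions, inverting it gives an estimate rather than a measured latency. On these numbers, prompt processing moves about 16.8 times as many tokens per second as generation (1172.93 / 69.73).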