hardware
2× RTX 5060 Ti 16GB
60GB DDR4-2133 ECC
llama.cpp
$

speed — tokens per second

Best of 3 runs per test.

Model Arch Quant Ctx File Gen tok/s PP tok/s Load PinchBench
$

generation speed comparison

generation tok/s

$

vram — fit map

Model weight sizes across available models. 32GB total VRAM across both GPUs.

model weight size

$

quality — benchmark reference

Cloud API scores from public leaderboards. Local quant scores where tested.

Model PinchBench (cloud) PinchBench (local) Quant delta Source

$

about

Every number measured on the hardware above. No cloud APIs, no theoretical maximums. More models and benchmarks added as they're tested.