$
speed — tokens per second
Best of 3 runs per test.
| Model ↕ | Arch | Quant | Ctx ↕ | File | Gen tok/s ↕ | PP tok/s ↕ | Load | PinchBench |
|---|
$
generation speed comparison
generation tok/s
$
vram — fit map
Model weight sizes across available models. 32GB total VRAM across both GPUs.
model weight size
$
quality — benchmark reference
Cloud API scores from public leaderboards. Local quant scores where tested.
| Model | PinchBench (cloud) | PinchBench (local) | Quant delta | Source |
|---|
$
about
Every number measured on the hardware above. No cloud APIs, no theoretical maximums. More models and benchmarks added as they're tested.