Models:
Prompts: 20
Scoring: execution-based
Leaderboard ranked by weighted composite score
P Pure Functions (×1.0)
Q Data Manipulation (×1.0)
R Algorithms (×1.5)
S Bug Fixing (×1.5)
T One-Shot Tasks (×2.0)
# Model Grade Score P Q R S T TG128 Size
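The page gives the category weights but not the exact formula behind the composite score. The following is a minimal sketch of one plausible reading, assuming the composite is a weighted mean of the five per-category scores normalised by the total weight (7.0). The function name `composite_score` and the sample scores are hypothetical and are not taken from the leaderboard.

```python
# Category weights as listed in the leaderboard legend.
WEIGHTS = {"P": 1.0, "Q": 1.0, "R": 1.5, "S": 1.5, "T": 2.0}


def composite_score(category_scores):
    """Return the weighted mean of per-category scores.

    ASSUMPTION: the leaderboard normalises by the sum of the weights.
    The source only says "weighted composite score", so the real
    formula may differ (for example, an unnormalised weighted sum).
    """
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")

    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[k] * category_scores[k] for k in WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical per-category scores on a 0-100 scale, for illustration only.
example = {"P": 80.0, "Q": 90.0, "R": 70.0, "S": 60.0, "T": 50.0}
print(round(composite_score(example), 2))
```

Under this reading, the One-Shot Tasks category (T) contributes twice as much to the composite as Pure Functions (P) or Data Manipulation (Q), so a model that is weak on T ranks lower than its unweighted average would suggest.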