Coding Benchmarks

Models: —

Prompts: 20

Scoring: execution-based


Leaderboard ranked by weighted composite score

P: Pure Functions (×1.0)
Q: Data Manipulation (×1.0)
R: Algorithms (×1.5)
S: Bug Fixing (×1.5)
T: One-Shot Tasks (×2.0)
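The page does not spell out how the category weights combine into the composite score. The sketch below shows one plausible reading, assuming the composite is a weighted mean of the five per-category scores, normalized by the sum of the weights (7.0). The function names `composite_score` and `rank_models`, and the shape of the input dictionaries, are hypothetical and not taken from the benchmark itself.

```python
# Category weights as listed in the legend (P, Q, R, S, T).
WEIGHTS = {"P": 1.0, "Q": 1.0, "R": 1.5, "S": 1.5, "T": 2.0}


def composite_score(category_scores):
    """Return the weighted mean of per-category scores.

    Assumption: the composite is sum(weight * score) / sum(weights).
    The actual leaderboard may normalize differently.
    """
    missing = set(WEIGHTS) - set(category_scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    weighted_sum = sum(WEIGHTS[k] * category_scores[k] for k in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())


def rank_models(models):
    """Return model names sorted by composite score, highest first.

    `models` maps a model name to its per-category score dictionary.
    """
    return sorted(models, key=lambda name: composite_score(models[name]),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    models = {
        "model_a": {"P": 50, "Q": 50, "R": 50, "S": 50, "T": 50},
        "model_b": {"P": 50, "Q": 50, "R": 50, "S": 50, "T": 100},
    }
    for name in rank_models(models):
        print(name, round(composite_score(models[name]), 2))
```

Under this reading, a gain in One-Shot Tasks (T) moves the composite twice as much as the same gain in Pure Functions (P) or Data Manipulation (Q), which is presumably the intent of the ×2.0 weight.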

#	Model	Grade	Score	P	Q	R	S	T	TG128	Size