HumanEval Benchmark
Code generation benchmark. Max score: 100.
| Model | Score |
|---|---|
| o3 | 95.2 |
| Claude Opus 4.6 | 94.0 |
| o4 mini | 93.4 |
| Claude Sonnet 4.5 | 93.0 |