MMLU Benchmark
Massive Multitask Language Understanding. Max score: 100.
| Model | Score |
|---|---|
| DeepSeek R1 | 90.8 |
| Claude Opus 4.6 | 90.0 |
| Claude Opus 4.5 | 89.5 |
| Claude Sonnet 4.5 | 89.0 |
| Gemini 2.5 Pro | 88.5 |
| GPT-4.1 | 88.0 |
| GPT-4o | 87.2 |
| DeepSeek V3 | 87.0 |