Arena Elo Benchmark
Crowdsourced Elo ratings from Chatbot Arena. Max score: 2000.
| Model | Score |
|---|---|
| Claude Opus 4.6 | 1504.0 |
| Claude Opus 4.5 | 1470.0 |
| Claude Sonnet 4.6 | 1460.0 |
| Claude Sonnet 4.5 | 1450.0 |
| Gemini 2.5 Pro | 1449.0 |
| Claude Opus 4 | 1446.0 |
| o3 | 1440.0 |
| Claude Sonnet 4 | 1420.0 |
| Gemini 2.5 Flash | 1411.0 |
| GPT-4.1 | 1410.0 |
| Grok 3 | 1402.0 |
| o4 mini | 1400.0 |
| DeepSeek R1 | 1398.0 |
| o1 | 1388.0 |
| Claude Haiku 4.5 | 1380.0 |
| Grok 3 mini | 1363.0 |
| Gemini 2.0 Flash | 1361.0 |
| DeepSeek V3 | 1358.0 |
| GPT-4.1 mini | 1350.0 |
| o3 mini | 1348.0 |
| GPT-4o | 1346.0 |
| Llama 4 Maverick | 1327.0 |
| Claude Haiku 3.5 | 1323.0 |
| Llama 4 Scout | 1322.0 |
| GPT-4o mini | 1310.0 |
| Mistral Large 3 | 1305.0 |
| Mistral Small 3.2 | 1280.0 |
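
To help read the gaps between rows, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the conventional Elo logistic curve (base 10, scale 400); the table does not say which scaling the leaderboard uses, so treat the function name `expected_win_probability` and the `scale` default as illustrative assumptions rather than the leaderboard's own method.

```python
def expected_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under the standard Elo curve.

    Assumption: base-10 logistic with a 400-point scale (the classic Elo convention).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the table above.
ratings = {
    "Claude Opus 4.6": 1504.0,
    "Claude Opus 4.5": 1470.0,
    "Mistral Small 3.2": 1280.0,
}

# Adjacent rows: a 34-point gap.
p_close = expected_win_probability(ratings["Claude Opus 4.6"], ratings["Claude Opus 4.5"])

# Top versus bottom of the table: a 224-point gap.
p_wide = expected_win_probability(ratings["Claude Opus 4.6"], ratings["Mistral Small 3.2"])

print(f"34-point gap:  {p_close:.3f}")
print(f"224-point gap: {p_wide:.3f}")
```

Under these assumptions, the 34-point gap between the top two rows corresponds to a win rate of roughly 55%, and the 224-point gap between the first and last rows to roughly 78%. Differences of a few points between neighboring rows are therefore close to a coin flip.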