AI Price Index

AI Benchmarks

Compare AI model performance across industry-standard benchmarks.

Arena Elo

Crowdsourced Elo rating from Chatbot Arena, based on human votes in head-to-head comparisons between models

27 models tested | Top: Claude Opus 4.6 (1504.0)
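Elo-style ratings are relative: only the gap between two ratings carries meaning. As a minimal sketch, assuming the classic Elo formulas (Chatbot Arena computes its published ratings with its own methodology, which may differ from this), the Python below shows how a rating gap maps to an expected score and how a single comparison moves two ratings. The function names and the step size `k = 32` are illustrative assumptions, not Arena's settings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo model (0.5 = even match)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one head-to-head comparison.

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k = 32 is a common textbook step size (an assumption here, not Arena's value).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Hypothetical ratings, not taken from the listing above.
print(round(expected_score(1500, 1400), 3))  # 0.64
print(elo_update(1500, 1500, 1.0))           # (1516.0, 1484.0)
```

Under this model a 100-point gap corresponds to roughly a 64% expected score for the higher-rated model, whatever the absolute ratings are.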

HumanEval

Code generation benchmark: 164 hand-written Python programming problems, each checked with unit tests

4 models tested | Top: o3 (95.2)

MATH

Competition mathematics problems drawn from contests such as AMC 10, AMC 12 and AIME

3 models tested | Top: DeepSeek R1 (97.3)

MMLU

Massive Multitask Language Understanding: multiple-choice questions spanning 57 subjects

8 models tested | Top: DeepSeek R1 (90.8)

aipriceindex.com — Compare AI infrastructure costs
