20 benchmarks - the world's most-followed benchmarks, curated by AI Explained, author of SimpleBench
These benchmarks are run independently by Epoch, Scale and others, so scores may not match those self-reported by AI orgs.

| Rank | Models (no tools) | Score |
|---|---|---|
| 1 | GPT-5 (August '25) | 25.32% ±1.70 |
| 2 | Kimi K2 Thinking | 23.9% ±1.61 |
| 3 | Gemini 2.5 Pro Preview (June '25) | 21.64% ±1.61 |
| 4 | o3 (high) (April '25) | 20.32% ±1.58 |
| 5 | GPT-5 Mini (August '25) | 19.44% ±1.55 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 2.5 Pro Preview (Jun '25) | 62.4% |
| 2 | GPT-5 Pro | 61.6% |
| 3 | Grok 4 | 60.5% |
| 4 | Claude Opus 4.1 | 60.0% |
| 5 | Claude Opus 4 | 58.8% |

| Rank | Model | Minutes |
|---|---|---|
| 1 | GPT-5 (medium) | 137.3 ±102.1 |
| 2 | Claude Sonnet 4.5 | 113.3 ±91.4 |
| 3 | Grok 4 | 110.1 ±91.8 |
| 4 | Claude Opus 4.1 | 105.5 ±69.2 |
| 5 | o3 (medium) | 91.3 ±58.8 |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Sonnet 4.5 | 64.8% ±2.1 |
| 2 | Claude Opus 4.1 | 63.2% ±2.2 |
| 3 | Claude Opus 4 | 62.2% ±2.2 |
| 4 | Claude Sonnet 4 | 60.6% ±2.2 |
| 5 | Claude Haiku 4.5 | 60.6% ±2.2 |

| Rank | Model | Score |
|---|---|---|
| 1 | Grok 4 | 87.0% ±2.0 |
| 2 | GPT-5 (high) | 86.2% ±2.1 |
| 3 | GPT-5 (medium) | 85.4% ±2.1 |
| 4 | Gemini 2.5 Pro Preview (Jun '25) | 84.8% ±2.6 |
| 5 | Gemini 2.5 Pro Exp (Mar '25) | 83.8% ±2.6 |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.1 | 43.6% |
| 2 | GPT-5 (high) | 34.8% |
| 3 | GPT-5 (medium) | 33.9% |
| 4 | GPT-5 (low) | 31.9% |
| 5 | o3 (high) | 30.8% |

| Rank | Model | Score |
|---|---|---|
| 1 | o3 (high) | 8.8% |
| 2 | Claude Opus 4 | 6.9% |
| 3 | GPT-5 (high) | 6.9% |
| 4 | Claude Sonnet 4 | 4.9% |
| 5 | Kimi K2 | 4.9% |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Sonnet 4.5 | 57.7% |
| 2 | GPT-5 (low) | 57.4% |
| 3 | Claude Opus 4.1 | 56.4% |
| 4 | Claude Opus 4 | 56.3% |
| 5 | GPT-5 (medium) | 56.0% |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5 (high) | 1477.5 |
| 2 | Claude Opus 4.1 | 1472.4 |
| 3 | Claude Opus 4.1 | 1462.3 |
| 4 | Claude Sonnet 4.5 (32k thinking) | 1420.8 |
| 5 | Gemini 2.5 Pro | 1401.0 |

| Rank | Model | Score |
|---|---|---|
| 1 | Grok 4 | 43.6% ±2.2 |
| 2 | Gemini 2.5 Pro Exp (Mar '25) | 40.4% ±2.3 |
| 3 | DeepSeek-R1 | 34.9% ±2.2 |
| 4 | Gemini 2.5 Flash | 33.5% ±2.1 |
| 5 | GPT-5 (minimal) | 32.8% ±2.2 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5 (high) | 91.4% ±3.8 |
| 2 | GPT-5 (medium) | 87.2% ±3.9 |
| 3 | Grok 4 | 84.0% ±5.0 |
| 4 | o3 (high) | 83.9% ±4.4 |
| 5 | o4-mini (high) | 81.7% ±4.7 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5 (high) | 98.1% ±0.3 |
| 2 | GPT-5 (medium) | 97.9% ±0.3 |
| 3 | o4-mini (high) | 97.8% ±0.3 |
| 4 | o3 (high) | 97.8% ±0.3 |
| 5 | Claude Sonnet 4.5 | 97.7% ±0.4 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 2.5 Deep Think | 29.0% ±2.7 |
| 2 | GPT-5 (high) | 26.6% ±2.6 |
| 3 | GPT-5 (medium) | 24.8% ±2.5 |
| 4 | GPT-5 mini (high) | 19.7% ±2.3 |
| 5 | GPT-5 mini (medium) | 19.3% ±2.3 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 2.5 Pro Exp (Mar '25) | 61.1% |
| 2 | GPT-4.5 Preview (Feb '25) | 60.3% |
| 3 | o4-mini (high) | 59.7% |
| 4 | o1 (high) | 59.5% |
| 5 | Claude 3.7 Sonnet (8k thinking) | 55.7% |

| Rank | Model | Score |
|---|---|---|
| 1 | o3 (medium) | 100.0% |
| 2 | Grok 4 | 96.9% |
| 3 | GPT-5 (medium) | 96.9% |
| 4 | Gemini 2.5 Pro Exp (Mar '25) | 90.6% |
| 5 | o3-pro | 88.9% |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Sonnet 4.5 (no thinking) | 60.3% |
| 2 | Claude Opus 4.1 | 58.8% |
| 3 | Claude Sonnet 4 | 54.8% |
| 4 | GPT-5 (medium) | 52.5% |
| 5 | Claude Opus 4 | 45.3% |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5 (high) | 66.0% |
| 2 | GPT-5 (medium) | 63.2% |
| 3 | o4-mini (medium) | 57.5% |
| 4 | o3 (medium) | 52.0% |
| 5 | Gemini 2.5 Pro Preview (Mar '25) | 48.0% |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude 3.7 Sonnet | 29.1 |
| 2 | Claude 3.5 Sonnet (Jun '24) | 28.1 |
| 3 | Gemini 2.5 Pro Exp (Mar '25) | 18.4 |
| 4 | GPT-4o (Nov '24) | 16.6 |
| 5 | DeepSeek-V3 | 15.1 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 2.0 Flash Thinking Exp | 3873 |
| 2 | Gemini 2.5 Pro Exp (Mar '25) | 3871 |
| 3 | Gemini 2.5 Pro Preview (Jun '25) | 3836 |
| 4 | o3 (high) | 3789 |
| 5 | Gemini 2.0 Flash (Feb '25) | 3659 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5 (high) | 88.0% |
| 2 | GPT-5 (medium) | 86.7% |
| 3 | o3-pro | 84.9% |
| 4 | Gemini 2.5 Pro Preview (Jun '25) | 83.1% |
| 5 | GPT-5 (low) | 81.3% |