18 benchmarks - the world's most-followed benchmarks, curated by AI Explained, author of SimpleBench
These benchmarks are run independently by Epoch, Scale and others, so results may not match the scores that AI orgs self-report.

| # | Models (no tools) | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview (high thinking) | 46.4% ±2.0 |
| 2 | GPT-5.4 Pro | 44.3% ±2.0 |
| 3 | Muse Spark | 40.6% ±1.9 |
| 4 | Gemini 3 Pro Preview | 37.5% ±1.9 |
| 5 | GPT-5.4 (xhigh) | 36.2% ±1.9 |

| # | Model | Score |
|---|---|---|
| 1 | Claude Fable 5 | 81.9% |
| 2 | Gemini 3.1 Pro Preview | 79.6% |
| 3 | GPT-5.5 Pro | 76.9% |
| 4 | Gemini 3.5 Flash | 76.7% |
| 5 | Gemini 3 Pro Preview | 76.4% |

| # | Model | Minutes |
|---|---|---|
| 1 | Claude Mythos Preview | 1044.8 |
| 2 | Claude Opus 4.6 (unknown thinking) | 718.8 |
| 3 | Gemini 3.1 Pro Preview | 384.1 |
| 4 | GPT-5.2 (high) | 352.2 |
| 5 | GPT-5.3 Codex | 349.5 |

| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 (max) | 83.5% ±1.7 |
| 2 | GPT-5.5 (xhigh) | 80.6% ±1.8 |
| 3 | Gemini 3.5 Flash (high) | 79.3% ±1.8 |
| 4 | Claude Opus 4.6 (no thinking) | 78.7% ±1.9 |
| 5 | GPT-5.4 (high) | 76.9% ±1.9 |

| # | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Pro (xhigh) | 94.6% ±1.6 |
| 2 | Gemini 3.1 Pro Preview | 94.1% ±1.7 |
| 3 | GPT-5.5 (xhigh) | 94.0% ±1.5 |
| 4 | GPT-5.5 Pro (xhigh) | 93.9% ±1.6 |
| 5 | GPT-5.4 (xhigh) | 93.3% ±1.8 |

| # | Model | Score |
|---|---|---|
| 1 | GPT-5.2 | 49.7% |
| 2 | Claude Opus 4.5 | 45.5% |
| 3 | Claude Opus 4.1 | 43.6% |
| 4 | Claude Sonnet 4.5 | 42.5% |
| 5 | Gemini 3 Pro Preview | 40.3% |

| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 | 1566.9 |
| 2 | Claude Opus 4.6 | 1556.3 |
| 3 | Claude Opus 4.8 | 1552.2 |
| 4 | Qwen3.7-Max | 1540.8 |
| 5 | GLM-5.1 | 1534.0 |

| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 | 44.1% |
| 2 | Claude Opus 4.6 (high) | 41.2% |
| 3 | GPT-5.5 (xhigh) | 40.2% |
| 4 | GPT-5.4 (xhigh) | 31.4% |
| 5 | GPT-5.2 (high) | 27.4% |

| # | Model | Score |
|---|---|---|
| 1 | o3 (medium) | 100.0% |
| 2 | GPT-5 (medium) | 96.9% |
| 3 | Grok 4 | 96.9% |
| 4 | Gemini 2.5 Pro Exp (Mar '25) | 90.6% |
| 5 | o3-pro | 88.9% |

| # | Model | Score |
|---|---|---|
| 1 | Gemini 3 Pro Preview | 58.1% ±2.1 |
| 2 | Gemini 3.1 Pro Preview | 57.0% ±2.0 |
| 3 | Gemini 3 Flash | 48.1% ±2.4 |
| 4 | Grok 4 | 43.6% ±2.2 |
| 5 | Claude Opus 4.5 | 43.5% ±2.3 |

| # | Model | Score |
|---|---|---|
| 1 | GPT-5.5 Pro (xhigh) | 100.0% ±0.0 |
| 2 | GPT-5.5 (xhigh) | 100.0% ±0.0 |
| 3 | Claude Fable 5 (max) | 99.7% ±0.3 |
| 4 | Claude Opus 4.8 | 98.3% ±1.4 |
| 5 | Claude Opus 4.7 (xhigh) | 97.8% ±2.2 |

| # | Model | Score |
|---|---|---|
| 1 | GPT-5 (high) | 98.1% ±0.3 |
| 2 | GPT-5 (medium) | 97.9% ±0.3 |
| 3 | GPT-5 mini (high) | 97.8% ±0.3 |
| 4 | o4-mini (high) | 97.8% ±0.3 |
| 5 | o3 (high) | 97.8% ±0.3 |

| # | Model | Score |
|---|---|---|
| 1 | GPT-5.5 Pro (xhigh) | 87.7% ±1.9 |
| 2 | Claude Fable 5 (max) | 87.0% ±2.0 |
| 3 | GPT-5.5 (xhigh) | 85.3% ±2.1 |
| 4 | Claude Opus 4.8 | 80.0% ±2.4 |
| 5 | GPT-5.4 (xhigh) | 78.6% ±2.4 |

| # | Model | Score |
|---|---|---|
| 1 | Claude Fable 5 (max) | 87.8% ±5.2 |
| 2 | GPT-5.5 Pro (xhigh) | 78.0% ±6.5 |
| 3 | AI co-mathematician | 75.6% ±6.7 |
| 4 | GPT-5.5 (xhigh) | 72.5% ±7.1 |
| 5 | Claude Opus 4.8 | 56.1% ±7.8 |

| # | Model | Score |
|---|---|---|
| 1 | Claude Fable 5 (high) | 87.9% |
| 2 | GPT-5.5 (xhigh) | 84.9% |
| 3 | Claude Opus 4.8 (xhigh) | 82.9% |
| 4 | GPT-5.3 Codex | 79.3% |
| 5 | Claude Opus 4.6 (high) | 78.0% |

| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 | 90.2% ±2.1 |
| 2 | GPT-5.5 | 84.7% ±2.1 |
| 3 | GPT-5.4 | 81.8% ±2.0 |
| 4 | Gemini 3.1 Pro Preview | 80.2% ±2.6 |
| 5 | Claude Opus 4.6 | 79.8% ±1.6 |

| # | Model | Score |
|---|---|---|
| 1 | Gemini 3 Pro Preview | 91.0% |
| 2 | GPT-5.2 (xhigh) | 84.0% |
| 3 | Gemini 3 Flash | 72.6% |
| 4 | GPT-5.2 (high) | 67.0% |
| 5 | GPT-5 (high) | 66.0% |

| # | Model | Score |
|---|---|---|
| 1 | Gemini 3 Pro Preview | 3893 |
| 2 | Gemini 2.5 Pro Preview (May '25) | 3836 |
| 3 | o3 (high) | 3789 |
| 4 | Gemini 2.0 Flash (Feb '25) | 3659 |
| 5 | GPT-5 (medium) | 3498 |