18 benchmarks: the world's most-followed benchmarks, curated by AI Explained, author of SimpleBench.

These benchmarks are run independently by Epoch, Scale, and others, so scores may not match those self-reported by AI organizations.

| Rank | Model (no tools) | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 44.7% |
| 2 | GPT-5.5 (xhigh) | 44.3% |
| 3 | GPT-5.5 (high) | 43.0% |
| 4 | GPT-5.4 (xhigh) | 41.6% |
| 5 | GPT-5.5 (medium) | 40.6% |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 79.6% |
| 2 | GPT-5.5 Pro | 76.9% |
| 3 | Gemini 3 Pro Preview | 76.4% |
| 4 | GPT-5.4 Pro | 74.1% |
| 5 | GPT-5.5 | 69.0% |

| Rank | Model | Minutes |
|---|---|---|
| 1 | Claude Opus 4.6 (unknown thinking) | 718.8 ±1815.2 |
| 2 | GPT-5.2 (high) | 352.2 ±335.5 |
| 3 | GPT-5.3 Codex | 349.5 ±333.1 |
| 4 | Claude Opus 4.5 (no thinking) | 293.0 ±239.0 |
| 5 | Claude Opus 4.5 (16k thinking) | 288.9 ±558.2 |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 (max) | 83.5% ±1.7 |
| 2 | Claude Opus 4.6 (high) | 78.7% ±1.9 |
| 3 | GPT-5.4 (high) | 76.9% ±1.9 |
| 4 | Claude Opus 4.5 (no thinking) | 76.7% ±1.9 |
| 5 | Gemini 3.1 Pro Preview | 75.6% ±2.0 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Pro (xhigh) | 94.6% ±1.6 |
| 2 | Gemini 3.1 Pro Preview | 94.1% ±1.7 |
| 3 | GPT-5.4 (xhigh) | 93.3% ±1.8 |
| 4 | Gemini 3 Pro Preview | 92.6% ±1.7 |
| 5 | GPT-5.2 (xhigh) | 91.4% ±1.8 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 | 83.0% |
| 2 | GPT-5.3 Codex | 70.9% |
| 3 | GPT-5.2 | 70.9% |
| 4 | Claude Opus 4.5 | 59.6% |
| 5 | Gemini 3 Pro Preview | 53.5% |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.2 (high) | 27.4% |
| 2 | Claude Opus 4.5 (no thinking) | 26.5% |
| 3 | Gemini 3 Pro Preview | 18.6% |
| 4 | Gemini 3 Flash | 9.8% |
| 5 | o3 (high) | 8.8% |

| Rank | Model | Score |
|---|---|---|
| 1 | o3 (medium) | 100.0% |
| 2 | Grok 4 | 96.9% |
| 3 | GPT-5 (medium) | 96.9% |
| 4 | Gemini 2.5 Pro Exp (Mar '25) | 90.6% |
| 5 | o3-pro | 88.9% |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.5 (32k thinking) | 1512 |
| 2 | GPT-5.2 (high) | 1480 |
| 3 | Claude Opus 4.5 (no thinking) | 1479 |
| 4 | GPT-5 (high) | 1477.5 |
| 5 | Claude Opus 4.1 | 1472.4 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3 Flash | 48.1% ±2.4 |
| 2 | Grok 4 | 43.6% ±2.2 |
| 3 | Gemini 2.5 Pro Exp (Mar '25) | 40.4% ±2.3 |
| 4 | DeepSeek-R1 | 34.9% ±2.2 |
| 5 | Gemini 2.5 Flash | 33.5% ±2.1 |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.7 (xhigh) | 97.8% ±2.2 |
| 2 | GPT-5.2 (high) | 96.1% ±2.6 |
| 3 | GPT-5.2 (xhigh) | 96.1% ±2.7 |
| 4 | Gemini 3.1 Pro Preview | 95.6% ±3.1 |
| 5 | GPT-5.4 (xhigh) | 95.3% ±3.2 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5 (high) | 98.1% ±0.3 |
| 2 | GPT-5 (medium) | 97.9% ±0.3 |
| 3 | GPT-5 mini (high) | 97.8% ±0.3 |
| 4 | o4-mini (high) | 97.8% ±0.3 |
| 5 | o3 (high) | 97.8% ±0.3 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Pro (xhigh) | 50.0% ±2.9 |
| 2 | GPT-5.4 (xhigh) | 47.6% ±2.9 |
| 3 | Claude Opus 4.6 (max) | 40.7% ±2.9 |
| 4 | GPT-5.2 (xhigh) | 40.7% ±2.9 |
| 5 | GPT-5.2 (high) | 40.3% ±2.9 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.3 Codex | 79.3% |
| 2 | GPT-5.2 (xhigh) | 72.2% |
| 3 | Gemini 3.1 Pro Preview | 72.1% |
| 4 | Gemini 3 Pro Preview | 69.9% |
| 5 | Claude Opus 4.6 (no thinking) | 65.9% |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 78.4% |
| 2 | GPT-5.3 Codex | 77.3% |
| 3 | GPT-5.3 Codex | 75.1% |
| 4 | Claude Opus 4.6 (no thinking) | 69.9% |
| 5 | GPT-5.2 (medium) | 64.9% |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3 Pro Preview | 91.0% |
| 2 | GPT-5.2 (xhigh) | 84.0% |
| 3 | Gemini 3 Flash | 72.6% |
| 4 | GPT-5.2 (high) | 67.0% |
| 5 | GPT-5 (high) | 66.0% |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude 3.7 Sonnet | 29.1 |
| 2 | Claude 3.5 Sonnet (Jun '24) | 28.1 |
| 3 | Gemini 2.5 Pro Exp (Mar '25) | 18.4 |
| 4 | GPT-4o (Nov '24) | 16.6 |
| 5 | DeepSeek-V3 | 15.1 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3 Pro Preview | 3893 |
| 2 | Gemini 2.0 Flash Thinking Exp | 3873 |
| 3 | Gemini 2.5 Pro Exp (Mar '25) | 3871 |
| 4 | Gemini 2.5 Pro Preview (Jun '25) | 3836 |
| 5 | o3 (high) | 3789 |