18 benchmarks: the world's most-followed benchmarks, curated by AI Explained, author of SimpleBench.

All benchmarks are run independently by Epoch, Scale, and others, so scores may not match those self-reported by AI labs.

| Rank | Model (no tools) | Score |
|---|---|---|
| 1 | Gemini 3 Pro Preview | 37.52% ±1.90 |
| 2 | Claude Opus 4.6 (max) | 34.44% ±1.86 |
| 3 | GPT-5 Pro | 31.64% ±1.82 |
| 4 | GPT-5.2 | 27.80% ±1.76 |
| 5 | GPT-5 (August '25) | 25.32% ±1.70 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 79.6% |
| 2 | Gemini 3 Pro Preview | 76.4% |
| 3 | GPT-5.4 Pro | 74.1% |
| 4 | Claude Opus 4.6 | 67.6% |
| 5 | Gemini 2.5 Pro (06-05) | 62.4% |

| Rank | Model | Minutes |
|---|---|---|
| 1 | Claude Opus 4.5 (16k thinking) | 288.9 ±558.2 |
| 2 | GPT-5 (medium) | 137.3 ±102.1 |
| 3 | Claude Sonnet 4.5 | 113.3 ±91.4 |
| 4 | Grok 4 | 110.1 ±91.8 |
| 5 | Claude Opus 4.1 | 105.5 ±69.2 |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.6 | 78.7% ±1.9 |
| 2 | GPT-5.4 (high) | 76.9% ±1.9 |
| 3 | Claude Opus 4.5 | 76.7% ±1.9 |
| 4 | Gemini 3.1 Pro Preview | 75.6% ±2.0 |
| 5 | Gemini 3 Flash | 75.4% ±2.0 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 94.1% ±1.7 |
| 2 | Gemini 3 Pro Preview | 92.6% ±1.7 |
| 3 | GPT-5.2 (xhigh) | 91.4% ±1.8 |
| 4 | Claude Opus 4.6 (32k thinking) | 90.5% ±1.7 |
| 5 | Claude Opus 4.6 (64k thinking) | 88.8% ±1.9 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 | 83.0% |
| 2 | GPT-5.3 Codex | 70.9% |
| 3 | GPT-5.2 | 70.9% |
| 4 | Claude Opus 4.5 | 59.6% |
| 5 | Gemini 3 Pro Preview | 53.5% |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.2 (high) | 27.4% |
| 2 | Claude Opus 4.5 (no-thinking) | 26.5% |
| 3 | Gemini 3 Pro Preview | 18.6% |
| 4 | Gemini 3 Flash | 9.8% |
| 5 | o3 (high) | 8.8% |

| Rank | Model | Score |
|---|---|---|
| 1 | o3 (medium) | 100.0% |
| 2 | Grok 4 | 96.9% |
| 3 | GPT-5 (medium) | 96.9% |
| 4 | Gemini 2.5 Pro Exp (Mar '25) | 90.6% |
| 5 | o3-pro | 88.9% |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.5 (32k thinking) | 1512 |
| 2 | GPT-5.2 (high) | 1480 |
| 3 | Claude Opus 4.5 (no-thinking) | 1479 |
| 4 | GPT-5 (high) | 1477.5 |
| 5 | Claude Opus 4.1 | 1472.4 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3 Flash | 48.1% ±2.4 |
| 2 | Grok 4 | 43.6% ±2.2 |
| 3 | Gemini 2.5 Pro Exp (Mar '25) | 40.4% ±2.3 |
| 4 | DeepSeek-R1 | 34.9% ±2.2 |
| 5 | Gemini 2.5 Flash | 33.5% ±2.1 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.2 (xhigh) | 96.1% ±2.7 |
| 2 | GPT-5.2 (high) | 96.1% ±2.6 |
| 3 | Gemini 3.1 Pro Preview | 95.6% ±3.1 |
| 4 | Claude Opus 4.6 (64k thinking) | 94.4% ±2.8 |
| 5 | GPT-5.2 (medium) | 93.9% ±3.1 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5 (high) | 98.1% ±0.3 |
| 2 | GPT-5 (medium) | 97.9% ±0.3 |
| 3 | o4-mini (high) | 97.8% ±0.3 |
| 4 | o3 (high) | 97.8% ±0.3 |
| 5 | Claude Sonnet 4.5 | 97.7% ±0.4 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Pro (xhigh) | 50.0% ±2.9 |
| 2 | GPT-5.4 (xhigh) | 47.6% ±2.9 |
| 3 | Claude Opus 4.6 (max) | 40.7% ±2.9 |
| 4 | GPT-5.2 (xhigh) | 40.7% ±2.9 |
| 5 | GPT-5.2 (high) | 40.3% ±2.9 |

| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.3 Codex | 79.3% |
| 2 | GPT-5.2 (xhigh) | 72.2% |
| 3 | Gemini 3.1 Pro Preview | 72.1% |
| 4 | Gemini 3 Pro Preview | 69.9% |
| 5 | Claude Opus 4.6 (no thinking) | 65.9% |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 78.4% |
| 2 | GPT-5.3 Codex | 77.3% |
| 3 | GPT-5.3 Codex | 75.1% |
| 4 | Claude Opus 4.6 (no thinking) | 69.9% |
| 5 | GPT-5.2 (medium) | 64.9% |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3 Pro Preview | 91.0% |
| 2 | GPT-5.2 (xhigh) | 84.0% |
| 3 | Gemini 3 Flash | 72.6% |
| 4 | GPT-5.2 (high) | 67.0% |
| 5 | GPT-5 (high) | 66.0% |

| Rank | Model | Score |
|---|---|---|
| 1 | Claude 3.7 Sonnet | 29.1 |
| 2 | Claude 3.5 Sonnet (Jun '24) | 28.1 |
| 3 | Gemini 2.5 Pro Exp (Mar '25) | 18.4 |
| 4 | GPT-4o (Nov '24) | 16.6 |
| 5 | DeepSeek-V3 | 15.1 |

| Rank | Model | Score |
|---|---|---|
| 1 | Gemini 3 Pro Preview | 3893 |
| 2 | Gemini 2.0 Flash Thinking Exp | 3873 |
| 3 | Gemini 2.5 Pro Exp (Mar '25) | 3871 |
| 4 | Gemini 2.5 Pro Preview (Jun '25) | 3836 |
| 5 | o3 (high) | 3789 |