The AI landscape has been evolving at a breathtaking pace. Over roughly 20 months, from August 2024 to April 2026, a total of 347 LLMs (large language models) have been tracked across leaderboard snapshots on Arena AI (formerly known as LMArena and Chatbot Arena), one of the most popular community-driven evaluation platforms for AI models.
Arena AI employs a unique crowdsourced blind evaluation methodology: real users submit prompts and compare responses from two anonymous models side by side. Based on thousands of human votes, each model receives a score reflecting its relative quality in open-ended text-to-text tasks. A higher score means the model wins more head-to-head comparisons against its peers. As of April 2026, scores range from roughly 950 (early-generation models) to over 1500 (state-of-the-art flagships).
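To give a feel for what a score gap means in practice, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes an Elo-style logistic scale (base 10, 400-point divisor). The article does not state Arena AI's exact formula, so the function name `expected_win_rate` and its constants are illustrative assumptions rather than the platform's method.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for model A over model B.

    Assumes an Elo-style logistic curve (base 10, 400-point scale). This is
    an illustrative assumption, not Arena AI's documented formula.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Scores from the April 17, 2026 snapshot listed in the top-10 table.
first = 1504.53   # Claude Opus 4.7 Thinking (#1)
ninth = 1481.63   # GPT-5.4 High (#9)
print(f"{expected_win_rate(first, ninth):.3f}")
```

Under that assumption, the roughly 23-point gap between #1 and #9 corresponds to an expected win rate of only about 53%, which suggests how tightly packed the current leaders are.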
Dynamic Visualization: The Line Race
To bring this data to life, a line race animation has been created, showing how the top AI models jockey for position over time. As new models enter the arena and older ones fade, the dramatic shifts unfold in real time – from GPT-4o’s early dominance through the rise of Claude Opus 4.7 and Gemini 3 Pro at the top. The animation features achievement milestones (triggered when models cross key rating thresholds).
The full video with this dynamic visualization has been published on YouTube.
Models That Reached the Top 10
Out of 347 models evaluated, only 77 have ever cracked the top 10 in the overall ranking. The table below presents each of these models along with their current standing, peak performance, and when they were last seen among the very best.
| Current Rank & Model | Current Score | Best Score | Best Rank Achieved | Last Seen in Top 10 |
|---|---|---|---|---|
| 1. Claude Opus 4.7 Thinking | 1504.53 | 1504.53 | 1 (2026-04-17) | 2026-04-17 |
| 2. Claude Opus 4.6 Thinking | 1502.63 | 1506.96 | 1 (2026-04-14) | 2026-04-17 |
| 3. Claude Opus 4.7 | 1498.47 | 1498.47 | 3 (2026-04-17) | 2026-04-17 |
| 4. Claude Opus 4.6 | 1496.83 | 1505.14 | 1 (2026-03-11) | 2026-04-17 |
| 5. Muse Spark | 1495.88 | 1495.88 | 3 (2026-04-14) | 2026-04-17 |
| 6. Gemini 3.1 Pro Preview | 1492.25 | 1500.71 | 2 (2026-03-05) | 2026-04-17 |
| 7. Gemini 3 Pro | 1486.11 | 1502.16 | 1 (2026-01-29) | 2026-04-17 |
| 8. Grok 4.20 Beta 1 | 1485.01 | 1496.02 | 3 (2026-03-11) | 2026-04-17 |
| 9. GPT-5.4 High | 1481.63 | 1485.70 | 6 (2026-04-07) | 2026-04-17 |
| 10. Grok 4.20 Beta Reasoning (03-09) | 1479.81 | 1483.48 | 7 (2026-04-07) | 2026-04-17 |
| 11. GPT-5.2 Chat Latest (2026-02-10) | 1477.12 | 1502.50 | 3 (2026-02-17) | 2026-04-14 |
| 12. Grok 4.20 Multi-Agent Beta (03-09) | 1475.62 | 1478.97 | 9 (2026-04-07) | 2026-04-14 |
| 13. Gemini 3 Flash | 1474.02 | 1479.66 | 2 (2025-12-30) | 2026-04-07 |
| 14. Claude Opus 4.5 Thinking 32K (2025-11-01) | 1473.03 | 1473.90 | 3 (2025-12-15) | 2026-03-31 |
| 16. Grok 4.1 Thinking | 1469.85 | 1484.41 | 2 (2026-01-29) | 2026-03-11 |
| 17. Claude Opus 4.5 (2025-11-01) | 1468.76 | 1469.20 | 3 (2025-11-26) | 2026-03-06 |
| 21. Gemini 3 Flash (Thinking Minimal) | 1462.73 | 1464.05 | 7 (2026-01-29) | 2026-02-11 |
| 23. Grok 4.1 | 1460.51 | 1466.36 | 3 (2025-11-20) | 2026-02-11 |
| 25. GLM-5 | 1456.03 | 1470.37 | 8 (2026-02-10) | 2026-02-10 |
| 26. GPT-5.1 High | 1454.71 | 1460.58 | 4 (2025-11-20) | 2026-01-29 |
| 27. GPT-5.3 Chat Latest | 1454.34 | 1468.22 | 10 (2026-03-11) | 2026-03-11 |
| 28. Claude Sonnet 4.5 Thinking 32K (2025-09-29) | 1451.92 | 1453.01 | 1 (2025-10-03) | 2026-01-29 |
| 29. Claude Sonnet 4.5 (2025-09-29) | 1451.66 | 1452.74 | 5 (2025-11-09) | 2025-12-15 |
| 32. ERNIE 5.0 0110 | 1450.44 | 1453.72 | 9 (2026-01-29) | 2026-01-29 |
| 33. ERNIE 5.0 Preview 1203 | 1449.49 | 1450.56 | 9 (2025-12-21) | 2025-12-21 |
| 34. Claude Opus 4.1 Thinking 16K (2025-08-05) | 1448.85 | 1451.45 | 2 (2025-11-06) | 2026-01-09 |
| 35. Gemini 2.5 Pro | 1448.66 | 1466.64 | 1 (2025-11-09) | 2026-01-09 |
| 36. Claude Opus 4.1 (2025-08-05) | 1446.83 | 1462.10 | 2 (2025-08-07) | 2025-11-20 |
| 39. GPT-4.5 Preview (2025-02-27) | 1444.45 | 1444.88 | 1 (2025-03-25) | 2025-11-20 |
| 40. ChatGPT-4o Latest (2025-03-26) | 1443.19 | 1443.66 | 1 (2025-04-16) | 2025-11-09 |
| 45. GPT-5.1 | 1438.68 | 1440.92 | 9 (2025-11-16) | 2025-11-17 |
| 47. Qwen3 Max Preview | 1434.94 | 1435.12 | 8 (2025-09-30) | 2025-11-09 |
| 49. GPT-5 High | 1433.37 | 1481.37 | 1 (2025-08-18) | 2025-11-09 |
| 52. o3 (2025-04-16) | 1431.27 | 1454.32 | 1 (2025-06-18) | 2025-11-09 |
| 55. GPT-5 Chat | 1426.56 | 1429.60 | 8 (2025-09-08) | 2025-10-01 |
| 60. Claude Opus 4 Thinking 16K (2025-05-14) | 1423.85 | 1424.30 | 6 (2025-07-28) | 2025-09-18 |
| 61. Qwen3 235B-A22B Instruct 2507 | 1423.50 | 1432.93 | 5 (2025-08-04) | 2025-08-21 |
| 64. DeepSeek R1 0528 | 1421.98 | 1421.98 | 5 (2025-06-18) | 2025-08-04 |
| 65. Grok 4 Fast Chat | 1421.08 | 1424.78 | 10 (2025-09-30) | 2025-09-30 |
| 70. Kimi K2 Preview (07-11) | 1417.40 | 1421.29 | 6 (2025-07-25) | 2025-08-28 |
| 77. GPT-4.1 (2025-04-14) | 1413.36 | 1413.86 | 4 (2025-05-22) | 2025-07-15 |
| 78. Claude Opus 4 (2025-05-14) | 1412.22 | 1420.44 | 4 (2025-06-18) | 2025-08-01 |
| 79. Grok 3 Preview (02-24) | 1411.89 | 1413.32 | 2 (2025-03-25) | 2025-07-28 |
| 80. GLM-4.5 | 1411.16 | 1418.42 | 10 (2025-08-04) | 2025-08-04 |
| 81. Gemini 2.5 Flash | 1411.05 | 1417.54 | 6 (2025-07-07) | 2025-07-17 |
| 82. Grok 4 0709 | 1410.12 | 1436.78 | 5 (2025-07-28) | 2025-09-08 |
| 89. Qwen3 235B-A22B No Thinking | 1403.21 | 1403.21 | 10 (2025-07-07) | 2025-07-07 |
| 93. o1 (2024-12-17) | 1401.79 | 1402.44 | 1 (2025-02-27) | 2025-07-01 |
| 98. DeepSeek R1 | 1397.80 | 1398.20 | 2 (2025-02-27) | 2025-05-22 |
| 103. DeepSeek V3 0324 | 1395.24 | 1397.36 | 4 (2025-04-16) | 2025-06-18 |
| 107. o4 Mini (2025-04-16) | 1389.90 | 1400.07 | 6 (2025-05-11) | 2025-06-18 |
| 109. Claude Sonnet 4 (2025-05-14) | 1388.86 | 1395.42 | 7 (2025-06-11) | 2025-06-11 |
| 110. o1 Preview | 1387.97 | 1388.54 | 1 (2024-12-22) | 2025-04-16 |
| 114. Claude 3.7 Sonnet Thinking 32K (2025-02-19) | 1386.76 | 1388.93 | 5 (2025-03-26) | 2025-05-22 |
| 125. Qwen2.5 Max | 1374.38 | 1374.98 | 5 (2025-02-03) | 2025-03-25 |
| 127. Claude 3.5 Sonnet (2024-10-22) | 1371.79 | 1373.10 | 2 (2024-12-22) | 2025-03-26 |
| 128. Claude 3.7 Sonnet (2025-02-19) | 1370.74 | 1375.92 | 3 (2025-02-27) | 2025-04-16 |
| 134. o3 Mini High | 1363.44 | 1365.82 | 4 (2025-02-21) | 2025-04-16 |
| 137. Gemini 2.0 Flash 001 | 1360.16 | 1366.27 | 4 (2025-02-06) | 2025-03-17 |
| 138. DeepSeek V3 | 1358.40 | 1358.99 | 4 (2025-01-22) | 2025-02-27 |
| 145. Gemini 2.0 Flash Lite Preview (02-05) | 1353.09 | 1353.65 | 10 (2025-02-17) | 2025-02-17 |
| 146. Gemini 1.5 Pro 002 | 1350.91 | 1351.71 | 2 (2024-10-23) | 2025-02-27 |
| 150. o3 Mini | 1347.62 | 1348.61 | 8 (2025-02-14) | 2025-02-21 |
| 158. GPT-4o (2024-05-13) | 1345.41 | 1346.08 | 1 (2024-09-15) | 2025-02-06 |
| 161. Claude 3.5 Sonnet (2024-06-20) | 1341.69 | 1343.37 | 2 (2024-09-15) | 2025-02-05 |
| 165. o1 Mini | 1336.86 | 1337.32 | 2 (2024-09-27) | 2025-02-03 |
| 168. Grok 2 (2024-08-13) | 1335.11 | 1335.58 | 5 (2024-08-28) | 2024-12-05 |
| 169. GPT-4o (2024-08-06) | 1334.65 | 1335.39 | 7 (2024-09-15) | 2024-12-22 |
| 170. Gemini Advanced 0514 | 1334.59 | 1335.24 | 3 (2024-09-15) | 2025-01-28 |
| 171. Llama 3.1 405B Instruct Bf16 | 1334.54 | 1335.95 | 4 (2024-09-15) | 2025-01-24 |
| 173. Llama 3.1 405B Instruct Fp8 | 1332.73 | 1334.26 | 5 (2024-09-15) | 2025-01-05 |
| 181. GPT-4 Turbo (2024-04-09) | 1323.73 | 1324.88 | 8 (2024-09-15) | 2024-09-27 |
| 187. Claude 3 Opus (2024-02-29) | 1321.03 | 1323.08 | 9 (2024-09-15) | 2024-09-15 |
| – ChatGPT-4o Latest | – | 1288.84 | 1 (2024-09-04) | 2024-09-04 |
| – Dola Seed 2.0 Preview | – | 1474.55 | 7 (2026-02-24) | 2026-03-04 |
| – Gemini 1.5 Pro API 0514 | – | 1238.75 | 10 (2024-09-15) | 2024-09-15 |
| – Llama 3.1 405B Instruct | – | 1250.04 | 5 (2024-09-04) | 2024-09-04 |
Some interesting observations:
- Anthropic sweeps the current top 4. Claude Opus 4.7 Thinking holds the #1 spot with a score of 1504.53, followed by Claude Opus 4.6 Thinking (#2), the non-thinking Claude Opus 4.7 (#3), and Claude Opus 4.6 (#4). It is not the first model to pass 1500, though: Claude Opus 4.6 Thinking peaked at 1506.96 and Claude Opus 4.6 at 1505.14.
- Google and xAI are close behind. Gemini 3.1 Pro Preview (#6), Gemini 3 Pro (#7), and Gemini 3 Flash (#13) all appear in the current top 15. Meanwhile, xAI’s Grok 4.20 Beta 1 (#8) and multiple other Grok variants keep the competition fierce.
- OpenAI’s flagship trajectory is dramatic. GPT-5 High once held the #1 position (August 2025; its best score is 1481.37) but has since dropped to #49 as newer models surged ahead. GPT-5.4 High, the latest OpenAI release, currently sits at #9 after peaking at #6 on April 7, 2026.
- The “fallen champions” tell a powerful story. GPT-4o held #1 as recently as September 2024 and now ranks #158. o1 Preview held #1 in December 2024 and is now at #110.
- Chinese AI labs are increasingly competitive. GLM-5.1, Qwen3.5 Max Preview, and Kimi K2.5 Thinking all feature in the top 20, while GLM-5 and ERNIE 5.0 both reached the top 10 in early 2026 (peaking at #8 and #9) before slipping to #25 and #32. Zhipu AI, Alibaba, Baidu, and Moonshot are firmly in the race.
- Open-source models have reached the top 10. DeepSeek R1 peaked at #2 (February 2025), DeepSeek V3 0324 at #4 (April 2025), and DeepSeek R1 0528 at #5 (June 2025). These are remarkable achievements for openly licensed models.
- The score ceiling keeps rising. In August 2024, the top score stood at around 1290. By April 2026, the leader exceeds 1504, a gain of about 215 points (~17%) in under two years, reflecting rapid advances in model architecture, training data, and alignment techniques.
- Model lifespan in the top 10 is shrinking. Early models like GPT-4o and Gemini 1.5 Pro remained in the top 10 for months. Recent models are sometimes displaced within weeks, as the release cadence has dramatically accelerated.
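The arithmetic behind the score-ceiling observation can be checked in a few lines. This is a minimal sketch using the approximate August 2024 figure quoted above (1290) and the current leader's score from the table. The helper name `pct_change` is hypothetical.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


early_top = 1290.0      # approximate top score in August 2024, as quoted above
current_top = 1504.53   # Claude Opus 4.7 Thinking, April 17, 2026 snapshot

gain = current_top - early_top
print(f"{gain:.2f} points, {pct_change(early_top, current_top):.1f}%")
```

The point gain is arguably the more meaningful number: rating scales of this kind have no natural zero, so a percentage depends on where the scale happens to be anchored.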
Methodology
This analysis is based on Arena AI’s “overall” ratings for text-to-text tasks. Rankings are based on the score derived from pairwise human evaluations. Snapshots between August 28, 2024 and April 17, 2026 were analyzed. A model qualifies for the table above (see the “Models That Reached the Top 10” section) if it appeared at least once in the top 10 at any point during this period. “Current” values reflect the most recent available snapshot (April 17, 2026).
Data source: Arena AI (Text Arena).
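The qualification rule and the table columns described in the Methodology can be sketched as follows. This is an illustration under stated assumptions, not the script used for the analysis: the `snapshots` layout (snapshot date mapped to model scores), the `Summary` class, and the `summarize_top10` function are all hypothetical, and when a model reaches its best rank on several dates, the earliest date is kept.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Summary:
    best_score: float
    best_rank: int
    best_rank_date: date
    last_in_top: date


def summarize_top10(
    snapshots: dict[date, dict[str, float]], top_n: int = 10
) -> dict[str, Summary]:
    """Summarize every model that was ever ranked within the top `top_n`.

    `snapshots` maps a snapshot date to {model name: score}. This layout is
    hypothetical, not Arena AI's actual export format.
    """
    best_score: dict[str, float] = {}
    best_rank: dict[str, tuple[int, date]] = {}
    last_top: dict[str, date] = {}

    for day in sorted(snapshots):
        # Rank by descending score, so rank 1 is the highest score that day.
        ranked = sorted(snapshots[day].items(), key=lambda kv: kv[1], reverse=True)
        for rank, (model, score) in enumerate(ranked, start=1):
            best_score[model] = max(score, best_score.get(model, score))
            # Assumption: keep the earliest date on which the best rank occurred.
            if model not in best_rank or rank < best_rank[model][0]:
                best_rank[model] = (rank, day)
            if rank <= top_n:
                last_top[model] = day  # overwritten by later snapshots

    # Only models that appeared in the top `top_n` at least once qualify.
    return {
        model: Summary(best_score[model], *best_rank[model], last_top[model])
        for model in last_top
    }
```

Calling `summarize_top10(snapshots)` on the full snapshot history would yield one entry per qualifying model, from which the "Best Score", "Best Rank Achieved", and "Last Seen in Top 10" columns can be read.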