The AI landscape has been evolving at a breathtaking pace. Over roughly 20 months, from August 2024 to April 2026, a total of 347 large language models (LLMs) have been tracked across leaderboard snapshots on Arena AI (formerly known as LMArena and Chatbot Arena), one of the most popular community-driven evaluation platforms for AI models.

Arena AI employs a unique crowdsourced blind evaluation methodology: real users submit prompts and compare responses from two anonymous models side by side. Based on thousands of human votes, each model receives a score reflecting its relative quality in open-ended text-to-text tasks. A higher score means the model wins more head-to-head comparisons against its peers. As of April 2026, scores range from roughly 950 (early-generation models) to over 1500 (state-of-the-art flagships).
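
To make the score scale more concrete, the short sketch below converts a rating gap into an expected head-to-head win rate. It assumes an Elo-style logistic curve with a 400-point scale, which is a common way to read leaderboard ratings of this kind; Arena AI's exact fitting procedure is not described in this article, and the function name is purely illustrative.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Estimated probability that model A is preferred over model B.

    Assumption: ratings follow an Elo-style logistic curve with the given scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores quoted in the text: roughly 950 at the low end, about 1500 at the top.
print(f"1500 vs  950: {expected_win_rate(1500, 950):.3f}")
print(f"1500 vs 1500: {expected_win_rate(1500, 1500):.3f}")
```

Under this assumption only the difference between two scores matters, so two models with equal ratings are each expected to win half of their comparisons.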

Dynamic Visualization: The Line Race

To bring this data to life, a line race animation has been created, showing how the top AI models jockey for position over time. As new models enter the arena and older ones fade, the dramatic shifts unfold in real time – from GPT-4o’s early dominance through the rise of Claude Opus 4.7 and Gemini 3 Pro at the top. The animation features achievement milestones (triggered when models cross key rating thresholds).
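
As a rough illustration of how threshold-based milestones could be detected, the sketch below scans one model's score history and reports the first snapshot at which each threshold is reached. The thresholds, dates, and scores are hypothetical placeholders, and the code is not taken from the actual animation.

```python
from typing import List, Tuple


def find_milestones(
    history: List[Tuple[str, float]], thresholds: List[float]
) -> List[Tuple[str, float]]:
    """Return (date, threshold) pairs for the first snapshot reaching each threshold.

    `history` must be sorted chronologically. Each threshold fires at most once.
    """
    remaining = sorted(thresholds)
    events = []
    for date, score in history:
        # A single snapshot may cross several thresholds at once.
        while remaining and score >= remaining[0]:
            events.append((date, remaining.pop(0)))
    return events


# Hypothetical score history for a single model (not real leaderboard data).
sample_history = [
    ("2025-01-01", 1390.0),
    ("2025-02-01", 1410.0),
    ("2025-03-01", 1405.0),
    ("2025-04-01", 1505.0),
]
milestones = find_milestones(sample_history, [1400.0, 1500.0])
print(milestones)
```

A later dip below a threshold does not retrigger the milestone, because each threshold is removed once it has fired.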

The full video with this dynamic visualization has been published on YouTube.

Models That Reached the Top 10

Out of 347 models evaluated, only 77 have ever cracked the top 10 in the overall ranking. The table below presents each of these models along with their current standing, peak performance, and when they were last seen among the very best.

Current Rank. Model | Current Score | Best Score | Best Rank (Date Achieved) | Last Seen in Top 10
1. Claude Opus 4.7 Thinking 1504.53 1504.53 1 (2026-04-17) 2026-04-17
2. Claude Opus 4.6 Thinking 1502.63 1506.96 1 (2026-04-14) 2026-04-17
3. Claude Opus 4.7 1498.47 1498.47 3 (2026-04-17) 2026-04-17
4. Claude Opus 4.6 1496.83 1505.14 1 (2026-03-11) 2026-04-17
5. Muse Spark 1495.88 1495.88 3 (2026-04-14) 2026-04-17
6. Gemini 3.1 Pro Preview 1492.25 1500.71 2 (2026-03-05) 2026-04-17
7. Gemini 3 Pro 1486.11 1502.16 1 (2026-01-29) 2026-04-17
8. Grok 4.20 Beta 1 1485.01 1496.02 3 (2026-03-11) 2026-04-17
9. GPT-5.4 High 1481.63 1485.70 6 (2026-04-07) 2026-04-17
10. Grok 4.20 Beta Reasoning (03-09) 1479.81 1483.48 7 (2026-04-07) 2026-04-17
11. GPT-5.2 Chat Latest (2026-02-10) 1477.12 1502.50 3 (2026-02-17) 2026-04-14
12. Grok 4.20 Multi-Agent Beta (03-09) 1475.62 1478.97 9 (2026-04-07) 2026-04-14
13. Gemini 3 Flash 1474.02 1479.66 2 (2025-12-30) 2026-04-07
14. Claude Opus 4.5 Thinking 32K (2025-11-01) 1473.03 1473.90 3 (2025-12-15) 2026-03-31
16. Grok 4.1 Thinking 1469.85 1484.41 2 (2026-01-29) 2026-03-11
17. Claude Opus 4.5 (2025-11-01) 1468.76 1469.20 3 (2025-11-26) 2026-03-06
21. Gemini 3 Flash (Thinking Minimal) 1462.73 1464.05 7 (2026-01-29) 2026-02-11
23. Grok 4.1 1460.51 1466.36 3 (2025-11-20) 2026-02-11
25. GLM-5 1456.03 1470.37 8 (2026-02-10) 2026-02-10
26. GPT-5.1 High 1454.71 1460.58 4 (2025-11-20) 2026-01-29
27. GPT-5.3 Chat Latest 1454.34 1468.22 10 (2026-03-11) 2026-03-11
28. Claude Sonnet 4.5 Thinking 32K (2025-09-29) 1451.92 1453.01 1 (2025-10-03) 2026-01-29
29. Claude Sonnet 4.5 (2025-09-29) 1451.66 1452.74 5 (2025-11-09) 2025-12-15
32. ERNIE 5.0 0110 1450.44 1453.72 9 (2026-01-29) 2026-01-29
33. ERNIE 5.0 Preview 1203 1449.49 1450.56 9 (2025-12-21) 2025-12-21
34. Claude Opus 4.1 Thinking 16K (2025-08-05) 1448.85 1451.45 2 (2025-11-06) 2026-01-09
35. Gemini 2.5 Pro 1448.66 1466.64 1 (2025-11-09) 2026-01-09
36. Claude Opus 4.1 (2025-08-05) 1446.83 1462.10 2 (2025-08-07) 2025-11-20
39. GPT-4.5 Preview (2025-02-27) 1444.45 1444.88 1 (2025-03-25) 2025-11-20
40. ChatGPT-4o Latest (2025-03-26) 1443.19 1443.66 1 (2025-04-16) 2025-11-09
45. GPT-5.1 1438.68 1440.92 9 (2025-11-16) 2025-11-17
47. Qwen3 Max Preview 1434.94 1435.12 8 (2025-09-30) 2025-11-09
49. GPT-5 High 1433.37 1481.37 1 (2025-08-18) 2025-11-09
52. o3 (2025-04-16) 1431.27 1454.32 1 (2025-06-18) 2025-11-09
55. GPT-5 Chat 1426.56 1429.60 8 (2025-09-08) 2025-10-01
60. Claude Opus 4 Thinking 16K (2025-05-14) 1423.85 1424.30 6 (2025-07-28) 2025-09-18
61. Qwen3 235B-A22B Instruct 2507 1423.50 1432.93 5 (2025-08-04) 2025-08-21
64. DeepSeek R1 0528 1421.98 1421.98 5 (2025-06-18) 2025-08-04
65. Grok 4 Fast Chat 1421.08 1424.78 10 (2025-09-30) 2025-09-30
70. Kimi K2 Preview (07-11) 1417.40 1421.29 6 (2025-07-25) 2025-08-28
77. GPT-4.1 (2025-04-14) 1413.36 1413.86 4 (2025-05-22) 2025-07-15
78. Claude Opus 4 (2025-05-14) 1412.22 1420.44 4 (2025-06-18) 2025-08-01
79. Grok 3 Preview (02-24) 1411.89 1413.32 2 (2025-03-25) 2025-07-28
80. GLM-4.5 1411.16 1418.42 10 (2025-08-04) 2025-08-04
81. Gemini 2.5 Flash 1411.05 1417.54 6 (2025-07-07) 2025-07-17
82. Grok 4 0709 1410.12 1436.78 5 (2025-07-28) 2025-09-08
89. Qwen3 235B-A22B No Thinking 1403.21 1403.21 10 (2025-07-07) 2025-07-07
93. o1 (2024-12-17) 1401.79 1402.44 1 (2025-02-27) 2025-07-01
98. DeepSeek R1 1397.80 1398.20 2 (2025-02-27) 2025-05-22
103. DeepSeek V3 0324 1395.24 1397.36 4 (2025-04-16) 2025-06-18
107. o4 Mini (2025-04-16) 1389.90 1400.07 6 (2025-05-11) 2025-06-18
109. Claude Sonnet 4 (2025-05-14) 1388.86 1395.42 7 (2025-06-11) 2025-06-11
110. o1 Preview 1387.97 1388.54 1 (2024-12-22) 2025-04-16
114. Claude 3.7 Sonnet Thinking 32K (2025-02-19) 1386.76 1388.93 5 (2025-03-26) 2025-05-22
125. Qwen2.5 Max 1374.38 1374.98 5 (2025-02-03) 2025-03-25
127. Claude 3.5 Sonnet (2024-10-22) 1371.79 1373.10 2 (2024-12-22) 2025-03-26
128. Claude 3.7 Sonnet (2025-02-19) 1370.74 1375.92 3 (2025-02-27) 2025-04-16
134. o3 Mini High 1363.44 1365.82 4 (2025-02-21) 2025-04-16
137. Gemini 2.0 Flash 001 1360.16 1366.27 4 (2025-02-06) 2025-03-17
138. DeepSeek V3 1358.40 1358.99 4 (2025-01-22) 2025-02-27
145. Gemini 2.0 Flash Lite Preview (02-05) 1353.09 1353.65 10 (2025-02-17) 2025-02-17
146. Gemini 1.5 Pro 002 1350.91 1351.71 2 (2024-10-23) 2025-02-27
150. o3 Mini 1347.62 1348.61 8 (2025-02-14) 2025-02-21
158. GPT-4o (2024-05-13) 1345.41 1346.08 1 (2024-09-15) 2025-02-06
161. Claude 3.5 Sonnet (2024-06-20) 1341.69 1343.37 2 (2024-09-15) 2025-02-05
165. o1 Mini 1336.86 1337.32 2 (2024-09-27) 2025-02-03
168. Grok 2 (2024-08-13) 1335.11 1335.58 5 (2024-08-28) 2024-12-05
169. GPT-4o (2024-08-06) 1334.65 1335.39 7 (2024-09-15) 2024-12-22
170. Gemini Advanced 0514 1334.59 1335.24 3 (2024-09-15) 2025-01-28
171. Llama 3.1 405B Instruct Bf16 1334.54 1335.95 4 (2024-09-15) 2025-01-24
173. Llama 3.1 405B Instruct Fp8 1332.73 1334.26 5 (2024-09-15) 2025-01-05
181. GPT-4 Turbo (2024-04-09) 1323.73 1324.88 8 (2024-09-15) 2024-09-27
187. Claude 3 Opus (2024-02-29) 1321.03 1323.08 9 (2024-09-15) 2024-09-15
– ChatGPT-4o Latest – 1288.84 1 (2024-09-04) 2024-09-04
– Dola Seed 2.0 Preview – 1474.55 7 (2026-02-24) 2026-03-04
– Gemini 1.5 Pro API 0514 – 1238.75 10 (2024-09-15) 2024-09-15
– Llama 3.1 405B Instruct – 1250.04 5 (2024-09-04) 2024-09-04

Some interesting observations:

  • Anthropic leads the current top 3. Claude Opus 4.7 Thinking holds the #1 spot with a score of 1504.53, Claude Opus 4.6 Thinking occupies #2, and the non-thinking Claude Opus 4.7 sits at #3. Six models in the table have recorded a best score above 1500; the highest so far is Claude Opus 4.6 Thinking at 1506.96.
  • Google and xAI are close behind. Gemini 3.1 Pro Preview (#6), Gemini 3 Pro (#7), and Gemini 3 Flash (#13) all appear in the current top 15. Meanwhile, xAI’s Grok 4.20 Beta 1 (#8) and multiple Grok variants keep the competition fierce.
  • OpenAI’s flagship trajectory is dramatic. GPT-5 High once held the #1 position (August 2025, with a best score of 1481) but has since dropped to #49 as newer models surged ahead. GPT-5.4 High, the latest OpenAI release, currently sits at #9 after peaking at #6 on April 7, 2026.
  • The “fallen champions” tell a powerful story. GPT-4o held #1 as recently as September 2024 and now ranks #158. o1 Preview held #1 in December 2024 and is now at #110.
  • Chinese AI labs are increasingly competitive. GLM-5.1, Qwen3.5 Max Preview, and Kimi K2.5 Thinking all feature in the top 20, and ERNIE 5.0 0110 peaked at #9 in January 2026 (it currently sits at #32), demonstrating that Zhipu AI, Alibaba, Baidu, and Moonshot are firmly in the race.
  • Open-source models have reached the top 10. DeepSeek R1 peaked at #2 (February 2025), DeepSeek V3 0324 at #4 (April 2025), and DeepSeek R1 0528 at #5 (June 2025), all remarkable achievements for openly licensed models.
  • The score ceiling keeps rising. In August 2024, the top score stood at around 1290. By April 2026, the leader exceeds 1504, a gain of about 215 points (~17%) in under two years, likely reflecting rapid advances in model architecture, training data, and alignment techniques.
  • Model lifespan in the top 10 is shrinking. Early models like GPT-4o and Gemini 1.5 Pro remained in the top 10 for months. Recent models are sometimes displaced within weeks, as the release cadence has dramatically accelerated.
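
The score-ceiling figures quoted above can be checked with a few lines of arithmetic. The sketch uses the approximate August 2024 top score of 1290 and the current leader's 1504.53 from the table; the variable names are illustrative.

```python
start_top = 1290.0      # approximate top score in August 2024 (from the text)
current_top = 1504.53   # Claude Opus 4.7 Thinking on April 17, 2026 (from the table)

gain_points = current_top - start_top
gain_percent = 100.0 * gain_points / start_top

print(f"Gain: {gain_points:.2f} points ({gain_percent:.1f}%)")
```

Because ratings of this kind are relative, the point gap is usually the more meaningful of the two numbers; the percentage is only a convenient shorthand.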

Methodology

This analysis is based on Arena AI’s “overall” ratings for text-to-text tasks. Rankings are based on the score derived from pairwise human evaluations. Snapshots between August 28, 2024 and April 17, 2026 were analyzed. A model qualifies for the table above (see the “Models That Reached the Top 10” section) if it appeared at least once in the top 10 at any point during this period. “Current” values reflect the most recent available snapshot (April 17, 2026).
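
The qualification rule described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the snapshot layout (ISO date mapped to model scores), the function name, the tie handling, and the choice to record the most recent date on which the best rank was held are all assumptions, and the sample data is made up.

```python
from typing import Dict


def summarize_top_n(snapshots: Dict[str, Dict[str, float]], top_n: int = 10) -> Dict[str, dict]:
    """Summarize every model that appeared in the top `top_n` at least once.

    `snapshots` maps an ISO date string to a {model: score} dict.
    Ties are broken arbitrarily by sort order (an assumption of this sketch).
    """
    summary: Dict[str, dict] = {}
    for date in sorted(snapshots):  # ISO dates sort chronologically as strings
        ranked = sorted(snapshots[date].items(), key=lambda kv: kv[1], reverse=True)
        for rank, (model, score) in enumerate(ranked, start=1):
            entry = summary.setdefault(
                model,
                {"best_rank": rank, "best_rank_date": date,
                 "best_score": score, "last_seen_top_n": None},
            )
            if rank <= entry["best_rank"]:
                # Keep the most recent date on which the best rank was held.
                entry["best_rank"], entry["best_rank_date"] = rank, date
            entry["best_score"] = max(entry["best_score"], score)
            if rank <= top_n:
                entry["last_seen_top_n"] = date
    # A model qualifies only if it was inside the top N in some snapshot.
    return {m: e for m, e in summary.items() if e["last_seen_top_n"] is not None}


# Made-up snapshots with top_n=2 to keep the example small.
toy_snapshots = {
    "2024-01-01": {"A": 1300.0, "B": 1290.0, "C": 1200.0},
    "2024-02-01": {"A": 1295.0, "B": 1310.0, "C": 1250.0, "D": 1305.0},
}
toy_summary = summarize_top_n(toy_snapshots, top_n=2)
for model, entry in sorted(toy_summary.items()):
    print(model, entry)
```

In the toy data, model C never enters the top 2 and is therefore excluded, while model A stays in the summary with its last top-2 appearance dated to the first snapshot.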

Data source: Arena AI (Text Arena).