Before 2028, will any AI model achieve the same or greater benchmarks as o3 high with <= 1 million tokens per question?
48% chance
Specifically, the key benchmarks are ARC, Codeforces Elo, and FrontierMath. The scores to match are a Codeforces Elo of 2727, 87.5% on the ARC semi-private set, and 25.2% on FrontierMath.
The model must reach these scores while using no more than 1,000,000 reasoning tokens per question on average.
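As a rough illustration only, the criteria above can be written as a single check. This is a sketch, not the official resolution procedure: the `ModelResult` record and its field names are hypothetical, "same or greater" is read as `>=`, and the token cap is modeled as one average across all questions, which the question text does not spell out.

```python
from dataclasses import dataclass

# Thresholds quoted in the question text.
CODEFORCES_ELO = 2727
ARC_SEMI_PRIVATE_PCT = 87.5
FRONTIER_MATH_PCT = 25.2
MAX_AVG_TOKENS_PER_QUESTION = 1_000_000


@dataclass
class ModelResult:
    """Hypothetical record of one model's reported results (illustrative names)."""
    codeforces_elo: float
    arc_semi_private_pct: float
    frontier_math_pct: float
    avg_reasoning_tokens_per_question: float


def meets_criteria(result: ModelResult) -> bool:
    """Return True only if every score threshold and the token cap are met.

    Assumption: matching a score exactly counts, per "same or greater".
    """
    return (
        result.codeforces_elo >= CODEFORCES_ELO
        and result.arc_semi_private_pct >= ARC_SEMI_PRIVATE_PCT
        and result.frontier_math_pct >= FRONTIER_MATH_PCT
        and result.avg_reasoning_tokens_per_question <= MAX_AVG_TOKENS_PER_QUESTION
    )
```

Under this reading, a model that matches all three scores but averages more than 1,000,000 reasoning tokens per question would not count.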
For context, o3 used 5.7B tokens per task to achieve its ARC score. In low-compute mode, it scored 75.7% using 33M tokens per task.
https://arcprize.org/blog/oai-o3-pub-breakthrough
Note that if the final version of o3 posts better or worse benchmark scores, the goalposts will not change. The model must match or beat the scores listed here.
This question is managed and resolved by Manifold.
Related questions
Will an AI achieve >85% performance on the FrontierMath benchmark before 2028? (70% chance)
Will a Chinese-made AI beat o3's December score on Frontier Math by the end of 2025? (53% chance)
Will AI pass the Longbets version of the Turing test by the end of 2029? (54% chance)
Will there be an AI language model that strongly surpasses ChatGPT and other OpenAI models before the end of 2024? (3% chance)
Will there be another major public-facing breakthrough in AI before December 31, 2024 [subjective - 1000M boost added] (69% chance)
Will openAI have the most accurate LLM across most benchmarks by EOY 2024? (37% chance)
Will an AI achieve >85% performance on the FrontierMath benchmark before 2027? (62% chance)
Will any AI model score >80% on Epoch's Frontier Math Benchmark in 2025? (43% chance)
Will OpenAI models achieve ≥90% on SimpleBench by the end of 2025? (46% chance)
Will any model get above human level on the Simple Bench benchmark before September 1st, 2025. (55% chance)