Will OpenAI's next-gen math-focused model score at least 95% on the MATH benchmark?
Resolved NO on Sep 16.
Resolves YES if OpenAI's next-generation math-focused model achieves a score of 95% or higher on the MATH benchmark.
If a next-generation general model (e.g. GPT-4), code model (e.g. Codex), or any other model specialized for reasoning is released before the math model and scores 95% or higher, this question also resolves YES.
Benchmarking on a subset of MATH is acceptable.
Using tools (e.g. a calculator) and code is allowed.
This question is managed and resolved by Manifold.
🏅 Top traders
| # | Total profit |
|---|---|
| 1 | Ṁ119 |
| 2 | Ṁ75 |
| 3 | Ṁ51 |
| 4 | Ṁ37 |
| 5 | Ṁ14 |
Why is this resolving yes? I would have thought no? https://github.com/openai/simple-evals?tab=readme-ov-file#benchmark-results