Will Claude 4 achieve over 95% on the MMLU-Pro benchmark by end of 2025?
40% chance
This market predicts whether Anthropic's next-generation Claude 4 model will score above 95% on the MMLU-Pro benchmark before December 31, 2025. MMLU-Pro is a harder version of the Massive Multitask Language Understanding (MMLU) benchmark, which tests AI models on multiple-choice questions across many subjects. As of April 2025, Claude 3.7 Sonnet scores around 83% on MMLU-Pro. For comparison, OpenAI's o1, described here as the current record holder, scores just over 90% on standard MMLU, which is a different and easier benchmark, so the two figures are not directly comparable. A score above 95% would be a significant breakthrough in AI capabilities, potentially surpassing average human expert performance on these tests.
This question is managed and resolved by Manifold.
Related questions
Will Claude 3.5 Opus be available via API by end of 2025?
20% chance
Will an open-source LLM under 10B parameters surpass Claude 3.5 Haiku by EOY 2025?
90% chance
Will Claude 3.5 Opus beat OpenAI's best released model on the arena.lmsys.org leaderboard?
9% chance
Will Claude MCP have equivalent functionality to a Claude Computer Use module by EOY2025?
57% chance
Will Claude become a Pokémon Master by the end of 2025?
26% chance
Will a text model achieve 100% performance on the MMLU in five years?
28% chance
MMLU 99% #3: Will SOTA for MMLU (average) pass 99% by the start of 2026?
6% chance
When will Claude 4 be released?
Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2030?
77% chance
MMLU 99% #5: Will SOTA for MMLU (average) pass 99% by the start of 2028?
44% chance