Will Claude 4 achieve over 95% on the MMLU-Pro benchmark by end of 2025?
Dec 31 · 40% chance

This market asks whether Anthropic's next-generation Claude 4 model will score above 95% on the MMLU-Pro benchmark by December 31, 2025. MMLU-Pro is an enhanced version of the Massive Multitask Language Understanding (MMLU) benchmark, which tests AI models on multiple-choice questions across various subjects. As of April 2025, Claude 3.7 Sonnet scores around 83% on MMLU-Pro. The current record holder, OpenAI's o1, scores just over 90%, but that figure is for standard MMLU rather than MMLU-Pro, so the two numbers are not directly comparable. A score above 95% would represent a significant breakthrough in AI capabilities, potentially surpassing average human expert performance on these tests.

© Manifold Markets, Inc.