Will a Chinese AI developer announce a model rivaling o3 performance by February 2025?
Resolved NO on Feb 2

Market resolves YES if a major Chinese AI developer (e.g., Tencent, DeepSeek, Baidu, 01, Alibaba, ByteDance, or others that seem unlikely to commit outright fraud) announces evaluation results for a model that tie or surpass OpenAI's o3 results from December 20th on any one of the following:

SWE-Bench Verified: 71.7%

Codeforces: 2727 Elo

AIME 2024: 96.7%

GPQA Diamond: 87.7%

FrontierMath: 25.2%

ARC-AGI Semi-Private: 87.5%

Aggressive test-time scaling is allowed. Results should be pass@1, as this appears to be what OpenAI reported (though I'm not totally sure this makes the most sense, or what to do if this is ambiguous). Benchmark contamination is a concern, but this market will resolve based on stated performance, whether or not contamination is suspected.
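
The "any one of the following" rule above can be sketched as a simple threshold check. This is only an illustration of the stated criteria, not how the market creator actually resolved it: the function names and the input format are assumptions, and the check takes the announced numbers at face value (it does not verify pass@1 or contamination).

```python
# o3 figures copied from the market description (December 20th results).
# Codeforces is an Elo rating; every other entry is a percentage.
O3_THRESHOLDS = {
    "SWE-Bench Verified": 71.7,
    "Codeforces": 2727,
    "AIME 2024": 96.7,
    "GPQA Diamond": 87.7,
    "FrontierMath": 25.2,
    "ARC-AGI Semi-Private": 87.5,
}


def matched_benchmarks(reported):
    """Return the benchmarks on which the reported score ties or beats o3.

    `reported` is a hypothetical dict mapping benchmark name to the
    developer's stated score; benchmarks not reported are ignored.
    """
    return [
        name
        for name, threshold in O3_THRESHOLDS.items()
        if name in reported and reported[name] >= threshold
    ]


def resolves_yes(reported):
    """YES if at least one benchmark ties or surpasses the o3 figure."""
    return len(matched_benchmarks(reported)) > 0


if __name__ == "__main__":
    # Made-up scores for illustration only, not any real model's results.
    example = {"AIME 2024": 80.0, "Codeforces": 2000}
    print(matched_benchmarks(example), resolves_yes(example))
```

A tie counts, so the comparison is `>=` rather than `>`, and a single matching benchmark is enough regardless of how the model does on the others.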

