What will be o3's score on Humanity's Last Exam?
resolved Apr 17
20% - 24%: 100% (96% before resolution)
Less than 12%: 0.4%
12% - 16%: 0.5%
16% - 20%: 1.6%
At least 24%: 1.7%

OpenAI has announced a model named o3. What will be the score of this model on Humanity's Last Exam (https://agi.safe.ai/)?

Resolution is based on the score given for o3 on https://agi.safe.ai/. If there are multiple scores (e.g. for "high" and "medium" reasoning), resolution is based on the highest score. If there is no score on https://agi.safe.ai/ within a month from the release of the model, I will use my best judgment.

I will trade on this market.


bought Ṁ250 YES

@Loppukilpailija resolves to 20-24%, see https://agi.safe.ai/

bought Ṁ500 NO

I guess you might want to wait and see whether the higher score OpenAI reported with tool use gets included on that site?

@Fay42 Thanks, this is sufficient.

sold Ṁ217 YES

@Frankas Unclear whether this will be the canonical o3 score on HLE (e.g. is tool use fair game? Is there some pass@k thing happening under the hood?)

@JoshYou
> Resolution is based on the score given for o3 on https://agi.safe.ai/. If there are multiple scores (e.g. for "high" and "medium" reasoning), resolution is based on the highest score.
Seems to imply that as long as it's included on the site, it'll count? Though idk whether it still counts if it's listed as OpenAI Deep Research rather than o3 Deep Research or something.
