What will be o3's score on Humanity's Last Exam?

29


resolved Apr 17

20% - 24%: 100% (previously 96%)
Less than 12%: 0.4%
12% - 16%: 0.5%
16% - 20%: 1.6%
At least 24%: 1.7%

OpenAI has announced a model named o3. What will be the score of this model on Humanity's Last Exam (https://agi.safe.ai/)?

Resolution is based on the score given for o3 on https://agi.safe.ai/. If there are multiple scores (e.g., for "high" and "medium" reasoning), resolution is based on the highest score. If there is no score on https://agi.safe.ai/ within a month of the model's release, I will use my best judgment.

I will trade on this market.



People are also trading

Top score on Humanity's Last Exam > 50% by 2029?

Top score on Humanity's Last Exam > 90% by what year?

Top score on Humanity's Last Exam > 60% by what year?

Top score on Humanity's Last Exam > 80% by what year?

Will OpenAI's o4 get above 50% on Humanity's Last Exam?

Top score on Humanity's Last Exam > 70% by what year?

Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2027?

Will AI achieve 95% or higher on the Humanity's Last Exam benchmark before 2030?

In what year will AI achieve 95% or higher score on the Humanity’s Last Exam benchmark?

In what year will AI achieve 85% or higher score on the Humanity’s Last Exam benchmark?


bought Ṁ250 YES

@Loppukilpailija resolves to 20-24%, see https://agi.safe.ai/

bought Ṁ500 NO

You might want to wait and see whether the higher score OpenAI reported with tool use gets included on that site.

@Fay42 Thanks, this is sufficient.

https://openai.com/index/introducing-deep-research/

sold Ṁ217 YES

@Frankas It's unclear whether this will be the canonical o3 score on HLE (e.g., is tool use fair game? Is some pass@k sampling happening under the hood?)

@JoshYou
> Resolution is based on the score given for o3 on https://agi.safe.ai/. If there are multiple scores (e.g. for "high" and "medium" reasoning), resolution is based on the highest score.
That seems to imply that as long as it's included on the site, it'll count? Though I don't know whether it still counts if it's listed as "OpenAI Deep Research" rather than "o3 Deep Research" or something similar.
