Will there be a score of 80% or higher on Humanity's Last Exam before April 1, 2025?
Apr 2
4% chance

Background

Humanity's Last Exam (HLE) is a benchmark designed to test AI models at the frontiers of human expertise. The exam consists of expert-level questions across various fields, deliberately crafted to be extremely challenging. Current AI models have performed poorly on this benchmark, with leading models answering fewer than 10% of expert questions correctly.

Resolution Criteria

This market will resolve YES if any AI model achieves a verified score of 80% or higher on Humanity's Last Exam before April 1, 2025. The score must be:

  • Independently verified by Scale AI or another reputable organization

  • Achieved on the full exam, not a subset

  • Publicly announced and documented

  • Achieved through a single model's capabilities (not through combining multiple models or human assistance)

The market will resolve NO if no AI model achieves a verified score of 80% or higher before April 1, 2025.
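
The resolution criteria above amount to a simple decision rule, which can be sketched in Python. This is only an illustration, not the market creator's actual process: the `ScoreReport` type, its field names, and the `resolve` helper are all hypothetical, and in practice each flag would be a judgment call made from Scale AI's leaderboard and public announcements.

```python
from dataclasses import dataclass
from datetime import date

# Values taken from the resolution criteria; the names are hypothetical.
DEADLINE = date(2025, 4, 1)   # score must be achieved strictly before this date
THRESHOLD_PCT = 80.0          # minimum qualifying score on the full exam


@dataclass
class ScoreReport:
    """One claimed HLE result (hypothetical structure for illustration)."""
    score_pct: float
    achieved_on: date
    independently_verified: bool   # by Scale AI or another reputable organization
    full_exam: bool                # not a subset of the questions
    publicly_documented: bool      # publicly announced and documented
    single_model: bool             # no model ensembles, no human assistance


def qualifies(report: ScoreReport) -> bool:
    """Return True only if every listed criterion is satisfied."""
    return (
        report.score_pct >= THRESHOLD_PCT
        and report.achieved_on < DEADLINE
        and report.independently_verified
        and report.full_exam
        and report.publicly_documented
        and report.single_model
    )


def resolve(reports: list[ScoreReport]) -> str:
    """Resolve YES if any single report qualifies, otherwise NO."""
    return "YES" if any(qualifies(r) for r in reports) else "NO"
```

Under this reading, a report that scores 85% on a subset of the exam, or one dated April 1, 2025 itself, would not trigger a YES resolution.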

Considerations

  • The current performance gap between AI models (<10%) and the target (80%) is substantial

  • Experts predict models might exceed 50% accuracy by the end of 2025, making an 80% score by April 2025 particularly ambitious

  • The exam is specifically designed to test the limits of AI capabilities, making rapid improvements more challenging than on typical benchmarks

  • Scale AI's methodology and scoring criteria may evolve, but resolution will be based on their official scoring system at the time of evaluation


What if the model had been trained on HLE in its training data? Would that disqualify it?

I don't know what I'm talking about here, so feel free to call me out on it.

@Quroe It defers to Scale's leaderboard. I trust they won't just allow this, but if they do

© Manifold Markets, Inc.