Will there be an LLM which scores above what a human can do in 2 hours on METR's eval suite before 2026?
Resolved YES (Nov 24)

METR has found that current frontier models score on its autonomy benchmark roughly as well as a human who is given 30 minutes. Will at least one model score at the level of a human given 2 hours before 2026?

Clarifications:

  1. I will try to resolve this market according to the current task suite. If METR makes the suite harder or easier, I will try to account for this when resolving the market.

  2. If I am not able to determine the performance of frontier models at the end of 2025, this market will resolve N/A.


GPT-5 satisfies the criterion: https://evaluations.metr.org/gpt-5-report/
