
Will there be an LLM which scores above what a human can do in 2 hours on METR's eval suite before 2026?
Resolved YES on Nov 24.
METR has found that current frontier models achieve a score on its autonomy benchmark roughly equal to that of a human given 30 minutes. Will at least one model score at the level of a human given 2 hours before 2026?

Clarifications:
I will try to resolve this market according to the current task suite. If METR makes the suite harder or easier, I will try to account for this in the resolution.
If I am not able to determine the performance of frontier models at the end of 2025, this market will resolve N/A.
This question is managed and resolved by Manifold.