Resolved Yes on Nov 24, 2025 (Manifold Markets prediction market).

Will there be an LLM which scores above what a human can do in 2 hours on METR's eval suite before 2026?

METR has found that current frontier models get a score on their autonomy benchmark roughly similar to a human who is given 30 minutes. Will at least one model score at the level of a human given 2 hours by 2026?

Clarifications:

I will try to resolve this market in accordance with the current task suite. If METR makes the suite harder or easier I will try to account for this in the resolution of this market.
If I am not able to determine the performance of frontier models at the end of 2025, this market will be resolved N/A.

🏅 Top traders

#	Trader	Total profit
1		Ṁ63
2		Ṁ37
3		Ṁ6
4		Ṁ4
5		Ṁ4
