Best AI time horizon by February 2026, per METR?


2026

<2 hours: 0.6%
2 to 4 hours: 52%
4 to 6 hours: 36%
6 to 8 hours
8 to 16 hours
>=16 hours

This market will resolve to the highest 50% time horizon, as reported by METR as of April 30, 2026, for any AI model released by February 28, 2026.
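As a rough sketch of the resolution rule above: keep only models released by the release cutoff whose horizon METR has reported by the reporting cutoff, then take the maximum. The record layout, field names, and the example entries are hypothetical placeholders for illustration, not real METR data.

```python
from datetime import date

RELEASE_CUTOFF = date(2026, 2, 28)  # model must be released by this date
REPORT_CUTOFF = date(2026, 4, 30)   # METR must have reported the horizon by this date


def resolving_horizon(reports):
    """Return the highest eligible 50% time horizon in minutes, or None if none qualify."""
    eligible = [
        r["horizon_minutes"]
        for r in reports
        if r["released"] <= RELEASE_CUTOFF and r["reported"] <= REPORT_CUTOFF
    ]
    return max(eligible) if eligible else None


# Hypothetical placeholder entries (assumed for illustration only).
example_reports = [
    {"model": "model-a", "released": date(2025, 11, 1),
     "reported": date(2025, 12, 1), "horizon_minutes": 150.0},
    {"model": "model-b", "released": date(2026, 2, 20),
     "reported": date(2026, 4, 15), "horizon_minutes": 310.0},
    # Released after the release cutoff, so excluded.
    {"model": "model-c", "released": date(2026, 3, 5),
     "reported": date(2026, 4, 1), "horizon_minutes": 500.0},
    # Reported after the reporting cutoff, so excluded.
    {"model": "model-d", "released": date(2026, 1, 10),
     "reported": date(2026, 5, 20), "horizon_minutes": 420.0},
]

print(resolving_horizon(example_reports))
```

Note that a model released in February 2026 but evaluated in March or April still counts, since only the release date is bounded by February 28.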

50% time horizon is a measure of AI autonomy based on the length of tasks that AI can do: roughly, it is the time that humans take to complete tasks that an AI system can successfully do 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. Claude 3.7 Sonnet, released in February 2025, was the leading model with a 50% horizon of 59 minutes.

Left bounds inclusive, right bounds exclusive.
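A minimal sketch of how the bounds rule maps a reported horizon onto the answer options. The function name and the use of hours as the unit are assumptions made for illustration; the bin labels are the market's own.

```python
# Each bin is (label, inclusive lower bound, exclusive upper bound), in hours.
BINS = [
    ("<2 hours", 0.0, 2.0),
    ("2 to 4 hours", 2.0, 4.0),
    ("4 to 6 hours", 4.0, 6.0),
    ("6 to 8 hours", 6.0, 8.0),
    ("8 to 16 hours", 8.0, 16.0),
    (">=16 hours", 16.0, float("inf")),
]


def answer_for(horizon_hours):
    """Return the answer label whose [low, high) interval contains the horizon."""
    for label, low, high in BINS:
        if low <= horizon_hours < high:
            return label
    raise ValueError(f"horizon must be non-negative, got {horizon_hours}")


# A horizon of exactly 4 hours belongs to the bin that starts at 4, not the one that ends there.
print(answer_for(4.0))
print(answer_for(59 / 60))  # Claude 3.7 Sonnet's 59-minute horizon from the description
```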

Time horizon could vary based on the set of tasks used to measure it, so this market will be based on the time horizon for the most comprehensive set of tasks reported by METR (as of 2025, largely software and engineering tasks). This will be ambiguous if METR stops publishing time horizons across all of their autonomy tasks and only publishes separate results for different subsets; I might N/A in that scenario.




14 Comments


Has METR said anything about how long their tasks even go up to? Or are they just arbitrarily adding tasks as the models improve?

@HenryE when they measure time horizon via "pass if any agent solved the task across all their runs" they got 16 hours. This was after removing a few problematic tasks so it's higher than the "official" estimate would be, unless they formally relaunch their task suite with those tasks removed.

bought Ṁ50 NO

@JoshYou <2 hours can already be resolved to no

@LoweLundin can't resolve one option early on a single-resolution market like this

@JoshYou Ah, sorry!

opened a Ṁ10,000 YES at 30% order

jim order up for the ">=16 hours" option

@jacksonpolack, @Bayesian, @SemioticRivalry, @Velaris, @khang2009, @Gen, @skibidist, @evan, @brod, @100Anonymous, @Ziddletwix, @Trazyn, @bagelfan, @geuber, @nikki, @ProjectVictory, @sahaj, @bohaska, @Odoacre

jim orders are large limit orders, generally at better than market prices.

Opt in / opt out thread: https://manifold.markets/post/jim-order-notification-optin-thread

opened a Ṁ25,000 NO at 29% order

bought Ṁ25 YES

@JoshYou 6-8 hours is the obvious bet to make based on current trends. I have a couple contrarian jim orders up for longer horizons
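The comment does not say which trend it has in mind, so here is only a hedged sketch of how such an extrapolation might be run: exponential growth from the 59-minute February 2025 figure in the market description, projected 12 months forward. The doubling times below are illustrative inputs (assumptions), not numbers taken from this thread.

```python
def extrapolate_horizon(start_minutes, months_elapsed, doubling_months):
    """Project a time horizon forward, assuming it doubles every `doubling_months`."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)


START_MINUTES = 59.0  # Claude 3.7 Sonnet, February 2025 (from the market description)
MONTHS_TO_CUTOFF = 12  # February 2025 to February 2026

# Illustrative doubling times in months; these are assumptions, not sourced values.
for doubling in (7, 4):
    minutes = extrapolate_horizon(START_MINUTES, MONTHS_TO_CUTOFF, doubling)
    print(f"doubling every {doubling} months: {minutes / 60:.1f} hours")
```

Under these assumptions the answer is very sensitive to the doubling time: a 4-month doubling lands just under 8 hours, while a 7-month doubling lands in the 2 to 4 hour range.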

If a model is released by February, but is not evaluated, will this market resolve based on its eventual evaluation? Or does this market only deal with evaluations carried out prior to February?

@jim it resolves to the highest score as of April 30, 2026 of any model released by February

opened a Ṁ500 NO at 9% order

My guess is that the time horizon will be 24 hours.

sold Ṁ4 YES

I'm not sure exactly what value I would use for o3 for the purpose of this market, but it appears to be around 100 minutes. METR also says o3 has a 1.8x longer time horizon than Claude 3.7 on HCAST, which makes up the majority of their task suite.
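As a quick sanity check on the "around 100 minutes" figure, one can scale Claude 3.7 Sonnet's 59-minute horizon by the 1.8x ratio. This is only a rough cross-check: the 59-minute figure is for the full task suite while the 1.8x ratio is for HCAST alone, so combining them is an approximation, and the variable names are my own.

```python
CLAUDE_37_HORIZON_MIN = 59.0  # 50% horizon from the market description
O3_RATIO_ON_HCAST = 1.8       # o3 vs. Claude 3.7 on HCAST, as cited above

# Rough estimate: assumes the HCAST ratio carries over to the full task suite.
o3_estimate_min = CLAUDE_37_HORIZON_MIN * O3_RATIO_ON_HCAST

print(f"{o3_estimate_min:.1f} minutes, about {o3_estimate_min / 60:.2f} hours")
```

Either way, the estimate sits below the 2-hour boundary, so o3 alone would not move the market out of the "<2 hours" option.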