What will be the METR time horizon doubling time in 2026?
  • <3 months: 6%
  • 3-3.5 months: 8%
  • 3.5-4 months: 13%
  • 4-4.5 months: 15%
  • 4.5-5 months: 16%
  • 5-5.5 months: 11%
  • 5.5-6 months: 9%
  • 6-6.5 months: 7%
  • 6.5-7 months: 4%
  • >7 months: 9%

This market matches "Software Engineering: METR Time Horizon Doubling Time" from the AI 2026 Forecasting Survey by AI Digest.

Resolution criteria

Resolves to the best-fit doubling time for METR-HRS frontier models as of December 31, 2026, computed using the methodology described in the More Info section of the survey (see here).
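For concreteness, here is a minimal Python sketch of what such a best-fit computation could look like, assuming the methodology fits log2(time horizon) linearly against model release date, so the slope is doublings per year; all numbers and names below are illustrative, not taken from the survey.

```python
import numpy as np

# Illustrative inputs (not real data): release dates of frontier models
# (fractional years) and each model's 50% time horizon on METR-HRS, in minutes.
release_dates = np.array([2024.2, 2024.6, 2025.0, 2025.4, 2025.9])
horizons_min = np.array([25.0, 50.0, 95.0, 190.0, 420.0])

# Fit log2(horizon) ~ date; the slope is doublings per year.
slope, intercept = np.polyfit(release_dates, np.log2(horizons_min), 1)
doubling_time_months = 12.0 / slope

print(f"Best-fit doubling time: {doubling_time_months:.1f} months")
```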

If METR releases an updated METR-HRS suite that is a clear successor with comparable difficulty for questions at the same horizon length, this question will be resolved based on the updated task suite.

Which AI systems count?

Any AI system counts if it operates within realistic deployment constraints and doesn't have unfair advantages over human baseliners.

Tool assistance, scaffolding, and any other inference-time elicitation techniques are permitted as long as:

  • No unfair and systematic advantage. There is no systematic unfair advantage over the humans described in the Human Performance section (e.g., letting AI systems have multiple outputs autograded while humans don't, or giving AI systems internet access when humans don't, would count as unfair advantages).

  • Human cost parity. Having the AI system complete the task does not use more compute than could be purchased with the wages needed to pay a human to complete the same task to the same level. Any additional costs incurred by the AIs or humans (such as GPU rental costs) are included in the parity estimation.

The PASS@k elicitation technique (which automatically grades k outputs from a model and chooses the best) is a common example of a technique we do not accept on this benchmark: human software engineers do not have access to automatic grading of their solutions, so PASS@k would constitute an unfair advantage.
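To make the disallowed pattern concrete, here is a minimal sketch of a PASS@k-style loop; the model and autograder interfaces are hypothetical, not from any real API.

```python
def pass_at_k(model, autograder, task, k: int):
    """Hypothetical PASS@k-style elicitation (names are illustrative):
    sample k candidate outputs, autograde each, and return the
    highest-scoring one. Disallowed here because the automatic grader
    gives the AI feedback that human baseliners never receive."""
    candidates = [model.generate(task) for _ in range(k)]
    return max(candidates, key=lambda c: autograder(task, c))
```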

If there is evidence of training contamination leading to substantially increased performance, scores will be accordingly adjusted or disqualified.

If a model is released in 2026 but evaluated after year-end, the resolver may include it at their discretion (if they judge there was no unfair advantage from the later evaluation; for example, the scaffolding used should have been available within 2026).

Eli Lifland is responsible for final judgment on resolution decisions.

Human cost estimation process (a code sketch follows the list):

  1. Rank questions by human cost. For each question, estimate how much it would cost for humans to solve it. If humans fail on a question, factor in the additional cost required for them to succeed.

  2. Match the AI's accuracy to a human cost total. If the AI system solves N% of questions, identify the cheapest N% of questions (by human cost) and sum those costs to determine the baseline human total.

  3. Account for unsolved questions. For each question the AI does not solve, add the maximum cost from that bottom N%. This ensures both humans and AI systems are compared under a fixed per-problem budget, without relying on humans to dynamically adjust their approach based on difficulty.
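A minimal Python sketch of the three steps above, assuming per-question human cost estimates are available; the function and variable names are illustrative, not part of the resolution criteria.

```python
def human_cost_baseline(question_costs: list[float], ai_solve_rate: float) -> float:
    """Steps 1-3: rank by human cost, sum the cheapest N%, and charge
    each unsolved question at the maximum cost within that bottom N%."""
    n_total = len(question_costs)
    n_solved = round(ai_solve_rate * n_total)

    # Steps 1-2: rank questions by estimated human cost (including the
    # extra cost for humans to recover from failures) and sum the
    # cheapest N% to get the matched human total.
    cheapest = sorted(question_costs)[:n_solved]
    baseline = sum(cheapest)

    # Step 3: each question the AI leaves unsolved is charged at the
    # maximum cost in that bottom-N% slice, fixing a per-problem budget.
    per_problem_budget = max(cheapest) if cheapest else 0.0
    return baseline + per_problem_budget * (n_total - n_solved)

# Illustrative: 5 questions, AI solves 60% of them.
costs = [50.0, 120.0, 300.0, 800.0, 2000.0]
print(human_cost_baseline(costs, ai_solve_rate=0.6))  # 470 + 2*300 = 1070.0
```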

Buckets are left-inclusive: e.g., 4-4.5 months includes 4.0 but not 4.5.


@Bayesian any idea how they measure this doubling time if it's taking months? Like, would the results only come out after 6 months or something?
