See https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Resolves to the longest 50% Time Horizon, as measured by METR, for any AI system, by the end of 2026. Answers that are passed early can be resolved early.
IMPORTANT: Resolves to all thresholds exceeded, not just the highest one that applies. E.g., for an 11-hour time horizon, ">10h" resolves YES, but so do ">6h" and ">8h".
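A rough sketch of that resolution rule, in case it helps. The function name and the threshold list are illustrative assumptions, not the market's actual answer set; only the "every exceeded threshold resolves YES" logic comes from the description above.

```python
def resolved_yes(measured_hours, thresholds_hours):
    """Return every '>Nh' answer that the measured 50% time horizon strictly exceeds."""
    return [f">{t}h" for t in thresholds_hours if measured_hours > t]


# Hypothetical subset of answers, using only thresholds mentioned on this page.
example_thresholds = [6, 8, 10, 14]

# An 11-hour measurement resolves all of '>6h', '>8h' and '>10h', but not '>14h'.
print(resolved_yes(11, example_thresholds))
```

A measurement exactly equal to a threshold does not count as exceeding it in this sketch; that strictness is an assumption about how ">" is read.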
@Bayesian '>14h' can resolve as 'yes' because of the Opus 4.6 measurement (14 hours and 30 minutes). Thanks!
The members of the AI Futures Project have given an update, and they appear to now be relying on METR's 80% time horizon graph for their predictions rather than the 50% time horizon graph (correction: they have always used the 80% time horizon). This implies that a 50% time horizon is not enough. While I think markets for 50% time horizons are useful, I now think that more attention needs to be paid to 80% time horizons.
@MaxLennartson You had said that markets for the 50% time horizon were useful. But with the release of the 50% time horizon for Claude 4.6 Opus, METR said that it's hard to measure now because the benchmark is saturated. Basically, we don't actually know what the time horizon of Claude 4.6 Opus is. They're continuing to work on updating the time horizon benchmark, but the new version might be saturated at 50% by the time it gets released. I expect them to retire the 50% benchmark and add a 95% benchmark.
I very roughly polled METR staff (using Fatebook) on what the 50% time horizon will be by EOY 2026, conditional on METR reporting something analogous to today's time horizon metric.
I got the following results: 29% average probability that it will surpass 32 hours. 68% average probability that it will surpass 16 hours.
The first question got 10 respondents and the second question got 12. Around half of the respondents were technical researchers. I expect the sample to be close to representative, but maybe a bit more short-timelines than the rest of METR staff.
The average probability that the question doesn't resolve AMBIGUOUS is somewhere around 60%.
@Bayesian am i misinformed or outdated now? If the doubling period was 7 months or whatever these estimations seem quite optimistic on an increase in the doubling speed!
@Bayesian were they never 7 months? am i just misremembering? or did they seem to shift not too long ago?
@No_uh oh yeah they were 7 months, and for a while they were consistent with a 4-month to 7-month range, and that uncertainty is slightly narrowing over time
@Bayesian Yes, that makes sense. it looks like I just am not up to date on the narrowing. I'm only human lmao, seems I already cannot keep up. Welp, enjoy my free mana everyone ;)
edit: and thank you as always Bayesian for responding!
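For anyone wanting to sanity-check the doubling-period arithmetic in this exchange, here is a minimal sketch. It assumes pure exponential growth from a single starting measurement, which is a simplification and not METR's methodology; the function names are made up for illustration. The 14.5-hour starting value and the 4-month and 7-month doubling periods are the figures quoted in the comments above.

```python
import math


def projected_horizon(start_hours, months_ahead, doubling_months):
    """Horizon after `months_ahead` months, assuming a constant doubling period."""
    return start_hours * 2 ** (months_ahead / doubling_months)


def months_to_reach(start_hours, target_hours, doubling_months):
    """Months needed to grow from `start_hours` to `target_hours` at a constant doubling period."""
    return doubling_months * math.log2(target_hours / start_hours)


# Starting point quoted in the thread: 14 hours 30 minutes.
start = 14.5

# Compare the two ends of the 4-month to 7-month range for the poll's thresholds.
for doubling in (4, 7):
    for target in (16, 32):
        months = months_to_reach(start, target, doubling)
        print(f"doubling every {doubling} months: {target}h reached after about {months:.1f} months")
```

This ignores when the starting measurement was taken and the saturation concerns raised earlier in the thread, so treat the output as a back-of-the-envelope check only.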
