See https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Resolves to the longest 50% Time Horizon measured by METR for any AI system by the end of 2026. Answers whose thresholds are passed early can be resolved early.
IMPORTANT: Resolves to all thresholds exceeded, not just the highest one that applies. E.g., for an 11-hour time horizon, ">10h" resolves YES, but so do ">6h" and ">8h".
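A minimal sketch of the resolution rule above, in Python. The function name and the full threshold list are illustrative assumptions; only ">6h", ">8h", and ">10h" appear in the description, and "exceeded" is read here as strictly greater than.

```python
def resolve_thresholds(horizon_hours, thresholds_hours):
    """Map each ">Nh" answer to True (YES) if the measured horizon exceeds N.

    Every exceeded threshold resolves YES, not only the highest one.
    """
    return {f">{t:g}h": horizon_hours > t for t in thresholds_hours}


# Hypothetical answer set; 16 and 32 are assumed for illustration only.
example = resolve_thresholds(11, [6, 8, 10, 16, 32])
for answer, resolves_yes in example.items():
    print(answer, "YES" if resolves_yes else "NO")
```

With an 11-hour horizon, the three lower thresholds all resolve YES together, which is the behavior the note is warning about.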
I very roughly polled METR staff (using Fatebook) on what the 50% time horizon will be by EOY 2026, conditional on METR reporting something analogous to today's time horizon metric.
I got the following results: 29% average probability that it will surpass 32 hours. 68% average probability that it will surpass 16 hours.
The first question got 10 respondents and the second question got 12. Around half of the respondents were technical researchers. I expect the sample to be close to representative, but it may lean a bit more toward short timelines than the rest of METR staff.
The average probability that the question doesn't resolve AMBIGUOUS is somewhere around 60%.
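As a rough illustration of what these averages imply, the sketch below splits the conditional forecast into buckets and scales it by the roughly 60% chance of a non-AMBIGUOUS resolution. It assumes the two poll averages can be subtracted even though they come from different respondent counts, and that multiplying averages is an acceptable approximation; the function and variable names are hypothetical.

```python
def implied_buckets(p_gt_16, p_gt_32):
    """Split two cumulative probabilities (in percent) into three buckets."""
    if not 0 <= p_gt_32 <= p_gt_16 <= 100:
        raise ValueError("expected 0 <= P(>32h) <= P(>16h) <= 100")
    return {
        "<=16h": 100 - p_gt_16,
        "16h-32h": p_gt_16 - p_gt_32,
        ">32h": p_gt_32,
    }


# Poll averages from the text, conditional on a comparable metric being reported.
conditional = implied_buckets(68, 29)

# Assumption: ~60% chance the condition holds (question not AMBIGUOUS).
p_not_ambiguous = 0.60
joint = {bucket: p_not_ambiguous * pct for bucket, pct in conditional.items()}

for bucket in conditional:
    print(f"{bucket}: {conditional[bucket]}% conditional, {joint[bucket]:.1f}% joint")
```

The joint figures sum to about 60%, with the remainder being the case where the question resolves AMBIGUOUS.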
