MANIFOLD
New METR SOTA by end of February, 2026?
Feb 28 · 25% chance

Resolves YES if any model surpasses a 50% time horizon of 4h 49m on https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ by the end of February 2026.

The model must be scored by end of month (not just released by end of month).

  • Update 2026-01-22 (PST) (AI summary of creator comment): The market resolves YES if the official METR long tasks SOTA goes up (beyond 4h 49m), even if it's from a model which has already been tested. If METR releases a new test suite and uses it to update the values on the existing METR long tasks graph, and as a result one of the models gets a time-horizon greater than 4h 49m, this would resolve YES.

bought Ṁ50 NO

Doesn't count if they announce their new test suite where they re-evaluated Opus 4.5 and it got like 6 hours, right?

@Bayesian That would possibly meet the resolution criteria and could count; something similar did in:

/jim/new-metr-sota-by-eoy

but it would depend on the specifics.

I don't understand how that case is analogous to this one

@Bayesian OK, fair, I just woke up and misread your message to begin with. Anyway, if it's a different suite, it just depends on how it's presented.

Pretty much, this resolves YES if the official METR long tasks SOTA goes up, even if it's from a model which has already been tested. So if the new suite were used to update the values on the existing METR long tasks graph, and as a result one of the models got a time horizon greater than 4h 49m, this would resolve YES.
