METR's evaluation measures AI performance by the duration of tasks that models can complete with a 50% success rate. This market predicts Claude Opus 4's time horizon, as reported by METR.
If no score is provided by the end of July 2025, this market resolves as N/A. If there are multiple scores provided by METR, I'll use my best judgment. I won't trade in this market.
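For readers unfamiliar with the metric, here is a minimal sketch of how a 50% time horizon can be estimated. It assumes a logistic fit of success against the log of task duration, which is one common approach and not necessarily METR's exact procedure. The function names (`fit_logistic`, `time_horizon_50`) and the task data are made up for illustration.

```python
import math

def fit_logistic(durations_min, successes, lr=0.1, steps=20000):
    """Fit P(success) = sigmoid(a + b * log2(duration)) by gradient ascent.

    Hypothetical helper for illustration only.
    """
    xs = [math.log2(d) for d in durations_min]
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(xs, successes):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p          # gradient of log-likelihood w.r.t. a
            grad_b += (y - p) * x    # gradient of log-likelihood w.r.t. b
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon_50(a, b):
    """Duration (minutes) at which the fitted success probability is 50%.

    sigmoid(a + b * x) = 0.5 exactly when a + b * x = 0, so x = -a / b.
    """
    return 2.0 ** (-a / b)

# Made-up results: task length in minutes, and whether the model succeeded.
durations = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
outcomes = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

a, b = fit_logistic(durations, outcomes)
print(f"Estimated 50% time horizon: {time_horizon_50(a, b):.0f} minutes")
```

The horizon is the task length where the fitted curve crosses 50%, so it can fall between the task lengths actually tested.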
80 minutes

Thomas Kwa works at METR. Link to post:
https://www.lesswrong.com/posts/RnKmRusmFpw7MhPYw/cole-wyeth-s-shortform?commentId=cZWcjhHMvEwCwDWHv
@JoshYou Huh. I had thought that METR's results already allowed for best-of-N, which would be equivalent to (and redundant with) parallel processing, but apparently I was wrong. I retract my earlier comment and will try to do an apples-to-apples comparison.