Best 8-hour AI score on RE-Bench >= 0.8 by what year?
2026: 60%
2027: 66%
2028: 74%
2029: 76%
2030: 83%

Each option is the upper bound of the date range. All true options resolve YES (e.g., if it happens in 2027, then 2027, 2028, 2029, and 2030 all resolve YES).
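A minimal sketch of that resolution rule, assuming an option resolves YES exactly when the first qualifying result lands in or before that option's year. `OPTIONS` and `resolve` are illustrative names, not part of any Manifold API.

```python
from typing import Optional

# Answer options from the market; each is the upper bound of a date range.
OPTIONS = [2026, 2027, 2028, 2029, 2030]


def resolve(achieved_year: Optional[int]) -> dict[int, bool]:
    """Map each option to YES (True) or NO (False).

    achieved_year is the year an AI first reaches the threshold, or None
    if it has not happened by the end of the last option's year.
    """
    if achieved_year is None:
        return {year: False for year in OPTIONS}
    # An option resolves YES if the event happened on or before its upper bound.
    return {year: achieved_year <= year for year in OPTIONS}


print(resolve(2027))
```

Because every later option is a superset of the earlier ones, the resolved answers are monotone: once one option is YES, all later options are YES too.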

RE-Bench (paper, GitHub) is a benchmark of ML research engineering tasks. By what year will any AI achieve an average normalized score >= 0.8 within an 8-hour window? The paper contains several variants of the benchmark; this question is specifically about the metric in Figure 5 of the paper. Note that 0.8 appears to be roughly peak human performance (at least within their evaluator set).
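A rough sketch of how the threshold check might be computed, under two assumptions that should be checked against the paper and the RE-Bench repository: each task's raw score is normalized linearly so that the starting solution scores 0 and the reference solution scores 1, and the per-task score counted is the best normalized result reached within the 8-hour window. The exact aggregation behind Figure 5 may differ. `normalize`, `best_within_budget`, `average_score`, and the task data are all hypothetical.

```python
from statistics import mean

THRESHOLD = 0.8     # score threshold from the question
BUDGET_HOURS = 8.0  # time window from the question


def normalize(raw: float, start: float, reference: float) -> float:
    """Rescale a raw task score linearly so start -> 0.0 and reference -> 1.0.

    Assumes the raw score is already oriented so that higher is better.
    """
    return (raw - start) / (reference - start)


def best_within_budget(attempts: list[tuple[float, float]],
                       budget_hours: float = BUDGET_HOURS) -> float:
    """Best normalized score among (hours_elapsed, score) pairs inside the window.

    Falls back to 0.0 (the starting solution) if nothing was scored in time.
    """
    in_window = [score for hours, score in attempts if hours <= budget_hours]
    return max(in_window, default=0.0)


def average_score(tasks: dict[str, list[tuple[float, float]]]) -> float:
    """Mean over tasks of each task's best in-window normalized score."""
    return mean(best_within_budget(attempts) for attempts in tasks.values())


# Hypothetical runs, for illustration only (not data from the paper).
tasks = {
    "task_a": [(2.0, normalize(0.7, 0.5, 1.0)), (6.5, 0.9), (9.0, 1.2)],
    "task_b": [(1.0, 0.3), (7.5, 0.5)],
}
score = average_score(tasks)
print(f"{score:.2f}", score >= THRESHOLD)
```

In this made-up example the 1.2 result for `task_a` arrives after 9 hours and is ignored, so the average stays below the 0.8 threshold.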

A YES resolution requires this specific metric, not any of the others in the paper. This means that even if it is blindingly obvious that an AI can do this, I will not resolve YES until it actually does. Since RE-Bench is open source, in that scenario I will run it myself (assuming I can get access to the model).

If there are minor updates to RE-Bench (e.g., a version 2 that includes a few additional tasks), I will accept results on either the updated version or the original version.

© Manifold Markets, Inc.