Best 8-hour AI score on RE-Bench >= 0.8 by what year?
2026: 41%
2027: 50%
2028: 62%
2029: 66%
2030: 76%

Each option is the upper bound of its date range. All true options resolve YES (e.g. if it happens in 2027, then 2028, 2029, and 2030 also resolve YES).
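The cumulative resolution rule above can be sketched in a few lines of Python. This is only an illustration, not part of the market's mechanics: the function name `resolve_options` is hypothetical, and the sketch assumes each upper bound is inclusive (an achievement during 2027 counts for the 2027 option) and that every option resolves NO if the threshold is never reached.

```python
from typing import Optional


def resolve_options(option_years: list[int], achieved_year: Optional[int]) -> dict[int, str]:
    """Resolve each option year to "YES" or "NO".

    Assumption: an option resolves YES when the score was achieved in or
    before that option's year. achieved_year=None means it never happened.
    """
    return {
        year: "YES" if achieved_year is not None and achieved_year <= year else "NO"
        for year in option_years
    }


if __name__ == "__main__":
    options = [2026, 2027, 2028, 2029, 2030]
    # Hypothetical scenario: the threshold is first reached in 2027.
    for year, outcome in resolve_options(options, 2027).items():
        print(year, outcome)
```

Under these assumptions, a 2027 achievement leaves only the 2026 option at NO, which is why the probabilities for later years can never sensibly be lower than those for earlier years.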

RE-Bench (paper, github) is a benchmark of ML research engineering tasks. By what year will any AI achieve an average normalized score >= 0.8 within an 8-hour window? The paper contains several variants of the benchmark; this question is specifically about the metric in Figure 5 of the paper. A score of 0.8 looks to be about peak human performance (within their evaluator set, at least).

A YES resolution requires this specific metric, not any of the others in the paper. Even if it is blindingly obvious that an AI can do this, I will not resolve YES until one actually does. Since RE-Bench is open source, I will run it myself in that scenario (assuming I can get access to the model).

If there are minor updates to RE-Bench (e.g. a version 2 that includes a few additional questions), then I will accept results on either the updated version or the original version.
