Will AI outperform the best human expert on machine learning R&D?
Before 2029: 85%
Before 2028: 76%
Before 2027: 65%
Before 2026: 45%

Will AI outperform the best human on the RE-Bench benchmark?

https://arxiv.org/abs/2411.15114

Resolution criteria

This market resolves using AI Digest as its source. The score must be > 1.27.

Which AI systems count?

Any AI system counts if it operates within realistic deployment constraints and doesn't have unfair advantages over human baseliners.

Tool assistance, scaffolding, and any other inference-time elicitation techniques are permitted as long as:

  • There is no systematic unfair advantage over the humans described in the Human Performance section (e.g. it would be unfair for AI systems to have multiple outputs autograded when humans cannot, or to have internet access when humans do not).

  • Having the AI system complete the task does not use more compute than could be purchased with the wages needed to pay a human to complete the same task to the same level.
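
As an illustration only, the two conditions above, together with the score threshold from the resolution criteria, can be sketched as a single check. The `Submission` type, its field names, and the idea of comparing dollar costs directly are assumptions made for this sketch; the market description does not specify how compute or wages are measured.

```python
from dataclasses import dataclass

# Threshold stated in the resolution criteria ("Score must be > 1.27").
SCORE_THRESHOLD = 1.27


@dataclass
class Submission:
    """Hypothetical record of one AI result; all field names are assumptions."""
    score: float                # RE-Bench score as reported by the source
    compute_cost_usd: float     # assumed: cost of compute the AI system used
    human_wage_cost_usd: float  # assumed: wages to pay a human for the same result
    has_unfair_advantage: bool  # assumed: judged manually by whoever resolves


def counts_for_resolution(s: Submission) -> bool:
    """Return True if the submission meets all three stated conditions."""
    # Compute must not exceed what the equivalent human wages could buy.
    within_budget = s.compute_cost_usd <= s.human_wage_cost_usd
    # The score must strictly exceed the threshold.
    beats_threshold = s.score > SCORE_THRESHOLD
    return beats_threshold and within_budget and not s.has_unfair_advantage
```

A score of exactly 1.27 would not count under this reading, because the criteria use a strict inequality.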


The PASS@k elicitation technique (which automatically grades and chooses the best out of k outputs from a model) is a common example that we do accept on this benchmark, because human baseliners in RE-Bench also have access to scoring metrics (e.g. loss/runtime), so PASS@k does not constitute a clear unfair advantage.
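
A minimal sketch of the best-of-k selection described above, assuming higher scores are better. The function name `best_of_k` and the `score_fn` grader are hypothetical and stand in for whatever automatic scoring a task provides; this is not RE-Bench's own harness.

```python
from typing import Callable, Sequence, Tuple


def best_of_k(
    outputs: Sequence[str],
    score_fn: Callable[[str], float],
) -> Tuple[str, float]:
    """Grade each of the k outputs and return the best one with its score.

    `score_fn` is an assumed stand-in for an automatic grader.
    """
    if not outputs:
        raise ValueError("need at least one output to choose from")
    # Score every candidate once, then keep the highest-scoring one.
    scored = [(score_fn(output), output) for output in outputs]
    best_score, best_output = max(scored, key=lambda pair: pair[0])
    return best_output, best_score
```

For a metric where lower is better, such as loss or runtime, the grader would need to return the negated value so that the maximum still picks the best run.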

If there is evidence that training contamination led to substantially increased performance, scores will be adjusted or disqualified accordingly.

bought Ṁ10 Before 2026 YES (19d)

I am really confused why it was so low? The "related" market is at 57% that the score will be >= 1.4 by the end of this year. And it has much more liquidity and exactly the same criteria and source?

opened a Ṁ10 Before 2029 YES at 50% order (1mo)

For clarity, multiple answers can resolve YES?

1mo

@jim if it's before 2027, it's also before 2028, so yes
