Will an AI SWE model score higher than 50% on SWE-bench in 2024?
16
Ṁ1k Ṁ470 resolved Jan 1
Resolved NO
This question is managed and resolved by Manifold.
🏅 Top traders
| # | Trader | Total profit |
|---|---|---|
| 1 | | Ṁ41 |
| 2 | | Ṁ18 |
| 3 | | Ṁ16 |
| 4 | | Ṁ15 |
| 5 | | Ṁ8 |
Traders (I can't tell which mention to use) -- how do you feel about changing this to be SWE-bench Verified explicitly?
https://www.swebench.com/ -- explanation of the differences here:
SWE-bench Lite is a subset of SWE-bench that's been curated to make evaluation less costly and more accessible.
SWE-bench Verified is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate.
If a majority of traders do not want this change, we'll leave it at SWE-bench Full (which does not have a 100% resolution ceiling). And to make it fairer, it should be a majority of people voting NO.