Will an AI SWE model score higher than 50% on SWE-bench in 2024?
Ṁ470 · Dec 31
20% chance
This question is managed and resolved by Manifold.
Traders (I can't tell which mention to use) -- how do you feel about changing this to be SWE-bench Verified explicitly?
https://www.swebench.com/ -- explanation of the differences here:
SWE-bench Lite is a subset of SWE-bench that's been curated to make evaluation less costly and more accessible.
SWE-bench Verified is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate.
If a majority of traders do not want this change, we'll leave it at SWE-bench Full (which does not have a 100% resolution ceiling). To make it fairer, it should be a majority of people voting NO.
Related questions
Will any model get above human level (92%) on the Simple Bench benchmark before September 1st, 2025.
36% chance
AI resolves at least X% on SWE-bench assistance, by 2025?
AI resolves at least X% on SWE-bench WITH assistance, by 2028?
80% on SWE-Bench Verified by Jan 1 2025
39% chance
What will be the best score on the SWE-Bench (unassisted) benchmark before 2025?
39% chance
Will an AI agent system be able to score at least 40% on level 3 tasks in the GAIA benchmark before 2025.
48% chance
Will an AI be capable of achieving a perfect score on the Putnam exam before 2028?
56% chance
Will OpenAI models achieve ≥90% on SimpleBench by the end of 2025?
38% chance
Will an AI get a perfect SAT score before 2025?
14% chance
What year will the first AI exceed 80% on MLE-bench?