Will an AI SWE model score higher than 50% on SWE-bench in 2024? | Manifold

16

Ṁ470

Dec 31

20% chance

This question is managed and resolved by Manifold.

#Technical AI Timelines

#Artificial Intelligence

Traders (I can't tell which mention to use) -- how do you feel about changing this to be SWE-bench Verified explicitly?

https://www.swebench.com/ -- explanation of the differences here:

SWE-bench Lite is a subset of SWE-bench that's been curated to make evaluation less costly and more accessible.
SWE-bench Verified is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate.

If a majority of traders do not want this change, we'll leave it at SWE-bench Full (which does not have a 100% resolution ceiling). And to make it fairer, it should be a majority of people voting NO.

https://x.com/alistairpullen/status/1822981361608888619

30% on SWE-Bench based on this tweet.

Related questions

Will an AI achieve >30% performance on the FrontierMath benchmark before 2026?

28% chance (-39% 1d)

Will an AI score over 10% on FrontierMath Benchmark in 2025

79% chance (+4% 1d)

Will an AI be capable of achieving a perfect score on the Putnam exam before 2030?

68% chance (-5% 1d)

AI resolves at least X% on SWE-bench WITH assistance, by 2028?

What will be the best score on the SWE-Bench (unassisted) benchmark before 2025?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

32% chance (-7% 1d)

AI resolves at least X% on SWE-bench assistance, by 2025?

Will an AI score over 30% on FrontierMath Benchmark in 2025

28% chance (-8% 1d)

80% on SWE-Bench Verified by Jan 1 2025
