Will an autonomous agent resolve 90% of tasks on SWE-bench by 2025?

Resolved NO on Jan 1

Resolves "Yes" if, at market close, the SWE-bench leaderboard (https://www.swebench.com/) has an entry with a score greater than or equal to 90%.

Linked Questions: Technical AI Timelines

🏅 Top traders

#	Name	Total profit
1		Ṁ54
2		Ṁ32
3		Ṁ21
4		Ṁ16
5		Ṁ9

What if there's evidence that the training data is contaminated with SWE-bench tasks?

@DavidFWatson That's an excellent question. Let's explore the possibilities:

1. The question could state this explicitly: only the number on the benchmark matters, regardless of whether it was gamed.
2. I could wait a fixed period after the deadline to see whether any controversy emerges. One month feels safe. The question would then resolve "Yes" if, one month after the deadline, I judge that there is no consensus that the number was gamed. This makes the question more informative.
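
The two options above can be sketched as a small decision rule. This is only an illustration: the function `resolve`, its parameters, and the dates in the usage note are hypothetical and are not part of Manifold or SWE-bench. Only the 90% threshold and the one-month waiting period come from the discussion.

```python
from datetime import date, timedelta

THRESHOLD = 90.0  # percent, taken from the resolution criteria


def resolve(top_score, close_date, today, *, grace_days=0, gamed_consensus=False):
    """Return "YES", "NO", or None while the waiting period is still running.

    grace_days=0 models the first option: only the leaderboard number counts.
    grace_days=30 models the second option: wait about a month, then resolve
    "NO" if there is a consensus that the score was gamed.
    """
    if top_score < THRESHOLD:
        return "NO"
    if grace_days == 0:
        # First option: the number alone decides, gaming is ignored.
        return "YES"
    if today < close_date + timedelta(days=grace_days):
        # Second option: still inside the waiting period.
        return None
    return "NO" if gamed_consensus else "YES"
```

For example, with a hypothetical close date of `date(2025, 1, 1)`, a top score of 92.0 resolves "YES" immediately under the first option, while under the second option (`grace_days=30`) it stays unresolved until the waiting period ends and the contamination question has been judged.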

People are also trading

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

45% chance (-8% over 1d)

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2026?

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2027?

Will OpenAI models achieve ≥90% on SimpleBench by the end of 2025?

Will an AI achieve >80% performance on the FrontierMath benchmark before 2027?

In what year will AI achieve a score of 95% or higher on the SWE-bench Verified benchmark?

AI resolves at least X% on SWE-bench without any assistance, by 2028?

AI resolves at least X% on SWE-bench WITH assistance, by 2028?

Will an AI system capable of doing tasks that take humans eight hours as determined by METR.org, exist by 2027

81% chance (-4% over 1d)
