AI resolves at least X% on SWE-bench WITH assistance, by 2025?

25

Ṁ1k · Ṁ5.3k

resolved Jan 1

X=5: Resolved YES

X=10: Resolved YES

X=20: Resolved YES

X=40: Resolved YES

X=80: Resolved NO

SWE-bench is a benchmark for evaluating whether language models can resolve real-world GitHub issues; each instance in the benchmark is one such issue. The leaderboard ranks models by the percentage of SWE-bench instances they resolve and is split into two categories: Unassisted and Assisted.

Assisted: models use the "oracle" retrieval setting, in which the correct files to edit are given to them directly.

This question is only about the Assisted category of this benchmark.

http://www.swebench.com/#
Current SOTA (at the time of writing) is 4.8%.

The prediction market will resolve based on the SWE-bench leaderboard standings as of 31 December 2024.

Multiple answers can be correct.
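Because every threshold is checked against the same leaderboard score, one score can make several answers resolve YES at once. A minimal sketch of that rule, assuming each answer X resolves YES exactly when the best Assisted score as of 31 December 2024 is at least X%. The function names and the resolved/total counts below are hypothetical and are not real leaderboard data.

```python
def percent_resolved(resolved: int, total: int) -> float:
    """Share of benchmark instances (GitHub issues) resolved, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total


def resolve_thresholds(best_assisted_score: float, thresholds: list[float]) -> dict[float, str]:
    """Hypothetical rule: answer X resolves YES iff the best score is at least X%."""
    return {x: "YES" if best_assisted_score >= x else "NO" for x in thresholds}


# Hypothetical counts for illustration only (110 of 200 instances -> 55.0%).
score = percent_resolved(resolved=110, total=200)
print(resolve_thresholds(score, [5, 10, 20, 40, 80]))
```

Under this reading, every threshold at or below the score resolves YES and every threshold above it resolves NO.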

Market context

Technical AI Timelines

Programming Automation



People are also trading

AI resolves at least X% on SWE-bench WITH assistance, by 2028?

AI resolves at least X% on SWE-bench without any assistance, by 2028?

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2027?

In what year will AI achieve a score of 95% or higher on the SWE-bench Verified benchmark?

Will an AI achieve >80% performance on the FrontierMath benchmark before 2027?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

Will any AI achieve a score of 25% on ARC-AGI-3 by the end of 2026?

When will SWE-bench be solved?

What will be the highest score on the SWE-bench pro private set before 2027?

In what year will AI achieve a score of 85% or higher on the SimpleBench leaderboard?


bought Ṁ500 YES

@AntonOsika SOTA is now 55.0%

(didn't mean to repost)

@Bayesian Isn't that for verified? Does assisted even exist anymore on leaderboards?

Do all of the options below the threshold resolve to true?

bought Ṁ50 YES

reposted

bump
