AI resolves at least X% on SWE-bench with assistance, by 2025?

Resolved Jan 1

X=5: Resolved YES
X=10: Resolved YES
X=20: Resolved YES
X=40: Resolved YES
X=80: Resolved NO

SWE-bench is a benchmark that evaluates whether language models can resolve real-world GitHub issues. Each instance in SWE-bench is a GitHub issue, and the leaderboard ranks models by the percentage of instances they resolve. The leaderboard has two main categories: Unassisted and Assisted.

Assisted: models use the "oracle" retrieval setting, in which the files that need to be edited are given to them directly.

This question is only about the Assisted category of this benchmark.
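The leaderboard score is a plain percentage: the number of resolved instances divided by the number of instances evaluated. The Python sketch below assumes per-instance outcomes are available as a mapping from an instance ID to a boolean "resolved" flag. `percent_resolved` and the instance IDs are hypothetical and illustrative; this is not SWE-bench's actual evaluation harness.

```python
def percent_resolved(results: dict[str, bool]) -> float:
    """Return the percentage of instances marked resolved (True).

    Hypothetical helper: `results` maps an instance ID to whether
    the model's patch resolved that issue.
    """
    if not results:
        raise ValueError("no instances evaluated")
    resolved = sum(1 for ok in results.values() if ok)
    return 100.0 * resolved / len(results)


# Hypothetical per-instance outcomes, not real SWE-bench results.
example = {"issue-a": True, "issue-b": False, "issue-c": False, "issue-d": True}
print(percent_resolved(example))  # 50.0
```

The same ratio applies in both categories. Only the inputs the model receives differ between Unassisted and Assisted runs.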

Leaderboard: http://www.swebench.com/#
The state of the art when this market was created was 4.8%.

The prediction market will resolve based on the SWE-bench leaderboard standings as of 31 December 2024.

Multiple answers can be correct.
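Each answer is a separate threshold, so the resolution rule can be sketched as below. The sketch assumes that an answer X resolves YES exactly when the top Assisted score on the leaderboard at the cutoff date is at least X%. `resolve_answers` and `THRESHOLDS` are hypothetical names used for illustration. The market itself was resolved by its creator, not by code.

```python
# X values offered as answers in this market.
THRESHOLDS = (5, 10, 20, 40, 80)


def resolve_answers(best_assisted_score: float, thresholds=THRESHOLDS) -> dict[int, str]:
    """Map each threshold X to "YES" if the best Assisted score is >= X, else "NO".

    Assumption: "at least X%" is read as a non-strict comparison (>=).
    """
    return {x: "YES" if best_assisted_score >= x else "NO" for x in thresholds}


# 4.8 is the state of the art quoted at market creation: every answer is still NO.
print(resolve_answers(4.8))
# Hypothetical score for illustration only, not a real leaderboard value.
print(resolve_answers(12.5))  # {5: 'YES', 10: 'YES', 20: 'NO', 40: 'NO', 80: 'NO'}
```

A single score clears every threshold at or below it. This is why several answers can resolve YES at once.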

© Manifold Markets, Inc.