What will be the best score on the SWE-Bench (unassisted) benchmark before 2025?


Resolved on Jan 29 as 29%.


This question will resolve as the state-of-the-art accuracy on the SWE-Bench unassisted benchmark by an AI system, including any post-training enhancements but excluding any human assistance. This will be based on credible publicly available results prior to January 1st 2025. The primary credible source will be the official leaderboard, but other sources, including but not limited to arXiv preprints and papers, may also be considered.
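The resolution rule above amounts to taking a maximum over eligible results. Below is a minimal Python sketch of that rule. It assumes each candidate result has already been recorded with a score, a publication date, and a human-assistance flag. The `Result` type, the `resolve` function, and the sample systems are hypothetical. Judging which sources count as credible is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Result:
    system: str
    score_pct: float      # accuracy on SWE-bench (unassisted), in percent
    published: date       # date the result became publicly available
    human_assisted: bool  # True if a human helped during the evaluation


# Results must be public strictly before this date.
CUTOFF = date(2025, 1, 1)


def resolve(results: List[Result]) -> Optional[float]:
    """Return the best eligible score, or None if no result qualifies."""
    eligible = [
        r.score_pct
        for r in results
        if r.published < CUTOFF and not r.human_assisted
    ]
    return max(eligible, default=None)


# Hypothetical entries, for illustration only.
results = [
    Result("system-A", 12.0, date(2024, 6, 1), False),
    Result("system-B", 20.0, date(2024, 11, 30), False),
    Result("system-C", 25.0, date(2024, 10, 1), True),   # excluded: human-assisted
    Result("system-D", 30.0, date(2025, 1, 15), False),  # excluded: after the cutoff
]
print(resolve(results))  # 20.0
```

Both exclusions matter here. Without them, the assisted run or the post-deadline run would set the answer.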

Background information:

See SWE-bench.

SWE-bench is a dataset that tests systems' ability to solve GitHub issues automatically. It collects 2,294 issue/pull-request pairs from 12 popular Python repositories. Each proposed fix is evaluated by running unit tests, with the behavior after the merged PR serving as the reference solution. More details are in the SWE-bench paper.
As of March 15th 2024, the best reported system is Devin, at 13.86%. The best entry on the official leaderboard is Claude 2 + BM25 Retrieval, at 1.96%.
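The unit-test evaluation described above can be sketched as a pass/fail check per task, followed by a percentage over the whole benchmark. This is a minimal Python sketch, loosely modeled on the two test lists SWE-bench records per task (`FAIL_TO_PASS` and `PASS_TO_PASS` in the released dataset). It assumes test results have already been collected after applying the model's patch, and it skips the real harness's per-repository environments. The `TaskOutcome`, `is_resolved`, and `resolve_rate` names are hypothetical. Counting unattempted tasks as failures is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskOutcome:
    instance_id: str
    # Hypothetical: test name -> passed?, observed after applying the model's patch.
    test_results: Dict[str, bool]
    # Tests that the reference PR turns from failing to passing.
    fail_to_pass: List[str]
    # Tests that passed before the PR and must keep passing.
    pass_to_pass: List[str]


def is_resolved(t: TaskOutcome) -> bool:
    """A task counts as resolved only if every required test passes."""
    required = t.fail_to_pass + t.pass_to_pass
    return all(t.test_results.get(name, False) for name in required)


def resolve_rate(outcomes: List[TaskOutcome], total_tasks: int) -> float:
    """Percentage of all tasks resolved; tasks missing from outcomes count as failures."""
    resolved = sum(is_resolved(t) for t in outcomes)
    return 100.0 * resolved / total_tasks


# Hypothetical two-task example: the second patch fixes the issue but breaks an old test.
outcomes = [
    TaskOutcome("repo__issue-1", {"test_fix": True, "test_old": True}, ["test_fix"], ["test_old"]),
    TaskOutcome("repo__issue-2", {"test_fix": True, "test_old": False}, ["test_fix"], ["test_old"]),
]
print(resolve_rate(outcomes, total_tasks=2))  # 50.0
```

For scale, on the full 2,294-task set, 1.96% corresponds to roughly 45 resolved tasks (100 × 45 / 2,294 ≈ 1.96).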

Part of the AI Benchmarks series by the AI Safety Student Team at Harvard on evaluations of AI models against technical benchmarks. Full list of questions:

Technical AI Timelines



Related questions

Will any model get above human level on the Simple Bench benchmark before September 1st, 2025.

In what year will AI achieve a score of 95% or higher on the SWE-bench Verified benchmark?

Will I be able to bench my own weight by the end of 2025?

What will be the best performance on SWE-bench Verified by December 31st 2025?

What will be the highest score achieved on SWE-Bench Verified in 2025?

Top Multi-SWE-bench score in 2025?

Top SWE-Bench Verified score in 2025?

What will be the best score on Cybench by December 31st 2025?

What will be the best score (5/5 reliability) on ZeroBench by December 31st 2025?

What will be the best normalized score achieved on the original 7 RE-Bench tasks by December 31st 2025?

© Manifold Markets, Inc.