SWE-bench is a benchmark developed to evaluate whether language models can resolve real-world GitHub issues. The leaderboard showcases various models and their performance, measured as the percentage of SWE-bench instances they resolve. Each instance corresponds to a GitHub issue. The leaderboard is divided into two main categories: Unassisted and Assisted.
Assisted: In this category, models are evaluated under the "oracle" retrieval setting, which provides the model with the correct files to edit, so the benchmark primarily measures a model's patch-generation ability.
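As a rough illustration of what "oracle" retrieval means in practice, the sketch below assembles a model input from a benchmark instance: the files touched by the reference (gold) patch are read out of its diff headers and placed in the model's context, so only patch generation is being tested. The field names `problem_statement` and `patch` follow the SWE-bench dataset schema; the prompt layout and the `repo_files` mapping are illustrative assumptions, not the benchmark's actual harness.

```python
import re

def oracle_files(gold_patch: str) -> list[str]:
    """Return the paths of the files edited by the reference (gold) patch,
    parsed from the 'diff --git a/... b/...' headers of the unified diff."""
    return re.findall(r"^diff --git a/(\S+) b/", gold_patch, flags=re.M)

def build_prompt(instance: dict, repo_files: dict[str, str]) -> str:
    """Assemble an oracle-setting input: issue text plus the contents of
    exactly the files the gold patch edits (hypothetical prompt format)."""
    parts = [instance["problem_statement"]]
    for path in oracle_files(instance["patch"]):
        parts.append(f"--- {path} ---\n{repo_files[path]}")
    parts.append("Produce a unified diff that resolves the issue above.")
    return "\n\n".join(parts)
```

Because the model never has to locate the relevant files itself, scores in this setting isolate editing ability from retrieval ability.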
This question is only about the Assisted category of this benchmark.
http://www.swebench.com/#
Current state of the art (SOTA) in this category is <5%.
The prediction market will resolve based on the SWE-bench leaderboard standings as of 11th October 2024.
In the extremely unlikely case that the resolution value falls within two intervals (i.e., it lies exactly on the boundary between two answer intervals), the lower interval will be chosen.