SOTA on SWE-bench [Unassisted] in October 2024
Oct 11
Answer interval: market probability
90-100%: 2%
75-90%: 3%
60-75%: 3%
35-60%: 14%
15-35%: 29%
0-15%: 49%

SWE-bench is a benchmark that evaluates whether language models can resolve real-world GitHub issues. Its leaderboard ranks models by the percentage of SWE-bench instances they resolve, where each instance corresponds to one GitHub issue. The leaderboard has two main sections: Unassisted and Assisted.

  • Unassisted: Models in this category are evaluated without any assistance. In particular, they do not get the "oracle" retrieval setting, in which the correct files to edit are given to them directly.

This question is only about the Unassisted category of this benchmark.

http://www.swebench.com/#
Current SOTA is <2%

The prediction market will resolve based on the SWE-bench leaderboard standings as of 11th October 2024.

In the extremely unlikely case that the number falls exactly on a boundary shared by two intervals, the lower interval will be chosen.
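For illustration, the resolution rule can be sketched in a few lines of Python. This is only a sketch of the rule as described here, not anything used by the market itself: the function name `resolve_interval` and the `INTERVALS` list are hypothetical, and the interval bounds are copied from the answer options of this question.

```python
# Answer intervals of this market, in ascending order (percent of instances resolved).
# INTERVALS and resolve_interval are hypothetical names used only for this sketch.
INTERVALS = [(0, 15), (15, 35), (35, 60), (60, 75), (75, 90), (90, 100)]


def resolve_interval(resolved_pct: float) -> str:
    """Return the answer interval for a given SOTA percentage.

    A value that sits exactly on a shared boundary (e.g. 15) matches two
    intervals; scanning in ascending order returns the lower one first.
    """
    if not 0 <= resolved_pct <= 100:
        raise ValueError("resolved_pct must be between 0 and 100")
    for low, high in INTERVALS:
        if low <= resolved_pct <= high:
            return f"{low}-{high}%"
    raise AssertionError("unreachable: intervals cover 0 to 100")


if __name__ == "__main__":
    print(resolve_interval(15))    # boundary value, lower interval wins
    print(resolve_interval(15.1))  # strictly inside the next interval
```

Scanning the intervals from lowest to highest is what implements the tie-break: the first match is always the lower of two adjacent intervals.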
