SWE-bench is a great AI benchmark, but it is Python-only. Multi-SWE-bench is the same kind of benchmark for multiple other programming languages: C, C++, Java, JavaScript, TypeScript, Go, and Rust.
A Claude 3.7 Sonnet-based agent achieved a score of 19% on 2025-03-29, which is currently the best score. The score will be rounded before resolution. ("Rounding half up" to be exact, see Rounding.)
Resolution will be based primarily on the official leaderboard, but announcements from other reputable organizations will also be considered.
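To illustrate the rounding rule, here is a minimal sketch in Python. It assumes scores are rounded to the nearest whole percent, which the description implies but does not state outright; the function name `round_half_up` is made up for this example. The 21.62 input is the figure quoted in the comments, and the other inputs are invented to show how ties behave.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(score: str) -> int:
    """Round a percentage score to a whole number, with ties going up.

    Assumption: rounding is to the nearest whole percent.
    The score is passed as a string so Decimal sees the exact digits.
    For non-negative scores, ROUND_HALF_UP matches "round half up".
    """
    return int(Decimal(score).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(round_half_up("21.62"))  # 22
print(round_half_up("19.5"))   # 20
print(round_half_up("18.5"))   # 19
```

Python's built-in `round()` uses "round half to even", so `round(18.5)` gives 18 rather than 19. That is why the sketch uses `decimal` instead.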
See also /SG/top-swebench-verified-score-in-2025
@SanghyeonSeo this can either resolve N/A or resolve 20-39%, right? (No updates, but MopenHands + Gemini-2.5-Pro is listed at 21.62)
@ian The leaderboard on the website shows something with Gemini 2.5 Pro at 21.62%:
https://multi-swe-bench.github.io/#/
(Not sure what Mopenhands is...)