Top Multi-SWE-bench score in 2025?
Dec 31
Expected: 46.2%

0 - 19%: 3%
20 - 39%: 40%
40 - 59%: 37%
60 - 79%: 14%
80 - 100%: 7%

SWE-bench is a great AI benchmark, but it is Python-only. Multi-SWE-bench applies the same idea to multiple programming languages: C, C++, Java, JavaScript, TypeScript, Go, and Rust.

A Claude 3.7 Sonnet-based agent achieved a score of 19% on 2025-03-29, which is currently the best score. The score will be rounded before resolution ("rounding half up", to be exact; see Rounding).
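
To make the rounding rule concrete, here is a minimal sketch of how a reported score could map to an answer bucket. It assumes the score is rounded to a whole percent, which fits the integer bucket edges but is not stated outright. The names `round_half_up` and `bucket_for` are hypothetical and are not part of the market or the leaderboard. Python's built-in `round` rounds ties to the nearest even number, so the sketch uses `decimal` instead.

```python
from decimal import Decimal, ROUND_HALF_UP

# Bucket edges copied from the answer options of this market.
BUCKETS = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 100)]


def round_half_up(score_percent: str) -> int:
    """Round a percentage to a whole number, with ties going up.

    Assumption: rounding is to the nearest whole percent.
    The score is passed as a string so that Decimal sees the exact
    digits rather than a binary floating-point approximation.
    """
    rounded = Decimal(score_percent).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(rounded)


def bucket_for(score_percent: str) -> str:
    """Return the answer bucket that the rounded score falls into."""
    rounded = round_half_up(score_percent)
    for low, high in BUCKETS:
        if low <= rounded <= high:
            return f"{low} - {high}%"
    raise ValueError(f"score out of range: {score_percent}")


if __name__ == "__main__":
    for score in ["19", "19.49", "19.5", "39.5"]:
        print(score, "->", bucket_for(score))
```

Under these assumptions, a score of 19.5% rounds to 20 and lands in the 20 - 39% bucket, while 19.49% rounds to 19 and stays in 0 - 19%.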

Resolution will be based primarily on the official leaderboard, but announcements from other reputable organizations will also be considered.

See also /SG/top-swebench-verified-score-in-2025
