Top Multi-SWE-bench score in 2025?
Dec 31
Expected: 47.1%
0 - 19%: 3%
20 - 39%: 39%
40 - 59%: 36%
60 - 79%: 15%
80 - 100%: 7%

SWE-bench is a great AI benchmark, but it is Python-only. Multi-SWE-bench applies the same idea to multiple programming languages: C, C++, Java, JavaScript, TypeScript, Go, and Rust.

A Claude 3.7 Sonnet-based agent achieved a score of 19% on 2025-03-29, which is currently the best score. The final score will be rounded ("rounding half up", to be exact; see Rounding).
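To illustrate the rounding rule, here is a minimal sketch of how a raw leaderboard score could be mapped to one of the answer ranges. It assumes rounding is to a whole percent (the description does not state the precision explicitly), and the helper names round_half_up and bucket_for are hypothetical, made up for this example. Python's built-in round() rounds ties to the nearest even number, so the sketch uses the decimal module instead.

```python
from decimal import Decimal, ROUND_HALF_UP

# Answer ranges as listed on this market (inclusive whole-percent bounds).
BUCKETS = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 100)]


def round_half_up(score: str) -> int:
    """Round a percentage (given as a string) to a whole percent, ties going up.

    Assumption: the market rounds to whole percents.
    """
    return int(Decimal(score).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def bucket_for(score: str) -> str:
    """Return the answer range that the rounded score falls into."""
    rounded = round_half_up(score)
    for low, high in BUCKETS:
        if low <= rounded <= high:
            return f"{low} - {high}%"
    raise ValueError(f"score out of range: {score}")


print(bucket_for("19.4"))   # 0 - 19%
print(bucket_for("19.5"))   # 20 - 39%
print(bucket_for("21.62"))  # 20 - 39%
```

Under this assumption, a score of 19.5% would round up to 20% and land in the 20 - 39% range, while 19.4% would stay in 0 - 19%.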

Resolution will be based primarily on the official leaderboard, but announcements from other reputable organizations will also be considered.

See also /SG/top-swebench-verified-score-in-2025


Have you tried Gemini 2.5 Pro Experimental on it yet?

@ian The leaderboard on the website shows something with Gemini 2.5 Pro at 21.62%:

https://multi-swe-bench.github.io/#/

(Not sure what Mopenhands is...)
