In what year will AI achieve a score of 95% or higher on the SWE-bench Verified benchmark?
2025: 8%
2026: 14%
2027: 14%
2028: 14%
2029: 13%
2030: 13%
2031: 13%
2032: 13%

Background

SWE-bench Verified is a 500-task, human-vetted slice of the SWE-bench dataset that removes ambiguous or unsolvable issues. Each task corresponds to a real GitHub bug fix; success is measured solely by whether the submitted patch makes the repository's tests pass. Reaching 95% would imply an agent that can reliably read unfamiliar codebases, localise bugs, implement multi-file patches, and satisfy rigorous unit tests, approaching or surpassing strong human-engineer performance on day-to-day bug fixing.
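
As a back-of-envelope illustration of what the threshold requires, here is a minimal Python sketch. The per-task results format is a hypothetical assumption for illustration, not the official SWE-bench harness report schema:

    import math

    N_TASKS = 500      # size of SWE-bench Verified
    THRESHOLD = 0.95   # this market's resolution bar

    def resolved_rate(results: dict[str, bool]) -> float:
        """Fraction of tasks whose submitted patch made the tests pass.

        `results` maps task ID -> resolved?; this schema is illustrative.
        """
        return sum(results.values()) / len(results)

    def meets_bar(results: dict[str, bool]) -> bool:
        return resolved_rate(results) >= THRESHOLD

    # 95% of 500 tasks: the agent must resolve at least 475 of them.
    print(math.ceil(THRESHOLD * N_TASKS))  # 475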

Resolution Criteria

This market resolves to the year bracket in which a fully automated AI system first records an average accuracy of 95% or higher on the SWE-bench Verified benchmark.

  • Verification – The claim must be confirmed by either

    1. a peer-reviewed paper or a preprint on arXiv, or

    2. an official entry on a public SWE-bench leaderboard (e.g., the official SWE-bench website, the HAL leaderboard, or another credible source).

  • Compute resources – Unlimited.

Fine Print:

If the resolution criteria remain unsatisfied by Jan 1, 2033, the market resolves to “Not Applicable.”
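
A minimal sketch of the bracket logic above, assuming a hypothetical input `first_achieved`, the date a qualifying result was first confirmed per the verification criteria, or None if no such result exists:

    import datetime

    CUTOFF = datetime.date(2033, 1, 1)   # fine-print deadline

    def resolve_bracket(first_achieved: datetime.date | None) -> str:
        """Map the first verified >=95% date to this market's outcome."""
        if first_achieved is None or first_achieved >= CUTOFF:
            return "Not Applicable"       # per the fine print
        return str(first_achieved.year)   # answer brackets run 2025-2032

    print(resolve_bracket(datetime.date(2027, 6, 30)))  # 2027
    print(resolve_bracket(None))                        # Not Applicable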
