Will Claude Opus 4.5 exceed 80% on SWE-bench Verified?
2027
24% chance

  • Update 2025-11-05 (PST) (AI summary of creator comment): Resolution will be based on:

    • Minimal agent configuration (as described on the SWE-bench Verified website)

    • No parallel test-time compute

    • Anthropic's official reporting of the score


What sources will you use for resolution? Will the score with parallel test-time compute be evaluated, or something more like the minimal agent described on the SWE-bench Verified website?

@BenAybar Minimal agent, no parallel compute. Will resolve per Anthropic’s reporting.

@JaundicedBaboon So Sonnet 4.5's score under this standard would have been 77.2%? Just want to be sure I understand the resolution criteria.
