Will Claude Opus 4.5 exceed 80% on SWE-bench Verified?
Resolved YES (Nov 24)

  • Update 2025-11-05 (PST) (AI summary of creator comment): Resolution will be based on:

    • Minimal agent configuration (as described on the SWE-bench Verified website)

    • No parallel test-time compute

    • Anthropic's official reporting of the score


@JaundicedBaboon Yeah, I don't get it. If anything, rumors that Opus 4.5 is releasing sooner should lower expectations for this score, yet the market is moving in the opposite direction.


A 2.8-point jump in 2 months (from Sonnet 4.5's 77.2% to 80%) works out to ~1.4 points a month, somewhat faster than the progress rate over the second half of this year (~1.2 points a month). Not only that, but a YOLO-type release would be expected to show less progress than a well-timed one (Opus 4.1 scored only 74.5%, under 1 point a month of progress).

My expectation is ~79% for a release this week.
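The extrapolation in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming Sonnet 4.5's 77.2% (minimal agent, as stated later in the thread) as the baseline, a two-month gap between releases, and the ~1.2 points/month trend quoted above; the function names are hypothetical and the model is simple linear progress.

```python
# Linear extrapolation of SWE-bench Verified scores, in percentage points.
# Assumptions (not official figures): Sonnet 4.5's 77.2% as the baseline,
# a ~2-month gap before the Opus 4.5 release, and a ~1.2 pts/month trend.
BASELINE = 77.2    # Sonnet 4.5 score under the resolution standard
THRESHOLD = 80.0   # the market's threshold
MONTHS = 2.0       # approximate time between releases
TREND_RATE = 1.2   # approximate points per month over the second half of the year


def required_rate(baseline: float, threshold: float, months: float) -> float:
    """Points per month needed to go from the baseline to the threshold."""
    return (threshold - baseline) / months


def projected_score(baseline: float, rate: float, months: float) -> float:
    """Score after `months` of linear progress at `rate` points per month."""
    return baseline + rate * months


if __name__ == "__main__":
    need = required_rate(BASELINE, THRESHOLD, MONTHS)
    proj = projected_score(BASELINE, TREND_RATE, MONTHS)
    print(f"required: {need:.1f} pts/month")    # required: 1.4 pts/month
    print(f"on-trend projection: {proj:.1f}%")  # on-trend projection: 79.6%
```

Under these assumptions, on-trend progress lands at roughly 79.6%, just short of 80%, which lines up with the ~79% estimate; clearing the threshold would require beating the recent trend.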

@Usaar33 Keep in mind that Claude Opus 4 scored lower on SWE-bench than Sonnet 4. I wouldn't be surprised if Opus 4.5 doesn't even reach 78%.

What sources will you use for resolution? Will the score with parallel test-time compute be used, or something more like the minimal agent described on the SWE-bench Verified website?

@BenAybar Minimal agent, no parallel compute. Will resolve per Anthropic's reporting.

@JaundicedBaboon Just to be sure I understand the resolution criteria: Sonnet 4.5's score under this standard would have been 77.2%?

© Manifold Markets, Inc.