Update 2025-11-05 (PST) (AI summary of creator comment): Resolution will be based on:
- Minimal agent configuration (as described on SWE-bench verified's website)
- No parallel test time compute
- Anthropic's official reporting of the score

Yes — resolved on Nov 24, 2025 by Manifold Markets prediction market.

Will Claude Opus 4.5 exceed 80% on SWE-Bench verified?

Resolved YES on Nov 24

🏅 Top traders

#	Trader	Total profit
1		Ṁ133
2		Ṁ108
3		Ṁ100
4		Ṁ85
5		Ṁ63

bought Ṁ394 YES

80.9% https://www.anthropic.com/news/claude-opus-4-5

Surprised this is 80%. Is everyone thinking that Anthropic will manipulate the evals due to facing pressure from Google?

Made a similar market about swe-rebench to test this: https://manifold.markets/JaundicedBaboon/will-claude-opus-45-achieve-a-sota

Will Claude Opus 4.5 achieve a SOTA score on SWE-rebench when it is first evaluated?

50% chance. Resolves when Claude Opus 4.5 is evaluated and its score is visible on https://swe-rebench.com/

@JaundicedBaboon Yeah, I don't get it. If anything, rumors that Opus 4.5 is releasing sooner should lower expectations for this score, but instead the market is going in the opposite direction.

A 2.8-point jump in 2 months is somewhat faster than the rate of progress over the second half of this year (~1.2 points a month). Not only that, but a YOLO-type release would be expected to show less progress than a well-timed one (Opus 4.1 pulled only 74.5%, under 1 point a month of progress).

My expectation is ~79% for a release this week.
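The arithmetic in the comment above can be checked with a rough sketch. It uses only the figures quoted in the thread (the 80% threshold, a 2.8-point jump over 2 months, and a recent trend of about 1.2 points a month). The baseline score is not stated by the commenter; it is inferred here as 80 minus 2.8, and all variable and function names are illustrative.

```python
# Figures quoted in the thread; the names are illustrative only.
THRESHOLD = 80.0      # market threshold on SWE-bench Verified, in percent
REQUIRED_JUMP = 2.8   # points the comment says are needed to reach the threshold
MONTHS = 2.0          # time span given in the comment
RECENT_RATE = 1.2     # comment's estimate of recent progress, points per month


def required_monthly_rate(jump: float, months: float) -> float:
    """Average gain per month needed to cover `jump` points in `months` months."""
    return jump / months


def projected_score(baseline: float, rate: float, months: float) -> float:
    """Linear extrapolation of a score from `baseline` at `rate` points per month."""
    return baseline + rate * months


# Assumption: the baseline is inferred from the comment, not stated in it.
implied_baseline = THRESHOLD - REQUIRED_JUMP
needed_rate = required_monthly_rate(REQUIRED_JUMP, MONTHS)
projection = projected_score(implied_baseline, RECENT_RATE, MONTHS)

print(f"needed rate: {needed_rate:.1f} points/month")   # 1.4
print(f"trend projection: {projection:.1f}%")           # 79.6
```

Under these assumptions the threshold needs about 1.4 points a month against a recent trend of about 1.2, and a straight-line extrapolation lands just under 80%, which is roughly in line with the commenter's ~79% estimate.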

@Usaar33 Keep in mind Claude Opus 4 scored lower on SWE-bench than Sonnet 4. I wouldn't be surprised if Opus doesn't even get 78%.

What sources will you use for resolution? Will you evaluate the score with parallel test-time compute, or something more like the minimal agent described on the SWE-bench Verified website?