Will Claude Opus 4.5 exceed 80% on SWE-Bench verified?
24% chance
Update 2025-11-05 (PST) (AI summary of creator comment): Resolution will be based on:
Minimal agent configuration (as described on the SWE-bench Verified website)
No parallel test time compute
Anthropic's official reporting of the score
This question is managed and resolved by Manifold.
People are also trading
Will Gemini 3.0 Pro exceed 80% on SWE-Bench verified?
48% chance
In what year will AI achieve a score of 95% or higher on the SWE-bench Verified benchmark?
11/18/27
What will be the best performance on SWE-bench Verified by December 31st 2025?
Will an LLM get > 50% on hard problems on LiveCodeBench Pro?
50% chance
What will be the highest score achieved on SWE-Bench Verified in 2025?
Top Multi-SWE-bench score in 2025?
44.6
Top SWE-Bench Verified score in 2025?
84.8
Will Claude 3.5 Opus have a higher Chat Arena Elo than GPT-5?
5% chance
Claude Opus 4.5 released before 2026?
76% chance
Will Claude 4 achieve over 95% on the MMLU-Pro benchmark by end of 2025?
1% chance
@JaundicedBaboon So Sonnet 4.5's score under this standard would have been 77.2%? Just to be sure I understand the resolution criteria.