MANIFOLD
Will the next Claude Sonnet be better than Claude 4.5 Opus at software engineering?
Resolved NO on Feb 17

Resolves according to a majority of the benchmarks in the next Claude Sonnet's system card that Claude 4.5 Opus was also evaluated on.

Resolves NO if it's better on exactly 50% of benchmarks.

Resolves N/A if Anthropic does not release another Claude Sonnet model by EOY 2027.

  • Update 2026-02-17 (PST) (AI summary of creator comment): The creator will go through every benchmark in the system card (not just software engineering benchmarks), classify each as "software engineering" or "not software engineering", and resolve based on the majority of software engineering benchmarks where Sonnet performs better than Opus 4.5.
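The resolution rule above can be sketched in code. This is only an illustration of the stated criteria, not the creator's actual procedure: the `Benchmark` type, its field names, and the `higher_is_better` flag are hypothetical, the "software engineering" label is assumed to be assigned by hand, and a tied score is assumed to count as "not better".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """One benchmark that both models were evaluated on (hypothetical type)."""
    name: str
    is_swe: bool                   # manual "software engineering" classification
    sonnet_score: float
    opus_score: float
    higher_is_better: bool = True  # assumption: some metrics may be lower-is-better


def sonnet_wins(b: Benchmark) -> bool:
    """Strictly better only; a tie is assumed to count as 'not better'."""
    if b.higher_is_better:
        return b.sonnet_score > b.opus_score
    return b.sonnet_score < b.opus_score


def resolve(benchmarks: list[Benchmark]) -> str:
    """Return 'YES' if Sonnet wins a strict majority of SWE benchmarks, else 'NO'."""
    swe = [b for b in benchmarks if b.is_swe]
    if not swe:
        # The market description does not cover this case, so refuse to guess.
        raise ValueError("no shared software engineering benchmarks to compare")
    wins = sum(sonnet_wins(b) for b in swe)
    # Exactly 50% is not a majority, so it resolves NO.
    return "YES" if 2 * wins > len(swe) else "NO"
```

Under this sketch, a win on a benchmark classified as "not software engineering" does not affect the outcome, and winning exactly half of the software engineering benchmarks still resolves NO.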

🏅 Top traders

#  Total profit
1  Ṁ1,068
2  Ṁ373
3  Ṁ212
4  Ṁ127
5  Ṁ85

@creator Which benchmarks count as "software engineering"?

@Simon74fe Based on "Introducing Sonnet 4.6" (Anthropic), Sonnet 4.6 is worse on SWE-bench Verified and is clearly positioned as a worse but cheaper model that "approaches Opus-level intelligence at a price point that makes it more practical," although I have no idea which benchmarks the creator intends either. I would trade lower, but it's a bit too risky for me (10k net worth). If anybody wants to give me exit liquidity at 25%, I'll take it (unless the creator specifies what the market is intended to resolve to).

@Dssc I was going to go one by one through every benchmark in the system card, classify them as "software engineering" or "not software engineering", and then resolve based on the majority. But at a glance I don't see a single SWE benchmark that Sonnet beats Opus 4.5 on in the card at all...

bought Ṁ7,549 NO

@SaviorofPlant yeah after going through all of them, sonnet wins on 0%

@SaviorofPlant It does win on OpenRCA and CyberGym (but still clearly loses overall if you only consider SWE)
