Resolves according to a majority of the benchmarks in the next Claude Sonnet's system card that Opus 4.5 was also evaluated on.
Resolves NO if it's better on exactly 50% of benchmarks.
Resolves N/A if Anthropic does not release another Claude Sonnet model by EOY 2027.
Update 2026-02-17 (PST) (AI summary of creator comment): The creator will go through every benchmark in the system card (not just software engineering benchmarks), classify each as "software engineering" or "not software engineering", and resolve based on the majority of software engineering benchmarks where Sonnet performs better than Opus 4.5.
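For anyone trying to sanity-check the resolution rule, here is a minimal sketch of how the tally could work. It is only an illustration, not the creator's actual procedure: the `Benchmark` class, the `resolve` function, and the `higher_is_better` flag are hypothetical names, and the handling of ties on a single benchmark (not counted as a win) and of an empty list (raises an error) are assumptions the market description does not settle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Benchmark:
    name: str
    is_swe: bool                   # the creator's manual classification
    sonnet: Optional[float]        # None if the model was not evaluated
    opus: Optional[float]          # None if the model was not evaluated
    higher_is_better: bool = True  # hypothetical flag for error-rate style metrics


def sonnet_wins(b: Benchmark) -> bool:
    # Strictly better only; a tie is not counted as a win (assumption).
    if b.higher_is_better:
        return b.sonnet > b.opus
    return b.sonnet < b.opus


def resolve(benchmarks: list[Benchmark]) -> str:
    # Keep only software engineering benchmarks that both models were evaluated on.
    shared = [
        b for b in benchmarks
        if b.is_swe and b.sonnet is not None and b.opus is not None
    ]
    if not shared:
        # The market description does not cover this case (assumption: treat as an error).
        raise ValueError("no shared software engineering benchmarks")
    wins = sum(sonnet_wins(b) for b in shared)
    # Strict majority is required: exactly 50% resolves NO.
    return "YES" if 2 * wins > len(shared) else "NO"
```

The N/A case (no new Sonnet model by EOY 2027) sits outside this function, since there would be no system card to tally.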
@Simon74fe Based on Anthropic's "Introducing Sonnet 4.6" post, Sonnet 4.6 is worse on SWE-bench Verified and is clearly positioned as a worse but cheaper model that "approaches Opus-level intelligence at a price point that makes it more practical," although I have no idea which benchmarks the creator intends to use either. I would trade lower, but it's a bit too risky for me (10k net worth). If anybody wants to give me exit liquidity at 25%, I'll take it (unless the creator specifies what the market intends to resolve to).
@Dssc I was going to go one by one through every benchmark in the system card, classify them as "software engineering" or "not software engineering", and then resolve based on the majority. But at a glance I don't see a single SWE benchmark that Sonnet beats Opus 4.5 on in the card at all...
@SaviorofPlant It does win on OpenRCA and CyberGym (but still clearly loses overall if you only consider SWE).