Claude Sonnet 4.1 vs GPT-5 in vibe coding?
Aug 31
62% chance

Resolution criteria

  • Resolves YES if, by 11:59 pm ET on December 31, 2025, the “SWE-bench Verified” percentage that Anthropic publicly lists for Claude Sonnet 4 (or any Sonnet 4.x variant, if released) is strictly higher than the “SWE-bench Verified” percentage OpenAI publicly lists for GPT‑5 on their official pages linked below. Otherwise (including a tie or if Sonnet has no published figure), resolves NO.
    Sources to check at resolution time: OpenAI “Introducing GPT‑5” (Evaluations) and Anthropic “The best AI for developers” (SWE‑bench Verified callout). If either page is unavailable at the deadline, use the most recent Internet Archive snapshot captured on or before the deadline. Numbers referring to other Claude families (e.g., Opus) or to non‑“SWE‑bench Verified” variants do not count.
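
The resolution rule above can be sketched as a small function. This is only an illustration of the stated criteria, not an official resolver: the name `resolve_market` and its parameters are hypothetical, and treating a missing GPT‑5 figure as NO is an assumption drawn from the "Otherwise" clause, since the criteria only name the missing-Sonnet case explicitly.

```python
from typing import Optional


def resolve_market(sonnet_pct: Optional[float], gpt5_pct: Optional[float]) -> str:
    """Return "YES" or "NO" for the vendor-published SWE-bench Verified figures.

    sonnet_pct: percentage Anthropic lists for Claude Sonnet 4 / 4.x,
                or None if no figure is published.
    gpt5_pct:   percentage OpenAI lists for GPT-5, or None if unavailable
                (assumption: this case falls under "Otherwise" and resolves NO).
    """
    # A missing figure on either side cannot satisfy the YES condition.
    if sonnet_pct is None or gpt5_pct is None:
        return "NO"
    # YES requires Sonnet to be strictly higher; a tie resolves NO.
    return "YES" if sonnet_pct > gpt5_pct else "NO"
```

With the figures quoted under Background (72.7 for Sonnet 4, 74.9 for GPT‑5), `resolve_market(72.7, 74.9)` returns "NO"; the outcome would change only if a higher Sonnet figure is published before the deadline.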

Background

  • OpenAI announced GPT‑5 on August 7, 2025; the page reports GPT‑5 at 74.9% on SWE‑bench Verified (OpenAI notes runs used a fixed n=477 subset). (openai.com)

  • Anthropic markets Claude Sonnet 4 with a 72.7% “SWE‑bench Verified” figure on its developer landing page. Anthropic also announced Opus 4.1 (a different, higher‑tier model) on August 5, 2025, citing gains in coding performance, but this market compares Sonnet vs GPT‑5. (anthropic.com)

  • Naming note: there is Claude Sonnet 4 and Claude Opus 4.1; “Sonnet 4.1” is not an official model name as of August 2025 per Anthropic’s model list. (docs.anthropic.com)

Considerations

  • “SWE‑bench Verified” can appear with different subsets/harnesses across posts; to avoid disputes, this market uses the vendors’ own published “SWE‑bench Verified” percentages on the linked official pages at the deadline, not third‑party leaderboards or agent pipelines. (openai.com, anthropic.com)
