MANIFOLD
Will DeepSeek V4 outperform OpenAI and Anthropic models at coding?
22
Dec 31
3%
chance

Claim: https://x.com/petergostev/status/2009616928763981963

Will DeepSeek V4 outperform OpenAI's and Anthropic's strongest contemporary models at coding at the time of its release?

Relevant coding benchmarks:

  • SWE-bench Verified

  • HumanEval

  • TerminalBench

  • RE-Bench

  • LiveCodeBench

DeepSeek V4 must score higher than both OpenAI's and Anthropic's strongest released models on at least 3 of these 5 benchmarks (per official or independent benchmark results) to resolve YES. If V4 matches or underperforms either competitor on more than half of these benchmarks, the market resolves NO. If a benchmark is not reported within 1 month of release, that benchmark counts as a loss for DeepSeek V4.
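The resolution rule above is a simple counting procedure. A minimal sketch, with hypothetical benchmark outcomes chosen only for illustration (`None` marks a benchmark not reported within 1 month, which the criteria count as a loss):

```python
def resolve(results):
    """Resolve YES if DeepSeek V4 strictly beats BOTH OpenAI's and
    Anthropic's strongest models on at least 3 of the 5 benchmarks.
    A value of None (unreported within 1 month) counts as a loss."""
    wins = sum(1 for beat in results.values() if beat is True)
    return "YES" if wins >= 3 else "NO"

# Hypothetical outcomes, not actual benchmark results:
example = {
    "SWE-bench Verified": False,
    "HumanEval": None,       # unreported -> counts as a loss
    "TerminalBench": False,
    "RE-Bench": None,        # unreported -> counts as a loss
    "LiveCodeBench": True,
}
print(resolve(example))  # -> NO (only 1 win out of 5)
```

With only one clear win, the example resolves NO, matching the creator's reasoning in the comments below.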

  • Update 2026-04-25 (PST) (AI summary of creator comment): The creator intends to resolve this market NO, noting that RE-Bench and HumanEval are not consistently being reported for new frontier models, and that DeepSeek likely does not beat Opus 4.7 at coding.


Unfortunately, it looks like RE-Bench and HumanEval are not consistently being reported for new frontier models. Even giving DeepSeek the benefit of the doubt, it likely doesn't beat Opus 4.7 at coding.

I intend to resolve this market NO unless there are objections.

For future markets like this, I will resolve based on whichever benchmarks are commonly reported at the resolution date.

I think LiveCodeBench definitely, rest not sure

@clementdupOz DeepSeek wins on LiveCodeBench and was close on SWE-Bench Verified!