Claim: https://x.com/petergostev/status/2009616928763981963
Will DeepSeek V4 outperform OpenAI's and Anthropic's strongest contemporary models at the time of its release?
Relevant coding benchmarks:
SWE-bench Verified
HumanEval
TerminalBench
RE-Bench
LiveCodeBench
DeepSeek V4 must score higher than both OpenAI's and Anthropic's strongest released models on at least 3 of these 5 benchmarks (using official or independent benchmark results) to resolve YES. If V4 matches or underperforms either competitor on more than half of the benchmarks, the market resolves NO. If a benchmark is not reported within 1 month of V4's release, that benchmark counts as a loss for DeepSeek V4.
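As an illustration only, the resolution rule above could be sketched as follows (all scores are made-up placeholders; a tie counts as a loss, and an unreported benchmark counts as a loss):

```python
# Hypothetical sketch of the resolution rule; all numbers are made up.
# Each benchmark maps to (v4, best_openai, best_anthropic) scores,
# or None if it was not reported within 1 month of release.

BENCHMARKS = [
    "SWE-bench Verified",
    "HumanEval",
    "TerminalBench",
    "RE-Bench",
    "LiveCodeBench",
]

def resolves_yes(scores):
    """Return True if the market would resolve YES under the stated rule."""
    wins = 0
    for name in BENCHMARKS:
        result = scores.get(name)
        if result is None:
            continue  # unreported within 1 month -> counts as a loss
        v4, openai, anthropic = result
        # V4 must strictly beat BOTH competitors; matching is not a win.
        if v4 > openai and v4 > anthropic:
            wins += 1
    return wins >= 3  # needs strict wins on at least 3 of the 5 benchmarks

# Example with invented numbers: 3 wins, 1 loss, 1 unreported -> YES.
example = {
    "SWE-bench Verified": (75.0, 74.0, 73.5),
    "HumanEval": (99.0, 98.0, 98.5),
    "TerminalBench": (60.0, 58.0, 59.0),
    "RE-Bench": None,                      # unreported -> loss
    "LiveCodeBench": (70.0, 72.0, 71.0),   # loses to OpenAI's model
}
print(resolves_yes(example))  # True
```

The scores and the YES/NO outcome here are purely illustrative; actual resolution depends on the reported benchmark results at the time.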
Update 2026-04-25 (PST) (AI summary of creator comment): The creator intends to resolve this market NO, noting that RE-Bench and HumanEval are not consistently being reported for new frontier models, and that DeepSeek likely does not beat Opus 4.7 at coding.
Unfortunately, it looks like RE-Bench and HumanEval are not consistently being reported for new frontier models. Even giving DeepSeek the benefit of the doubt, it likely doesn't beat Opus 4.7 at coding.
I intend to resolve this market NO unless there are objections.
For future markets like this, I will resolve based on whichever benchmarks are popular at the resolution date.