
Recently there has been a debate about how many GPUs DeepSeek used to train its language models. The DeepSeek-V3 paper claims that only 2048 NVIDIA H800s were used[1], but others claim that they might have had as many as 50,000 H100s[2] (note: the H100 is the default GPU; the H800 is a gimped version of the H100 made to comply with export controls).
The market will resolve NO if either:
DeepSeek-v3's performance is successfully replicated using no more than 2x the claimed compute budget
There is insufficient evidence to conclude that DeepSeek misrepresented their compute usage at market close (default NO)
The market will resolve YES if there is widespread agreement in the AI community at market close that DeepSeek used significantly more compute resources than claimed in their technical report.
I will not bet in this market.
[1] DeepSeek-V3 Technical Report
https://arxiv.org/abs/2412.19437
[2] CEO of Scale AI claiming DeepSeek has access to 50,000 H100s
https://youtu.be/x9Ekl9Izd38?si=yqstFkBxP9ICnxf_&t=170
Update 2025-01-27 (PST) (AI summary of creator comment): Clarification on "used":
"Used" refers exclusively to the main training run of DeepSeek-v3
It includes the number of concurrent GPUs employed during the main training process
🏅 Top traders

| # | Trader | Total profit |
|---|---|---|
| 1 | | Ṁ4,228 |
| 2 | | Ṁ2,666 |
| 3 | | Ṁ655 |
| 4 | | Ṁ218 |
| 5 | | Ṁ191 |
I'm going to resolve this market NO. There are 2 main reasons:
1. This market is default NO, so in the situation where we don't have a clear answer, it should resolve NO.
2. The constraint that it only refers to the main training run turns out to be much more restrictive than I originally thought. I (and many others) were under the impression that the majority of compute spend at labs was on giant training runs, but I no longer think this is true. Take for example https://epoch.ai/data-insights/openai-compute-spend, which suggests that OpenAI spent only $400 million training GPT-4.5, but $4.5 billion on research, ablations, etcetera. If DeepSeek-V3 was at a similar ratio, it's easily possible they had 20k GPUs but used only 2k on the training run.
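The ratio argument above can be sketched as a quick back-of-the-envelope calculation. The spend figures come from the Epoch data insight linked above; applying that same research-to-training ratio to DeepSeek's claimed 2048-GPU run is my own illustrative assumption, not a known fact about their fleet:

```python
# Back-of-the-envelope: if research/ablation compute dwarfs the main run,
# a small training-run GPU count is consistent with a much larger fleet.
# Spend figures are from the Epoch data insight on OpenAI (assumptions).

training_spend = 0.4e9   # ~$400M on the GPT-4.5 main training run
research_spend = 4.5e9   # ~$4.5B on research, ablations, etc.

ratio = research_spend / training_spend  # research is ~11x the main run

training_run_gpus = 2048  # DeepSeek-V3's claimed main-run count
implied_fleet = training_run_gpus * (1 + ratio)  # if GPUs scale like spend

print(f"research:training ratio ≈ {ratio:.2f}x")
print(f"implied total fleet at a similar ratio ≈ {implied_fleet:,.0f} GPUs")
```

At the OpenAI-like ratio this implies a fleet in the low tens of thousands of GPUs, which is roughly the "20k GPUs but only 2k on the training run" scenario described above.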
So it's a weird situation. Personally, I'm convinced that they had a large # of GPUs, certainly more than 2048, but they never claimed to only have 2048; they claimed to use 2048 in the main training run. The (social) media said that they claimed 2048, but DeepSeek never did. I do think there's something deceptive about DeepSeek not clearing up the fact that they had a lot more than they let on, but I don't think that's enough to resolve this specific market YES.
One final piece of evidence pointing towards NO: https://www.darioamodei.com/post/on-deepseek-and-export-controls
In Dario's blog, he claims that "Claude 3.5 Sonnet is a mid-sized model that cost a few $10M's to train (I won't give an exact number)". On that view, with some efficiency improvements (like DualPipe!), it's possible the cost came down to what a 2048-GPU run would cost.
Deep Research on the market: https://chatgpt.com/s/t_6956cbd2e4f88191a29b93a81a9b196e
I believe that DeepSeek's paper does not actually state the number of GPUs, but instead gives approximately how many H800 GPU-hours the run would have required (and explains that this would be about two months on 2048 of them). I think that one month on twice as many GPUs (or on H100s, which have equivalent FLOPs) would be consistent with what they said. Would evidence indicating this is what they did result in a YES resolution?
@Fay42 I think that would result in a No resolution. What I care about really is the compute budget, not the precise number of GPUs. This was clear in the resolution criteria but I will change the top-line question wording to be more clear.
What does "used" mean? As DeepSeek itself acknowledges, the compute cost of the final training run for V3 doesn't include the full cost of compute to run experiments and synthetic data, etc. Are we just talking about the main training run here?
"Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."
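The quoted cost figure can be converted back into GPU-hours and wall-clock time with simple arithmetic. The $5.576M total and $2/GPU-hour rental price are from the quoted passage; the 2048-GPU count is the figure discussed throughout this market:

```python
# Sanity-check the paper's stated training cost against GPU-hours.
# Figures ($5.576M total, $2 per H800 GPU-hour) are from the quoted passage.

total_cost = 5.576e6      # USD, official training of DeepSeek-V3
price_per_gpu_hour = 2.0  # USD, assumed H800 rental price

gpu_hours = total_cost / price_per_gpu_hour  # ~2.788M H800 GPU-hours

# Wall-clock time if run on 2048 concurrent GPUs:
gpus = 2048
hours = gpu_hours / gpus
days = hours / 24

print(f"{gpu_hours / 1e6:.3f}M GPU-hours ≈ {days:.0f} days on {gpus} GPUs")
```

This works out to roughly two months on 2048 GPUs, consistent with the GPU-hours framing in the comment above, and the same budget could equally be one month on 4096 GPUs, which is why the market cares about the compute budget rather than the exact concurrent GPU count.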
@JoshYou Great question, my intention is that "used" does refer to just the main training run here, or the # of concurrent GPUs used for the main training run.
