Recently there has been debate about how many GPUs DeepSeek used to train its language models. The DeepSeek-v3 paper claims that only 2048 NVIDIA H800s were used [1], but others claim they may have had as many as 50,000 H100s [2] (note: the H100 is the standard data-center GPU; the H800 is a cut-down variant of the H100 built to comply with US export controls).
The market will resolve NO if either:
- DeepSeek-v3's performance is successfully replicated using no more than 2x the claimed compute budget, or
- At market close, there is insufficient evidence to conclude that DeepSeek misrepresented their compute usage (default NO)
The market will resolve YES if, at market close, there is widespread agreement in the AI community that DeepSeek used significantly more compute than claimed in their technical report.
I will not bet in this market.
[1] DeepSeek-V3 Technical Report
https://arxiv.org/abs/2412.19437
[2] CEO of Scale AI claiming DeepSeek has access to 50,000 H100s
https://youtu.be/x9Ekl9Izd38?si=yqstFkBxP9ICnxf_&t=170
Update 2025-01-27 (PST) (AI summary of creator comment): Clarification on "used":
- "Used" refers exclusively to the main training run of DeepSeek-v3
- It includes the number of concurrent GPUs employed during the main training process
I believe that DeepSeek's paper does not actually state the number of GPUs; instead it reports approximately how many H800 GPU-hours training required (and notes that this works out to about two months on 2048 of them). I think one month on twice as many GPUs (or on H100s, which have equivalent FLOPs) would be consistent with what they said. Would evidence indicating this is what they did result in a YES resolution?
@Fay42 I think that would result in a NO resolution. What I really care about is the compute budget, not the precise number of GPUs (see the sketch below). This was already implied by the resolution criteria, but I will reword the top-line question to make it clearer.
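For concreteness, here is a minimal sketch of the equivalence discussed above. The two-month, 2048-GPU baseline comes from the comment; the 4096-GPU cluster is a hypothetical alternative with the same GPU-hour budget:

```python
# The paper's budget is a GPU-hour figure, so the same budget fits
# many cluster shapes. Baseline (~2 months on 2048 GPUs) is from the
# comment above; 4096 GPUs is a hypothetical alternative.

baseline_gpus = 2048
baseline_days = 60                       # "about two months"
budget_gpu_hours = baseline_gpus * baseline_days * 24

alt_gpus = 4096                          # e.g. H100s with comparable FLOPs
alt_days = budget_gpu_hours / (alt_gpus * 24)

print(f"Budget: {budget_gpu_hours:,} GPU-hours")         # 2,949,120
print(f"{alt_gpus} GPUs finish in {alt_days:.0f} days")  # 30
```

Either configuration lands in the same ballpark as the ~2.79M GPU-hours implied by the cost figures quoted in the next comment, which is why resolution keys on the compute budget rather than the GPU count.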
What does "used" mean? As DeepSeek itself acknowledges, the compute cost of the final training run for V3 doesn't include the full cost of compute to run experiments and synthetic data, etc. Are we just talking about the main training run here?
"Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."
@JoshYou Great question. My intention is that "used" refers to just the main training run here, i.e. the number of concurrent GPUs employed for that run.
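For reference, a quick back-calculation from the two figures in the quoted passage (both numbers are from the DeepSeek-v3 technical report; nothing else is assumed):

```python
# Deriving the implied GPU-hour budget from the report's cost figures.

total_cost_usd = 5.576e6     # official training run only (from the report)
price_per_gpu_hour = 2.0     # assumed H800 rental price (from the report)

gpu_hours = total_cost_usd / price_per_gpu_hour
print(f"{gpu_hours:,.0f} H800 GPU-hours")         # 2,788,000

days_on_2048 = gpu_hours / (2048 * 24)
print(f"~{days_on_2048:.0f} days on 2048 GPUs")   # ~57 days, about 2 months
```

This ~2.79M GPU-hour figure is the "claimed compute budget" against which the NO criterion's 2x threshold would naturally be measured.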