What will be the maximum achievable flop utilization on the next generation of Nvidia server chips?

resolved Jan 13

100%32%

70-80%

0.9%

<30%

1.2%

30-40%

40-50%

50-60%

33%

60-70%

80-90%

12%

90-100%

Concretely, what is the best FLOPS (floating-point operations per second) that the next generation of Nvidia server cards will be able to achieve on fp16 matrix multiplications of matrices with normally distributed entries, divided by the maximum theoretical FLOPS that Nvidia reports?
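As a rough sketch of how the numerator is usually obtained: a matmul of an (m x k) matrix with a (k x n) matrix is conventionally counted as 2·m·k·n floating-point operations (one multiply and one add per inner-product term), and dividing that by the measured wall-clock time gives the achieved FLOPS. The function names and the 4 ms timing below are hypothetical and only illustrate the arithmetic; they are not measurements from any card.

```python
def matmul_flop_count(m: int, k: int, n: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product, counting one multiply
    and one add per inner-product term (the usual 2*m*k*n convention)."""
    return 2 * m * k * n


def achieved_tflops(m: int, k: int, n: int, seconds: float) -> float:
    """Achieved throughput in TeraFLOPS for one matmul that took `seconds`."""
    return matmul_flop_count(m, k, n) / seconds / 1e12


# Hypothetical timing: an 8192 x 8192 x 8192 fp16 matmul finishing in 4 ms.
example_tflops = achieved_tflops(8192, 8192, 8192, seconds=0.004)
print(f"{example_tflops:.1f} TFLOPS")
```

In practice the time would be averaged over many iterations after a warm-up, since a single timed call is noisy.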

For example, for A100s, it's possible to achieve about 280+ TeraFLOPS out of a maximum of 312 TeraFLOPS, for a maximum flop utilization of ~90%.

On H100s, it seems to be closer to 700 TeraFLOPS out of a maximum of 1000, for a maximum flop utilization of roughly 70%.
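The ratio in the A100 example, and its mapping onto this market's answer ranges, can be sketched as follows. The function names are hypothetical, and the boundary handling is an assumption: the market does not say which range an exact multiple of 10% falls into, so this sketch puts it in the upper range, and it puts anything at or above 100% in "90-100%".

```python
def flop_utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """Achieved matmul throughput divided by the vendor-reported peak."""
    return achieved_tflops / peak_tflops


def answer_bucket(utilization: float) -> str:
    """Map a utilization fraction (e.g. 0.776) to one of the market's ranges.

    Assumption: an exact boundary such as 70% goes to the upper range,
    and values of 100% or more are capped into "90-100%".
    """
    pct = utilization * 100
    if pct < 30:
        return "<30%"
    lower = min(int(pct // 10) * 10, 90)
    return f"{lower}-{lower + 10}%"


# A100 figures quoted above: about 280 TFLOPS achieved out of 312 peak.
a100 = flop_utilization(280, 312)
print(f"A100: {a100:.1%} -> {answer_bucket(a100)}")
```

Note that 280/312 is about 89.7%, so the A100 example sits just under the 90% boundary; the "280+" in the text is what pushes it toward ~90%.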

Will resolve when the values seem clear after the HNext cards are released, or at most one year after Nvidia announces them.

To clarify, this will be for the B100, not the B200.

3 Comments

According to https://github.com/stas00/ml-engineering/tree/master/compute/accelerator#maximum-achievable-matmul-flops-comparison-table it's around 77.6%

I wouldn’t have a means to test this, but I wonder if the answer could be over 100% using liquid nitrogen and heavy overclocking.

To clarify, I will be basing this off the standard configuration (i.e. the listed 700W in their spec). If Nvidia sells an unusual spec with a higher power limit, I won't be using that to resolve the market.