How much FLOP will be used to train Llama 3?
<1e24          0.8%
[1e24, 3e24)   0.9%
[3e24, 1e25)   8%
[1e25, 3e25)   17%
[3e25, 1e26)   67%
[3e26, 1e27)   1.3%
[1e27, 3e27)   1.3%
[3e27, 1e28)   1%
[1e28, 3e28)   0.8%
[3e28, 1e29)   0.8%
>=1e29         0.7%

sold Ṁ4 [3e26, 1e27) YES

Do you have a specific way you'll be handling the FLOPs across the model sizes? In particular, will the 400B+ model, which is coming out in the future but has not yet been released, be counted?

@YafahEdelman It would be reasonable to resolve to the sum of FLOPs used.

@YafahEdelman I originally meant the largest model in the family at first release, i.e. the GPT-4-competitive model. I note that this isn't carefully operationalized in the description. Does anyone object to this interpretation? I'm also interested in what people thought when they traded.

@NoaNabeshima Sure. The largest is fine.

@NoaNabeshima I was assuming largest

bought Ṁ15 [3e25, 1e26) YES

@YafahEdelman Bro, 70B is already 1e25 FLOPs, what are you betting on?

@Sss19971997 6ND doesn't give that?

@YafahEdelman I think 6ND is off by 40%? I got the number from the report, where they said that their GPUs run at 400 TFLOPs and that training the 70B model used 6.4M GPU-hours.

sold Ṁ7 [3e24, 1e25) YES

@Sss19971997 6.4M GPU-hours at 400 TFLOPs gives me 9.216e24 FLOPs. My uncertainty was over what counts as "on first release", but I agree higher is more likely.
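
As a sanity check, here is a minimal Python sketch reproducing the two estimates being compared; the 70B parameter count, 6.4M GPU-hours, and 400 TFLOPs are the figures quoted in this thread, and the 15T-token count is assumed from Meta's reported training data size:

    # 6ND approximation: training compute ~= 6 * N parameters * D tokens.
    N = 70e9             # parameters (Llama 3 70B)
    D = 15e12            # training tokens (assumed, per Meta's reported 15T)
    print(f"6ND estimate:       {6 * N * D:.3e} FLOP")   # ~6.3e24

    # Hardware-side estimate from the report figures quoted above.
    gpu_hours = 6.4e6    # GPU-hours for the 70B training run
    throughput = 400e12  # FLOP/s achieved per GPU
    print(f"GPU-hours estimate: {gpu_hours * 3600 * throughput:.3e} FLOP")  # ~9.2e24

The two differ by roughly 45%, which matches the discrepancy discussed above.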

@YafahEdelman The 405B model is likely dense and trained on the same 15T tokens. Multiplying the 70B number by 5-6 seems reasonable for the final answer, so whether you use 6ND or the 400 TFLOPs figure, you end up in the 3e25-1e26 range.
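
Under the same assumptions (a dense 405B model on 15T tokens, both per the comment above, neither confirmed), the 6ND estimate indeed lands in that bucket:

    # 6ND for a hypothetical dense 405B model on 15T tokens
    # (both figures are this thread's assumptions, not confirmed specs).
    N = 405e9
    D = 15e12
    print(f"{6 * N * D:.3e} FLOP")  # ~3.6e25 -> in the [3e25, 1e26) bucket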

@Sss19971997 Yeah, I was a bit confused about whether the 400B+ model would count, but it seems pretty likely that it will, hence that's where most of my mana is.