@YafahEdelman I originally meant the largest model in the family on first release, i.e. the GPT-4-competitive model. I note that this isn't carefully operationalized in the description. Does anyone object to this interpretation? I'm interested in what people thought when they traded.
@YafahEdelman I think 6ND is off by about 40% here? I got the number from the report, which says their GPUs ran at 400 TFLOPS and that training the 70B model used 6.4M GPU-hours.
@Sss19971997 6.4M GPU-hours at 400 TFLOPS gives me 9.216e24 FLOPs. My uncertainty was over what "on first release" means, but I agree higher is more likely.
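A quick sketch of the arithmetic in that reply, using only the figures quoted in the thread (the 3600 s/hour conversion is the only thing added):

```python
# Convert reported GPU-hours at a sustained per-GPU throughput
# into total training FLOPs, as in the comment above.
gpu_hours = 6.4e6       # 6.4M GPU-hours reported for the 70B model
throughput = 400e12     # 400 TFLOPS sustained per GPU (from the report)

total_flops = gpu_hours * 3600 * throughput
print(f"{total_flops:.3e}")  # 9.216e+24, matching the figure above
```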
@YafahEdelman The 405B model is likely dense and trained on the 15T tokens. Multiplying the number you got by 5-6 seems reasonable for the final answer, so whether you use 6ND or scale from the 400 TFLOPS figure, you end up in the 3e25-1e26 range.
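For reference, here is a sketch of both estimates for the 405B model. It assumes a dense model and 15T training tokens, as the comment above does; the 405/70 parameter ratio is one way to read the "multiply by 5-6" suggestion, not something stated in the thread:

```python
# Estimate 1: the 6ND approximation (C = 6 * N * D),
# assuming 405e9 dense parameters and 15e12 training tokens.
N = 405e9
D = 15e12
flops_6nd = 6 * N * D                  # ~3.6e25 FLOPs

# Estimate 2: scale the 70B hardware-time estimate (9.216e24 FLOPs)
# by the parameter ratio, i.e. the "multiply by 5-6" above.
flops_scaled = 9.216e24 * (405 / 70)   # ~5.3e25 FLOPs

print(f"6ND:    {flops_6nd:.2e}")
print(f"scaled: {flops_scaled:.2e}")
# Both land in the 3e25-1e26 range discussed above.
```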
@Sss19971997 Yeah, I was a bit confused about whether 400B+ would count, but it does seem pretty likely, hence that's where most of my mana is.