@YafahEdelman I originally meant the largest model in the family on first release, i.e. the GPT-4-competitive model. I note that this isn't carefully operationalized in the description. Does anyone object to this interpretation? I'm interested in what people thought when they traded.
@YafahEdelman I think 6ND is off by about 40% here? I got the number from the report, which says their GPUs ran at 400 TFLOPS and that training the 70B model used 6.4M GPU-hours.
@Sss19971997 6.4M GPU-hours at 400 TFLOPS gives me 9.216e24 FLOPs. My uncertainty was over what "on first release" means, but I agree higher is more likely.
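A quick sketch of the arithmetic in that reply, using only the figures quoted in the thread (the 3600 s/hour conversion is the only thing added):

```python
# Convert reported GPU-hours at a sustained per-GPU throughput
# into total training FLOPs, as in the comment above.
gpu_hours = 6.4e6       # 6.4M GPU-hours reported for the 70B model
throughput = 400e12     # 400 TFLOPS sustained per GPU (from the report)

total_flops = gpu_hours * 3600 * throughput
print(f"{total_flops:.3e}")  # 9.216e+24, matching the figure above
```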
@YafahEdelman The 405B model is likely dense and trained on the 15T tokens. Multiplying the number you got by 5-6 seems reasonable for the final answer, so whether you use 6ND or scale from the 400 TFLOPS figure, you end up in the 3e25-1e26 range.
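For reference, here is a sketch of both estimates for the 405B model. It assumes a dense model and 15T training tokens, as the comment above does; the 405/70 parameter ratio is one way to read the "multiply by 5-6" suggestion, not something stated in the thread:

```python
# Estimate 1: the 6ND approximation (C = 6 * N * D),
# assuming 405e9 dense parameters and 15e12 training tokens.
N = 405e9
D = 15e12
flops_6nd = 6 * N * D                  # ~3.6e25 FLOPs

# Estimate 2: scale the 70B hardware-time estimate (9.216e24 FLOPs)
# by the parameter ratio, i.e. the "multiply by 5-6" above.
flops_scaled = 9.216e24 * (405 / 70)   # ~5.3e25 FLOPs

print(f"6ND:    {flops_6nd:.2e}")
print(f"scaled: {flops_scaled:.2e}")
# Both land in the 3e25-1e26 range discussed above.
```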
@Sss19971997 Yeah, I was a bit confused about whether 400B+ would count, but it does seem pretty likely, hence that's where most of my mana is.