Will GPT-6 be trained with Mixture-of-Depths?
Closes 2027 · 24% chance

Mixture-of-Depths:

https://arxiv.org/abs/2404.02258

Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth.
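
For readers unfamiliar with the mechanism, here is a minimal PyTorch sketch of the kind of top-k token routing the abstract describes: a learned router scores each token, only the top `capacity` fraction of tokens pass through the layer, and the rest skip it via the residual stream. This is an illustration under stated assumptions, not the paper's implementation; `MoDBlock`, the `capacity` value, and the sigmoid gate are all hypothetical choices (the paper scales block outputs by router weights so the router trains through gradients).

```python
# Hypothetical sketch of a Mixture-of-Depths block (after arXiv:2404.02258).
# MoDBlock, capacity, and the sigmoid gate are illustrative, not the paper's exact design.
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    def __init__(self, d_model: int, block: nn.Module, capacity: float = 0.125):
        super().__init__()
        self.block = block                    # any layer mapping (b, k, d) -> (b, k, d)
        self.router = nn.Linear(d_model, 1)   # scalar routing score per token
        self.capacity = capacity               # fraction of tokens processed by this layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, s, d = x.shape
        k = max(1, int(s * self.capacity))
        scores = self.router(x).squeeze(-1)           # (b, s)
        topk = scores.topk(k, dim=-1).indices          # pick the k highest-scoring tokens
        topk, _ = topk.sort(dim=-1)                    # keep routed tokens in sequence order
        idx = topk.unsqueeze(-1).expand(-1, -1, d)     # (b, k, d)
        selected = x.gather(1, idx)                    # routed tokens only
        # Scale the block output by a gate derived from the router score so the
        # router receives gradient; unselected tokens pass through unchanged.
        gate = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        out = x.clone()
        out.scatter_(1, idx, selected + gate * self.block(selected))
        return out

# Usage: wrap any layer that maps (batch, k, d_model) -> (batch, k, d_model), e.g.
# mod = MoDBlock(512, nn.TransformerEncoderLayer(512, nhead=8, batch_first=True))
```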

See also:

/Bayesian/will-the-best-ai-model-according-to

/Bayesian/will-the-best-ai-model-according-to-21102e9462c8
