Will GPT-6 be trained with Mixture-of-Depths?
24% chance
Mixture-of-Depths:
https://arxiv.org/abs/2404.02258
Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth.
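The paper's reference implementation is not public, so below is a minimal PyTorch sketch of the routing idea the abstract describes: a learned per-token router scores each position, only the top-k tokens within a sequence pass through the block's compute, and the rest flow through the residual path unchanged. All names here (`MoDBlock`, `capacity_ratio`, the sigmoid gate) are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn


class MoDBlock(nn.Module):
    """Mixture-of-Depths sketch: route only the top-k tokens (by learned
    router score) through the wrapped block; all other tokens skip it."""

    def __init__(self, block: nn.Module, d_model: int, capacity_ratio: float = 0.125):
        super().__init__()
        self.block = block                    # computes the residual update, (B, T, D) -> (B, T, D)
        self.router = nn.Linear(d_model, 1)   # scalar score per token
        self.capacity_ratio = capacity_ratio  # fraction of tokens that receive compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        k = max(1, int(T * self.capacity_ratio))
        scores = self.router(x).squeeze(-1)         # (B, T)
        topk = scores.topk(k, dim=-1).indices       # (B, k): tokens that get compute
        idx = topk.unsqueeze(-1).expand(-1, -1, D)  # (B, k, D)
        selected = x.gather(1, idx)                 # gather the routed tokens
        # Multiplying by the router score keeps the routing decision
        # on the gradient path, as the paper suggests.
        gate = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        processed = self.block(selected) * gate
        # Routed tokens get x + block(x) * gate; the rest stay x (residual only).
        return x.scatter_add(1, idx, processed)
```

A quick usage example under the same assumptions, with `capacity_ratio=0.25` so only a quarter of the positions per sequence are processed at this depth:

```python
block = nn.Sequential(nn.LayerNorm(64), nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
mod = MoDBlock(block, d_model=64, capacity_ratio=0.25)
out = mod(torch.randn(2, 16, 64))  # only 4 of 16 tokens per sequence get compute
```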
Related questions
Will GPT-5 like to delve? (34% chance)
Which company's chips will GPT-6 be trained on?
What hardware will GPT-5 be trained on?
GPT-5 trained with >=24k GPUs? (82% chance)
GPT-4 #5: Will GPT-4 be a dense model? (1% chance)
Will GPT-6 be released before 2026? (9% chance)
Will manifold be part of GPT5's training data? (76% chance)
Is GPT-5 a mixture of experts? (79% chance)
Will GPT-6 be released before 2025? (3% chance)
Will GPT-5 be capable of some form of online learning? (28% chance)