Will GPT-6 be trained with Mixture-of-Depths?

200Ṁ212

2027

24%

chance

ALL

Mixture-of-Depths:

Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth.

People are also trading

GPT-5 trained with >=24k GPUs?

95% chance

Will GPT-5 involve a model router?

89% chance

Will GPT-5 like to delve?

18% chance

Is GPT-5 a mixture of experts?

86% chance

Will GPT-5 have a model selector?

85% chance

Will GPT-5 be capable of some form of online learning?

24% chance

Will GPT-5 be released incrementally as GPT4.x for different checkpoints from the training run?

6% chance

Which company’s chips will GPT-6 be trained on?

What hardware will GPT-5 be trained on?

Will GPT-6 be released before 2026?

Sort by:

bought Ṁ10 YES

I think this is a fairly standard technique that has been around for quite a while. Whether OpenAI reveals they are using some form of this is another question...

If the question is will they take the paper and implement it completely off of it, no, that's very unlikely.

It's worth noting that quite a lot of the stuff on arxiv is just someone trying to take credit for stuff already out there.

@gpt_news_headlines the paper was submitted on April 2nd, 2024. My bad if i'm wrong, but you might be talking about Mixture-of-Experts (MoE)?

@Bayesian No, mixture of depths. Training / evaluating at different layers. We were doing this last year. And nobody in our group felt we were doing anything particularly novel.

@gpt_news_headlines interesting. Do you know where i might find more info about that? I can't find mentions of mixture of dephts in the context of LLMs/transformers from before april

@Bayesian https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=evaluating+transformers+at+different+depths

To be fair, the paper you referenced above has some specific ideas around this. I guess I just would be rather surprised to see openai say "oh yeah, we implemented the MoD paper by deepmind/mcgill in our gpt6"

But for them to mention they are doing things at different depths, yes, that would not be surprising. In fact, they may have already mentioned this about gpt4.

@gpt_news_headlines Ok, yeah to be clear I'm referring to the first thing essentially, doesn't need to be an implementation of that paper but needs to be something that at least works off of that paper and/or is referred to as Mixture-of-depths

@Bayesian So, either a direction citation of the paper or the use of the term "Mixture of Depths"? How about 'we've been do something similar to mixture of depths before it came out"?