GPT-4 #5: Will GPT-4 be a dense model?

Conditional on being a transformer.

Nov 25, 11:11pm: Will GPT-4 be a dense model? → GPT-4 #5: Will GPT-4 be a dense model?

HoraceHe avatar
Horace He is predicting NO at 28%

Why’d this market suddenly drop so much?

ValeryCherepanov avatar
Valery Cherepanov is predicting YES at 28%

@HoraceHe somebody sold "YES" shares. Also, some people bought "NO".

JacyAnthis avatar
Jacy Reese Anthis is predicting YES at 28%

@HoraceHe Personally, I want the liquidity for March 31st resolutions in other markets.

JacyAnthis avatar
Jacy Reese Anthis bought Ṁ56 of YES

@vluzko "dense" here refers to the standard machine learning usage of mostly non-zero parameters, right?

Mason avatar
GPT-PBot bought Ṁ10 of YES

Meta's future looks grim,
As layoffs start to skim.
Numbers dwindling, profits slim,
More layoffs seem quite prim.

JacyAnthis avatar
Jacy Reese Anthis is predicting YES at 44%

Conspiracy theory: OpenAI may be choosing not to reveal GPT-4's architecture not just because of the continuously increasing stakes of safety and profit but because there was some major architectural shift. Given that none of the recent innovations seem super promising (e.g., autoregressive diffusion models), this shift may have been to something already well-established, but not frequently used in LLMs, like sparse encoding or mixture of experts.

vluzko avatar
Vincent Luczkow

It seems very unlikely that the release of GPT-4 will resolve this. However, the market does not close until 2027, and I will leave it open until then in case the information is either (credibly) leaked or OpenAI decides to release it after the fact. If neither of those occurs, the market will resolve N/A.

wadimiusz avatar
Vadim is predicting YES at 40%

I don't see architecture details in the blog post or the technical report.

firstuserhere avatar
firstuserhere is predicting YES at 40%

@wadimiusz "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."

firstuserhere avatar
firstuserhere is predicting YES at 51%

MOE market


EdwardKmett avatar
Edward Kmett bought Ṁ148 of YES

@firstuserhere MOE money, MOE problems.

AlexAmadori avatar
Alex Amadori bought Ṁ18 of NO

thank you guys for continuing to create arb opportunities by making the probabilities of this question + the mixture of expert question sum to more than 100%
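A rough sketch of the arithmetic behind that arb, assuming "dense" and "mixture of experts" are mutually exclusive outcomes, each share pays Ṁ1, and ignoring price impact, fees, N/A resolutions, and the two markets' different close dates. The function name and the 55% MoE price are made up for illustration; only the 51% figure appears in this thread.

```python
def no_no_arbitrage_profit(p_dense: float, p_moe: float) -> float:
    """Worst-case profit from buying one NO share in each of two markets.

    Assumes the two YES outcomes cannot both happen, so at least one
    NO share pays out 1. Prices are probabilities in [0, 1].
    """
    for p in (p_dense, p_moe):
        if not 0.0 <= p <= 1.0:
            raise ValueError("prices must be between 0 and 1")
    # A NO share costs 1 minus the YES price.
    cost = (1.0 - p_dense) + (1.0 - p_moe)
    # Worst case: exactly one of the two markets resolves YES,
    # so only one NO share pays out.
    worst_case_payout = 1.0
    return worst_case_payout - cost


if __name__ == "__main__":
    # Hypothetical prices: 51% on this market, 55% on the MoE market.
    print(round(no_no_arbitrage_profit(0.51, 0.55), 4))
```

The worst-case profit is just p_dense + p_moe - 1, so it is positive only when the two prices sum to more than 100%.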

ValeryCherepanov avatar
Valery Cherepanov is predicting YES at 51%

@AlexAmadori no problem

EdwardKmett avatar
Edward Kmett is predicting YES at 52%

@AlexAmadori Given they have different end-dates, you may find yourself with delicate hedging decisions to make before July.

TamayBesiroglu avatar
Tamay Besiroglu bought Ṁ100 of NO

I'm confused why the market expects GPT-4 to be a Mixture of Experts model. I think it's ~85% likely that GPT-4 will be a dense model.

  • As far as I know, OpenAI has not described training large-scale MoE models. If OpenAI were to make a huge bet on MoE, we should expect them to first derisk this bet by training and validating smaller MoE models before spending millions of dollars worth of compute on a MoE GPT-4

  • All prior GPT models (1 through 3.5) were dense. OpenAI is bullish on scaling dense LLMs, and has not, as far as I know, indicated an intention to abort this strategy of scaling dense LLMs.

  • Dense models are the most common type of model among the largest-scale experiments. As far as I can tell, the largest-scale MoE LLM (by compute) was Switch Transformers, which at ~2.8e22 FLOP is quite small compared to the largest models trained by Google, Meta, Microsoft, and so on. Since GPT-4 is likely to involve a lot of compute, we should expect it to be similar in many respects to PaLM, Megatron-Turing NLG, OPT-175B, Gopher, Chinchilla, LaMDA, Bloom, etc.

vluzko avatar
Vincent Luczkow

@TamayBesiroglu I think the market is being driven by people who have 1. heard rumors that GPT-4 is 10T parameters or whatever 2. also heard that 10T parameter dense models are totally bonkers 3. concluded that it must be MoE.
Before the 3.5 release it was a lot more reasonable to guess that maybe it would be MoE, but if they were switching to MoE why would they burn a ton of money training a dense 3.5? So really the question is whether OpenAI has learned in the last year that MoE is better, which seems pretty unlikely.

TamayBesiroglu avatar
Tamay Besiroglu is predicting YES at 58%

@vluzko Fair. A while back, I asked Sam Altman how many parameters GPT-4 would have, and he said roughly that it wouldn't be much larger than GPT-3.

quadrilateral avatar
quadrilateral is predicting NO at 53%

@vluzko no reason to assume 3.5 is dense

yaboi69 avatar

@quadrilateral It can be a little dense at times

vluzko avatar
Vincent Luczkow

@quadrilateral we have every reason to believe that 3.5 is 3 trained with closer to optimal tokens

quadrilateral avatar
quadrilateral is predicting NO at 54%

@vluzko trust the process

HoraceHe avatar
Horace He bought Ṁ200 of NO

Can we get a more precise definition for whether a model is dense or not?

jonsimon avatar
Jon Simon is predicting YES at 57%

@HoraceHe Not a mixture-of-experts, primarily

typedfemale avatar


NoaNabeshima avatar
Noa Nabeshima is predicting YES at 78%

I'm betting YES partially to hedge against another market I have a large position in that is closed.

NoaNabeshima avatar
Noa Nabeshima bought Ṁ600 of YES

@NoaNabeshima wait this makes no sense

ZZZZZZ avatar
ZZZ ZZZ is predicting NO at 61%

@NoaNabeshima why not?

NoaNabeshima avatar
Noa Nabeshima is predicting YES at 61%

@ZZZZZZ I was confused, it does make sense.

duck avatar
crystal ball is predicting NO at 58%

What if the model will have "global memory" akin to the one described in the RETRO paper? It isn't triggered every token inference and so the whole NN won't be run every token. Would it then be considered sparse?

vluzko avatar
Vincent Luczkow

@duck No, that's still dense.

NoaNabeshima avatar
Noa Nabeshima is predicting YES at 63%

If GPT-4 has a parameter that is always used for sufficiently long sequence lengths but not used on shorter sequence lengths, is it considered dense?

NoaNabeshima avatar
Noa Nabeshima is predicting YES at 63%

@NoaNabeshima oh, I guess this is trivially the case for position embeddings

EricJang avatar
Eric Jang

Is causal convolution considered sparse or dense?

L avatar
L is predicting NO at 62%

@EricJang dense

NoaNabeshima avatar
Noa Nabeshima

Does GPT-3 count as a dense model? Or are you asking if all parameters are necessarily involved in computing a forward pass w/ batch size 1?


vluzko avatar
Vincent Luczkow

@NoaNabeshima Yeah GPT-3 counts as dense. This is about whether it will be a mixture of experts.
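For anyone who wants the distinction spelled out: a toy sketch of the "are all parameters used for a single token" framing, under the assumption that an MoE layer routes each token to top_k of num_experts equally sized experts and everything else (attention, embeddings) is shared. The function and the parameter counts are illustrative only, not anything known about GPT-3 or GPT-4, and not this market's resolution criteria.

```python
def active_param_fraction(num_experts: int, top_k: int,
                          expert_params: int, shared_params: int) -> float:
    """Fraction of parameters touched when processing one token.

    A dense model is the special case num_experts == top_k == 1.
    All counts are hypothetical, unitless parameter counts.
    """
    if not 1 <= top_k <= num_experts:
        raise ValueError("need 1 <= top_k <= num_experts")
    if expert_params < 0 or shared_params < 0:
        raise ValueError("parameter counts must be non-negative")
    total = shared_params + num_experts * expert_params
    if total == 0:
        raise ValueError("model has no parameters")
    # Only the routed experts run for this token; shared layers always run.
    active = shared_params + top_k * expert_params
    return active / total


if __name__ == "__main__":
    print(active_param_fraction(1, 1, 100, 100))  # dense: every parameter is used
    print(active_param_fraction(8, 2, 100, 100))  # top-2 of 8 experts: a minority is used
```

Under this framing, a dense model always comes out at exactly 1.0, while an MoE model comes out below 1.0 whenever top_k is smaller than num_experts.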

L avatar
L is predicting NO at 54%

is gpt3 even dense anymore?

NikitaBrancatisano avatar
Nikita Brancatisano bought Ṁ100 of NO

Reddit comment screenshot. Credit: Igor Baikov (shared by Gwern)