Open numeric answer for the number of GPT-4 parameters. The market will resolve to 100B if there are fewer than 100B parameters, and to 100T if there are more than 100T.
GPT-3 was 175 billion. There have been rumors that GPT-4 will be much bigger.
Similar markets:
https://manifold.markets/MaxGhenis/will-gpt4-have-at-least-100-trillio (requires >100T, which is currently unlikely)
https://manifold.markets/JustinTorre/how-many-parameters-with-gpt4-have (ranges)
@andrew It's not accidental. He clearly says:
> The latest, the state-of-the-art OpenAI model, is approximately 1.8 trillion parameters.
The name GPT-MoE-1.8T recurs multiple times throughout the presentation. Jensen knows what he's talking about; NVIDIA is the one providing the hardware for OpenAI.
Source
I think the only uncertainty here is exactly which model he is talking about, but I'd say it's pretty safe to resolve this.
The other nuance is how to deal with the fact that it's MoE. If each of the experts has 200 billion parameters, does this resolve to that?
@Shump Yeah — agreed he knows what he's saying. Question is whether "the latest" is the one released (GPT-4) or the next. Unless consensus here disagrees strongly.
As for how to resolve, I lean pretty solidly towards counting all trained parameters. The result is most clearly not a 200b model — it's a stack of 8 of them, each needing to be trained, and each being used at runtime. The fact that only a subset of experts get activated for a specific token's evaluation doesn't really change that.
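The "count all trained parameters" reading above can be sketched in a few lines. Note the 8-expert and ~220B-per-expert figures are rumored values consistent with the ~1.8T number in the thread, not confirmed specs:

```python
def moe_total_params(num_experts: int, params_per_expert: float,
                     shared_params: float = 0.0) -> float:
    """Total trained parameters of a mixture-of-experts model.

    Every expert counts toward the total, even though only a subset
    of experts is activated for any given token.
    """
    return num_experts * params_per_expert + shared_params

# Hypothetical: 8 experts of ~220B each, ignoring shared/router params.
total = moe_total_params(num_experts=8, params_per_expert=220e9)
print(f"{total / 1e12:.2f}T")  # 8 x 220B = 1.76T trained parameters
```

Under this resolution rule, the market would use the ~1.8T total rather than the ~200B active-per-token count.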
Rumors have been coming out, will wait for more confirmation. But looks like the market adjusted.
https://twitter.com/soumithchintala/status/1671267150101721090
@jonsimon Well yes, GPT-4's parameter count will probably not be released within any reasonable time frame, but it's pretty likely more than 200B lol
@ShadowyZephyr It depends strongly on whether they incorporated Chinchilla scaling laws. If they did, then 200B is quite plausible.
Some evidence that it could be ~1T https://twitter.com/norabelrose/status/1644146837425913856
Oh, he says now it was "a figure of speech" https://twitter.com/SebastienBubeck/status/1644151579723825154
So it's pretty weak evidence, but still (also, I can't delete the previous comment anyway).
https://www.semafor.com/article/03/24/2023/the-secret-history-of-elon-musk-sam-altman-and-openai
Will wait for confirmation / a better citation before resolving the market.
@andrew Yeah, I'd say definitely wait. Or at least figure out where the number came from.
Since the announcement paper didn't give details, I'll leave this open under the expectation that it'll leak eventually. I've seen post-announcement leaks ranging from 80B to 2T, which is still a massive range.
Under the case that there's multiple GPT-4 models, I think the best resolution is using the one that they launched this week. If anyone feels otherwise, let me know.
@NoaNabeshima I don't know exactly how Manifold does these, but if you buy and the prediction goes up (or resolves higher), you make M$. And vice versa.
So if you think the current level (410B) is high, then you predict lower. The more the market is mispriced, the more you'd win/lose.
@andrew I'm thinking that there's a ~10% probability that GPT-4 falls within a reference class (MoE) with ~40T parameters in expectation, which makes GPT-4's expected number of parameters at least 4T according to my probability distribution. But if Manifold just pays out based on whether the amount is higher or lower than my estimate, I shouldn't bet based on the expected number; I should bet based on my median estimate, which is kinda lame.
@NoaNabeshima Based on my P/L on this, it pays out based on how much you were right by (i.e., you can indeed trade based on expected value): you get shares based on the current price, so the larger a mispricing, the more it'd pay out.
But perhaps someone from Manifold can clarify.
@andrew If I bet M500 on Higher, my max payout is M587, whereas if I bet M500 on lower, the max payout is M1197. I wonder if the market is set so that you bet on e^Expected[ln(params)] or something like that?
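The gap between those two quantities is easy to see numerically. This sketch uses made-up numbers matching the distribution described above (a ~10% chance of a ~40T MoE, ~90% chance of ~200B); it only illustrates why the arithmetic mean E[params] and the log-space mean e^E[ln(params)] give very different answers, not how Manifold actually prices these markets:

```python
import math

# Hypothetical two-point distribution over GPT-4's parameter count.
outcomes = [200e9, 40e12]  # 200B dense vs ~40T MoE
probs = [0.9, 0.1]

# Arithmetic expectation: dominated by the small-probability huge outcome.
arith_mean = sum(p * x for p, x in zip(probs, outcomes))

# Log-space expectation (geometric mean), much closer to the typical outcome.
geo_mean = math.exp(sum(p * math.log(x) for p, x in zip(probs, outcomes)))

print(f"E[params]         ~ {arith_mean / 1e12:.2f}T")  # ~4.18T
print(f"exp(E[ln params]) ~ {geo_mean / 1e9:.0f}B")     # ~340B
```

If the market effectively rewards bets on the log-space quantity, the asymmetric max payouts quoted above (M587 on Higher vs M1197 on Lower from M500) would be consistent with a fair value well below the arithmetic expectation.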
"GPT-4 won’t be the largest language model. Altman said it wouldn’t be much bigger than GPT-3. The model will be certainly big compared to previous generations of neural networks, but size won’t be its distinguishing feature. It’ll probably lie somewhere in between GPT-3 and Gopher (175B-280B)."