When researching GPT-4 speculations, the estimated parameter count ranged anywhere from 175 billion (same as GPT-3) to over 100 trillion.
This market will resolve YES if GPT-4 has over 1 trillion parameters; otherwise it will resolve NO.
I still think it's unlikely they scaled that much, but after thinking about it, I'd be surprised if they didn't scale up a bunch.
@L Against my own self-interest as a NO bettor, I made a few calculations and I'm afraid it could be >1T (f*** OpenAI!!!! Couldn't you just go with a big Chinchilla?! No, you had to go & kick PaLM's ass....) However, I could still hope the exact number doesn't get revealed until next year, hehe. I'll think about whether to show my calculations or not. I'll try to double-check with a few acquaintances of mine beforehand (they're researchers at Ivy League unis).
@R2D2 which unis? Doesn't the Ivy League suck at AI?
If GPT4 has exactly 1 trillion parameters, this will resolve NO, correct?
100 Trillion is ridiculous for now, maybe in 10 years - and once we've figured out how to avoid evaluating all of them to make a prediction.
I think OpenAI's hardware is going to struggle with 1 trillion.
I estimate that GPT-4 has more in the range of 300-500 billion parameters.
Semafor: The latest language model, GPT-4, has 1 trillion parameters.
They were also the first to report that Bing is GPT-4.
@LeoSpitz No they did not, it was from this Tweet, directly from a VP at Microsoft. https://twitter.com/yusuf_i_mehdi/status/1635733309631389696
@PatrickDelaney They reported that Bing will eventually include GPT-4 without certifying any knowledge that it, "is" GPT-4, which came from the above tweet.
@NexVeridian Right, I know, saw that, thank you...but... "poised to incorporate" is not the same as, "is."
GPT-4 with trillion parameters,
AI language model of great measure,
Will it surpass its former gain?
Better bet on that, my friend, it's insane.
If it can be anywhere from ~0 to 100 trillion, the expected value is 50 trillion. So this market is priced too low.
@Mira Probability within that range is not evenly distributed. For example, a human's age can be anywhere between 0 and 122, but that doesn't mean the expected age of a human is 61.5 years.
@Mira this couldn't be more wrong. 1) interval constraints are not the same as uniform probabilities https://www.stat.berkeley.edu/~stark/Preprints/constraintsPriors13.pdf 2) even if they were (and they aren't), there's no reason to assume a uniform pdf over said interval 3) using non-informative priors when you actually have prior knowledge is a big mistake. And we do have prior knowledge here: we know that Sam Altman debunked the 100 trillion fake news in the StrictlyVC interview. Also, there's no way they could train a 100 trillion attention-based model in Q1-Q2 '22, just on elementary cost considerations alone. So nope, the expected value is not what you claim it to be.
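To make point 2) concrete, here's a toy illustration (not a claim about the true distribution of GPT-4's size): the expected value over an interval depends entirely on the prior you put on it. A log-uniform floor of 1B parameters is an assumption chosen just so the prior is well-defined.

```python
import math

# Interval: ~0 to 100 trillion parameters.
# Assume a 1B floor so a log-uniform prior is defined.
LOW, HIGH = 1e9, 1e14

# Uniform prior: the mean is simply the midpoint of the interval.
uniform_mean = (LOW + HIGH) / 2

# Log-uniform prior (each order of magnitude equally likely):
# mean = (b - a) / ln(b / a), pulled far below the midpoint.
loguniform_mean = (HIGH - LOW) / math.log(HIGH / LOW)

print(f"uniform prior mean:     {uniform_mean:.3g} params")
print(f"log-uniform prior mean: {loguniform_mean:.3g} params")
```

Same interval, different prior, and the mean drops from ~50T to under 10T; with an informative prior (Altman's debunking, cost constraints) it drops far further still.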
Don’t feed the trolls :)
@JimHays if Mira is a troll, they're throwing away a very large amount of mana.
I just don't get their actions though. If I had more mana I'd bid this down further, but I don't.
@Mira this is... a joke, right?
Mira is trading very reasonably (buying a lot of shares at a reasonable price), and posting a comment that is an obvious troll :)
A model that completed training in August 2022 has >1T params? Seems unlikely? Also, the "paper" 🤮 seems to indicate that it was trained for way longer than other models, which could suggest a "big Chinchilla" kind of model/training schedule, rather than a "bust PaLM's ass" kind of model/training schedule
@R2D2 lmao agreed on "paper" barf. It is a 98 page report, half of which is an ad.
@firstuserhere Yep, I don't know what info I'm missing that has pushed all these markets so high, but my estimate remains that GPT-4 used probably 300-500 billion parameters, no early stopping, the same cutoff date as 3.5, but trained for much, much longer.
@firstuserhere Yep, if 1) it ever gets submitted somewhere and 2) I happen to be the reviewer, I’m gonna hard-reject it sooner than I can say “ClosedAI”. I’m gonna be like, (Reviewer 2)^512
@firstuserhere I agree, I don't think it was bigger than the biggest PaLM
@R2D2 yep yep, using that as my upper bound as well. I've heard multiple OpenAI people's and even saltman's comments here and there which fit in with the estimate
At around 1:46:00 of the Microsoft reinventing-productivity event, they described their LLM, probably GPT-4, as having "billions" of parameters. Not sure how much info that is.
Brett Winton from ARK Invest says 80B parameters
@NexVeridian Wait, why would anybody think 3.5 Turbo is 20B parameters? That would make it 4x smaller than Chinchilla and 9x smaller than GPT-3. I know the scaling laws have swung more towards data, but that's ridiculous.
@ErickBall why else do you think it's so much cheaper?
@ErickBall Chinchilla optimized for training flops, but if you're spending more money on inference than training (which presumably OpenAI is) you'll want to go even smaller. The llama paper made this point. Basically if you look at the empirical data from Chinchilla etc, we haven't saturated the performance of the smaller models, so if you're willing to spend extra training flops you can get more juice out of a smaller model.
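A rough sketch of that tradeoff, using the loss fit from the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β, with the published fitted constants. The compute budget and model sizes below are purely illustrative, not claims about GPT-3.5 Turbo or GPT-4:

```python
# Chinchilla loss fit (Hoffmann et al. 2022) and its fitted constants.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**ALPHA + B / n_tokens**BETA

C = 3.14e25  # hypothetical training budget (illustrative)

def tokens_for_budget(n_params, compute=C):
    # Standard approximation C ~ 6 * N * D, so D = C / (6 * N)
    return compute / (6 * n_params)

# A roughly compute-optimal model vs. a 5x smaller model trained
# on 5x the tokens under the same training budget:
for n in (500e9, 100e9):
    d = tokens_for_budget(n)
    print(f"N={n:.0e}  D={d:.2e}  loss={loss(n, d):.3f}")
```

Under this fit, the two come out at nearly the same loss, but the smaller model is ~5x cheaper per inference token, which is exactly the LLaMA-style argument for over-training a small model.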
@dayoshi well when you put it that way, yeah, that actually makes a ton of sense. Now I don't really understand why this market is so high.
Related polymarket question https://polymarket.com/event/will-gpt-4-have-500b-parameters/will-gpt-4-have-500b-parameters and Manifold mirror:
If GPT-4 is dense and 2 OOMs of compute more than GPT-3 (3.14e23 FLOP according to LambdaLabs), it's just under 1T parameters:
Eyeballing it, it looks like 2.5 OOMs to get 1T if trained Chinchilla-optimally, which GPT-4 might not be because of inference costs?
@NoaNabeshima A little more than 2.5 OOMs
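The eyeballed OOM figures can be sanity-checked with the usual Chinchilla rules of thumb, C ≈ 6·N·D and D ≈ 20·N (so C ≈ 120·N²). These are assumptions about compute-optimal training, not anything OpenAI has confirmed about GPT-4:

```python
import math

# GPT-3's training compute per the LambdaLabs estimate cited above.
GPT3_FLOPS = 3.14e23

def chinchilla_optimal_params(ooms_over_gpt3):
    # C ~ 120 * N^2 under C ~ 6*N*D with D ~ 20*N, so N = sqrt(C / 120).
    compute = GPT3_FLOPS * 10**ooms_over_gpt3
    return math.sqrt(compute / 120)

for ooms in (2.0, 2.5, 3.0):
    n = chinchilla_optimal_params(ooms)
    print(f"+{ooms} OOMs -> ~{n:.2e} params")
```

This puts the 1T crossing a little above +2.5 OOMs of compute, consistent with the eyeball estimate.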
The Metaculus community prediction assigns ~10% probability that it's >2 OOMs greater
@NoaNabeshima so looks like this is mostly a mixture of experts market?
@NoaNabeshima oh but also they could have separate params for images
@NoaNabeshima this seems surreal
@NoaNabeshima does this figure imply that it was trained with >1000x the compute of any previous openai LLM?
@ErickBall The grey dots probably are newly trained models so probably this doesn't imply that
@NoaNabeshima No, the "paper" explicitly states that this extrapolation was performed before the end of training, and training completed in 2022, so they're not newly trained models
I've recently flipped from LONG to SHORT because of this: https://www.theverge.com/23560328/openai-gpt-4-rumor-release-date-sam-altman-interview
This guy predicted the release date and the multimodality. I'm gonna trust them on the parameter count.
@rockenots Tweet that leaked the release date and multimodality: https://twitter.com/apples_jimmy/status/1629939273469394945
@rockenots 125 trillion would be a ridiculous increase over 1-2 trillion, even bigger than the 100 trillion that Sam Altman confirmed was bs.
I don't see the number of parameters in the blog post or the technical report.
@wadimiusz GPT-4 is a Transformer-style model pre-trained to predict the next token in a document, using both publicly available data (such as internet data) and data licensed from third-party providers. The model was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF). Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
Arguably this market should be taken down. I don't have a strong take on this though.
@wadimiusz I don't just mean "OpenAI didn't say therefore this market is ambiguous", i also mean something like "OpenAI doesn't want this public, and this market is a mechanism for making it public, so this market should be taken down because infohazards"