For this market, a large language model is a language model trained using an amount of compute within an order of magnitude of the compute used to train the largest language model.
It is not just based on parameter count.
I'll accept starting with a pretrained model and then doing additional training/finetuning, as long as the amount of compute for the latter component is large enough.
By "publicly" I just mean that it's well known that they trained the model. If, say, the Chinese government almost certainly has one but it's not definitively confirmed, then that doesn't count.

BLOOMChat is a new one that seems relevant (trained by SambaNova).

At this point I think it is likely that at market close the best estimates we have for at least some models (particularly GPT-4) will be based on scaling laws. Personally I am fine with this; I don't expect errors in those estimates to produce a difference that could shift the resolution. Here are some proposals (sketched in code below):
- If the best estimates of compute (based on scaling laws, architectural details, details of the organization, etc.) do not show the market is close (either well below 20 or well above), I will resolve NO or YES respectively.
- If the best estimates show the market is close (say 17-23 orgs), I will resolve N/A.
- If we end up not even having benchmark data for many models (so we can't do scaling-law estimates), I will either resolve N/A or propose new resolution criteria.
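For concreteness, here's a minimal encoding of that rule (the 17-23 band is just the number floated above):

```python
# Minimal sketch of the proposed resolution rule; band boundaries as floated above.
from typing import Optional

def resolve(estimated_orgs: Optional[int]) -> str:
    if estimated_orgs is None:        # no benchmark data, so no scaling-law estimates
        return "N/A or new criteria"
    if 17 <= estimated_orgs <= 23:    # too close to call
        return "N/A"
    return "YES" if estimated_orgs > 20 else "NO"
```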
@vluzko Can you give examples for the models on my list, at least some of them?

@ShadowyZephyr not doing the calculations myself is part of the goal of the market. Predictors who take the time and effort to do the calculations in advance get mana (in expectation) by having better predictions, and in return I get to just use their calculations instead of calculating everything myself.
This paper gives some scaling laws and some FLOPs-per-token formulas, if you want a general idea of the procedure.
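For a rough idea of what that procedure looks like, here's a sketch using the standard dense-transformer approximation C ≈ 6·N·D (the parameter and token counts below are made-up placeholders, not estimates of any real model):

```python
# Rough training-compute estimate via the common approximation C ≈ 6·N·D
# (a dense transformer does ~6 FLOPs per parameter per training token).
# All parameter/token figures below are made-up placeholders, not real estimates.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * params * tokens

largest = training_flops(params=1e12, tokens=10e12)     # hypothetical biggest run
candidate = training_flops(params=175e9, tokens=300e9)  # GPT-3-scale run

print(f"largest   ≈ {largest:.2e} FLOPs")
print(f"candidate ≈ {candidate:.2e} FLOPs")
print("within 1 OOM of largest:", candidate >= largest / 10)
```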
@vluzko Well, it doesn't say that in the description. Dumping that on us after we bet seems unfair. If I had known I'd have to come up with my own estimates of compute, and THEN convince you that they are more accurate than others', I would not have bet. And, again, this doesn't help at all with complete black-box models, which I expect will be more than just GPT-4 by the end of 2023.
I sold, and I wouldn’t recommend betting unless you want to get involved in the effort to estimate compute subjectively.
I expect LLM training compute expenditure to follow a power law, such that the largest LLM training expenditure at any time is 10x higher than the 20th largest.
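To make the 10x concrete: if spend falls off as a power law in rank, an exponent near 0.77 gives that ratio between rank 1 and rank 20 (exponent chosen purely for illustration):

```python
# If spend at rank r scales as r^(-alpha), the rank-1 / rank-20 ratio is 20^alpha.
alpha = 0.77                                           # illustrative, tuned to give ~10x
ratio = 20 ** alpha
print(f"rank-1 / rank-20 spend ratio ≈ {ratio:.1f}x")  # ≈ 10.0x
```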
It's not expenditure that's listed, it's compute. Which we can't actually figure out. (Not that we could figure out expenditure anyway, lmao.) If the market creator isn't willing to give a reasonable definition that can actually be agreed on, they ought to just resolve N/A.
@ShadowyZephyr I think dollar cost is a reasonable way to measure amount of compute. Other metrics are available. I think any measure will show a power law, though perhaps it will be a larger multiplier measured in ops.
@MartinRandall No, they aren't. GPT-4 is a complete black box, and estimates vary wildly. Estimates are not available for half of those. I really don't want to go back and check each company's papers for the 22 models I listed, but suffice it to say that even one of them not having compute listed technically makes the resolution impossible. If you think you can come up with fair figures, please tell me what they are.
And dollar cost spent on the model doesn't really measure compute; the GPUs used to train it are only a subset of the costs. There's R&D, paying people to sort out content, RLHF, people working on the front end, and a lot more. If you look at OpenAI's jobs page, they have tons of openings that don't have anything to do with training GPT-5 directly.
OpenAI (GPT-4)
Meta (OPT-175B, LLaMA-65B)
Anthropic (Claude-v1.3)
Eleuther (GPT-NeoX)
Adept (ACT-1)
Aleph Alpha (Luminous-supreme-control, Luminous-world)
Cohere (command-xlarge-nightly)
Baidu (Ernie 3.0 Titan)
Forefront (pythia-20b)
LAION (OpenAssistant)
Yandex (YaLM-100b)
Amazon (AlexaTM)
Huawei (PanGu-Σ)
Cerebras (CerebrasGPT)
Technology Innovation Institute (Falcon)
Microsoft (GPT-4 Prometheus, Megatron)
Nvidia (Megatron)
DeepMind (Gopher)
Google (PaLM 2, Gemini)
Bloomberg (BloombergGPT)
AI21 (Jurassic-1)
Alibaba (Tongyi Qianwen)
Even if you don't count Microsoft because they didn't use enough compute (which I doubt), and you count DeepMind and Google as the same because Google owns DeepMind, we have 20. The compute figures for some of these are debatable, but even if you reject a couple, the chance of a few new organizations jumping in this year is MUCH greater than 15%. Also, I may have missed a couple.
As for future developments, Apple is rumored to be training one currently, as is the British government, and IBM might be too. I wouldn't be surprised if AssemblyAI were as well.
@ShadowyZephyr A lot of these are not within 1 OOM of the compute used for GPT-4. Maybe only Gemini is in this range. So the count is 2, not 22.
Maybe 2-6 (instead of the 2 above), assuming GPT-4 training cost was >$100M.
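Rough arithmetic behind that guess (every figure here is an assumption for illustration, not a known number):

```python
# Back-of-envelope: converting a dollar training budget into FLOPs.
# All numbers are round assumptions for illustration only.
budget_usd = 100e6      # assumed >$100M spent on the training run itself
gpu_hour_usd = 2.0      # assumed price per A100-hour
peak_flops = 312e12     # A100 peak BF16 throughput, FLOP/s
utilization = 0.40      # assumed model FLOPs utilization

gpu_hours = budget_usd / gpu_hour_usd
total_flops = gpu_hours * 3600 * peak_flops * utilization
print(f"≈ {total_flops:.1e} FLOPs")  # ≈ 2.2e25 under these assumptions
```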
@MaximeRiche And you can prove this how, exactly?
Many companies don’t just say “we used this much compute.” But it is almost certainly more than 2.
Also, compute is not money. Costing $100M doesn't necessarily mean a certain amount of compute; they also had to pay workers to RLHF it and align it.
If we cannot find out the compute to determine which are within 1 OOM (difficult, because we know nothing about GPT-4), we should use the common-sense definition of LLM, which includes pretty much all of these.
Also, the market creator earlier included Eleuther, and Eleuther’s model is one of the smallest out of all of these.
If you're curious, here are the parameter counts for each company's best language model:
Tongyi Qianwen - 10 trillion
PanGu-Σ - 1.085 trillion
GPT-4 - Unknown, est. 1 trillion
Megatron - 1 trillion max
PaLM - 540 billion max
Gopher - 280 billion
Ernie 3.0 Titan - 260 billion
Luminous-world - 200 billion
Jurassic-1 - 178 billion
OPT - 175 billion
YaLM-100b - 100 billion
command-xlarge-nightly - 52 billion
Claude v1.3 - 52 billion
BloombergGPT - 50 billion
Falcon - 40 billion
GPT-NeoX - 20 billion
Pythia-20b - 20 billion
AlexaTM - 20 billion
CerebrasGPT - 13 billion
OpenAssistant - 12 billion
mpt-7b - 7 billion (excluded from my original list because they publicly said it only cost $200k)
ACT-1 - Unknown
Parameter count does not equal training cost, though. I doubt Tongyi Qianwen cost 10x more than all the other models. In fact, Alibaba's last 10-trillion-parameter model, M6, used more than an order of magnitude less compute than GPT-3.
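A quick sketch of why (the activation fraction for the sparse model is made up for illustration):

```python
# Raw parameter count overstates compute for sparse (MoE) models:
# only the parameters active per token enter the FLOPs estimate.
def training_flops(active_params: float, tokens: float) -> float:
    return 6 * active_params * tokens  # standard dense approximation C ≈ 6·N·D

dense = training_flops(active_params=175e9, tokens=300e9)          # GPT-3-scale dense run
sparse = training_flops(active_params=0.01 * 10e12, tokens=300e9)  # hypothetical MoE, ~1% active

print(f"dense  ≈ {dense:.1e} FLOPs")   # ≈ 3.2e23
print(f"sparse ≈ {sparse:.1e} FLOPs")  # ≈ 1.8e23, despite ~57x more total parameters
```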
@ShadowyZephyr I can't prove it and don't have inside information on the compute used for GPT-4.
I would be surprised if GPT-4 is not using >30x more compute than GPT-3. But no data is available on that.
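For reference, the GPT-3 paper reports roughly 3,640 petaflop/s-days of training compute, so a >30x multiplier (again, pure speculation) would put GPT-4 around 1e25 FLOPs:

```python
# GPT-3's reported training compute, converted to FLOPs; the 30x is the guess above.
gpt3_flops = 3640 * 1e15 * 86400   # petaflop/s-days -> FLOPs, ≈ 3.1e23
gpt4_guess = 30 * gpt3_flops       # ≈ 9.4e24 FLOPs under the >30x guess
print(f"GPT-3 ≈ {gpt3_flops:.1e}, GPT-4 guess ≈ {gpt4_guess:.1e} FLOPs")
```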
@MaximeRiche So where did you get 2-6 from, if you can't prove it? All these companies refer to their models as "Large Language Models".
@ShadowyZephyr The question is about how many orgs have done a training run within one OOM of the largest LM. If "largest" is about compute, and GPT-4 or Gemini counts, then a guess is 2-6 by end of year. If "largest" is about the number of parameters of the largest model ever trained, counting sparse models, then the count could be pretty large and above 20.
@MaximeRiche Your guess is just speculation; not all companies say the amount of compute they used. All the big players in the AI race probably won't, and I think it's fair to resolve based on whatever is referred to as a "Large Language Model", which all of these are.

@ShadowyZephyr I gave a definition of an LLM in the market description, and I am not going to modify it. Changing resolution criteria partway through is a bad idea, and I'm not interested in how many organizations declare themselves to have LLMs anyway. If you are interested in that question, you should make another market.

@vluzko Okay, well, how many of the ones I've listed do you think count? There's no way to make an objective resolution based on compute, because data isn't published for like half of these.
So you need to resolve N/A if you refuse to change the criteria.

@BenjaminCosman Certainly Brain, DeepMind, and OpenAI. I think also Meta, Anthropic, Eleuther, and Adept, though I'm less sure about exact compute costs for them. So let's say 10. And yes, I picked a very high number deliberately because I'm curious about a scenario where every large corp starts building/having their own LLMs.
Will more than 20 organizations publicly train large language models by 2024?