Will more than 20 organizations publicly train large language models by 2024?
closes 2024
47% chance
  • For this market, a "large language model" is a language model trained using an amount of compute within an order of magnitude of the compute used to train the largest language model.

  • It is not just based on parameter count.

  • I'll accept starting with a pretrained model and then doing additional training/finetuning, as long as the amount of compute for the latter component is large enough.

By "publicly" I just mean that it's well known that they trained the model. If, say, the Chinese government almost certainly has one but it's not definitively confirmed, then that doesn't count.

Yoav Tzfati is predicting YES at 36%

BLOOMChat is a new one that seems relevant (trained by SambaNova)

Vincent Luczkow

At this point I think it is likely that at market close the best estimates we have for at least some models (particularly GPT-4) will be based on scaling laws. Personally I am fine with this, I don't expect errors in those estimates to produce a difference that could shift the resolution. Here are some proposals:
- If the best estimates of compute (based on scaling laws, architectural details, details of the organization, etc.) show the market is not close (either well below 20 orgs or well above), I will resolve NO or YES respectively.
- If the best estimates show the market is close (say 17-23 orgs), I will resolve N/A.
- If we end up not even having benchmark data for many models (so we can't do scaling law estimates), I will either resolve N/A or propose new resolution criteria.
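The three proposals above amount to a simple decision rule. A minimal sketch in Python (the function name is hypothetical; the 17-23 "too close" band is the one floated in the comment):

```python
# Sketch of the proposed resolution rule. The 17-23 band is the
# "too close to call" range suggested above; outside it, the market
# resolves NO or YES on the best-estimate count of qualifying orgs.
def resolve(org_count_estimate: int) -> str:
    """Map a best-estimate count of qualifying orgs to a resolution."""
    if 17 <= org_count_estimate <= 23:
        return "N/A"  # compute estimates too noisy to trust the boundary
    return "YES" if org_count_estimate > 20 else "NO"
```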

ShadowyZephyr is predicting YES at 32%

@vluzko Can you give examples for at least some of the models in my list?

Vincent Luczkow

@ShadowyZephyr not doing the calculations myself is part of the goal of the market. Predictors who take the time and effort to do the calculations in advance get mana (in expectation) by having better predictions, and in return I get to just use their calculations instead of calculating everything myself.
This paper gives some scaling laws and some FLOPs per token formulas, if you want a general idea of the procedure.
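For a rough shape of that procedure: a common scaling-law approximation puts dense-transformer training compute at about 6 FLOPs per parameter per token. A minimal sketch, with illustrative numbers only (GPT-3-like public figures, not claims about any unreleased model):

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute for a dense transformer: ~6*N*D FLOPs."""
    return 6.0 * params * tokens

def within_one_oom(flops: float, largest_flops: float) -> bool:
    """A model counts for this market if its compute is within 10x of the largest."""
    return flops >= largest_flops / 10.0

# Illustrative: a 175B-parameter model trained on 300B tokens
# lands around 3.15e23 FLOPs.
gpt3_like = training_flops(175e9, 300e9)
```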

ShadowyZephyr (edited)

@vluzko Well, it doesn’t say that in the description. Dumping that on us after we bet seems unfair. If I knew I’d have to come up with my own estimates of compute, and THEN convince you that they are more accurate than others’, I would not have bet. And, again, this doesn’t at all help with complete black box models, which I expect will be more than just gpt-4 by the end of 2023.

I sold, and I wouldn’t recommend betting unless you want to get involved in the effort to estimate compute subjectively.

Martin Randall bought Ṁ100 of NO

I expect LLM training compute expenditure to follow a power law, such that the largest LLM training expenditure at any time is 10x higher than the 20th.
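That intuition can be made concrete. If training compute at rank k falls off as k^(-alpha), then "largest = 10x the 20th" pins down alpha, and under that same (assumed, and of course contested) power law exactly the top 20 orgs sit within one order of magnitude of the leader:

```python
import math

# If compute(k) is proportional to k**(-alpha), then rank 1 being 10x
# rank 20 means 20**alpha == 10, i.e. alpha = log(10)/log(20) ~ 0.77.
alpha = math.log(10) / math.log(20)

# Orgs within one OOM of the top are ranks k with k**(-alpha) >= 0.1,
# i.e. k <= 10**(1/alpha) -- which is exactly 20 under this assumption.
cutoff_rank = 10 ** (1 / alpha)
```

On that assumption the market sits right at the boundary: "more than 20" fails by exactly one rank.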

ShadowyZephyr is predicting YES at 31% (edited)

@MartinRandall

It's not expenditure that's listed, it's compute, which we can't actually figure out (not that we could figure out expenditure anyway lmao). If the market creator isn't willing to give a reasonable definition that can actually be agreed on, they ought to just resolve N/A.

Martin Randall sold Ṁ56 of NO

@ShadowyZephyr I think dollar cost is a reasonable way to measure amount of compute. Other metrics are available. I think any measure will show a power law, though perhaps it will be a larger multiplier measured by ops.

ShadowyZephyr is predicting YES at 31% (edited)

@MartinRandall No, they aren't. GPT-4 is a complete black box, and estimates vary wildly; they're not available for half of those. I really don't want to go back and check each company's papers for the 22 models that I listed, but suffice it to say that even one of them not having compute listed technically makes the resolution impossible. If you think you can come up with fair figures, please do tell me what they are.

And dollar cost spent on the model doesn't really measure compute; the GPUs used to train it are a small subset of the costs. There is R&D, paying people to sort out content, RLHF, people working on the front end, and a lot more. If you look at OpenAI's jobs page, they have tons of openings that don't have to do with training GPT-5 directly.

ShadowyZephyr (edited)
  1. OpenAI (GPT-4)

  2. Meta (OPT-175B, LLaMA-65b)

  3. Anthropic (Claude-v1.3)

  4. Eleuther (GPT-NeoX)

  5. Adept (ACT-1)

  6. Aleph Alpha (Luminous-supreme-control, luminous-world)

  7. Cohere (command-xlarge-nightly)

  8. Baidu (Ernie 3.0 Titan)

  9. Forefront (pythia-20b)

  10. LAION (OpenAssistant)

  11. Yandex (YaLM-100b)

  12. Amazon (AlexaTM)

  13. Huawei (PanGu-Σ)

  14. Cerebras (CerebrasGPT)

  15. Technology Innovation Institute (Falcon)

  16. Microsoft (GPT-4 Prometheus, Megatron)

  17. Nvidia (Megatron)

  18. DeepMind (Gopher)

  19. Google (PaLM 2, Gemini)

  20. Bloomberg (BloombergGPT)

  21. AI21 (Jurassic-1)

  22. Alibaba (Tongyi Qianwen)

    Even if you don't count Microsoft because they didn't use enough compute (which I doubt), and you count DeepMind and Google as one because Google owns DeepMind, we have 20. The compute on some of these is debatable, but even if you reject a couple, the chances of a few new organizations jumping in this year are MUCH greater than 15%. Also, I may have missed a couple.
    As for future developments, Apple is rumored to be training one currently, as is the British government, and IBM might be. I wouldn't be surprised if AssemblyAI were too.

Maxime Riché is predicting NO at 80%

@ShadowyZephyr A lot of these are not within 1 OOM of the compute used for GPT-4. Maybe only Gemini is in this range. So the count is 2, not 22.

Maxime Riché is predicting NO at 80%

Maybe 2-6 (instead of the 2 above), assuming GPT-4 training cost >$100M.

ShadowyZephyr is predicting YES at 42% (edited)

@MaximeRiche And you can prove this how, exactly?

Many companies don’t just say “we used this much compute.” But it is almost certainly more than 2.

Also, compute is not money. Costing $100M doesn't necessarily mean a certain amount of compute; they also had to pay workers to RLHF it and align it.

If we cannot find out the compute to determine which are within 1 OOM (difficult because we know nothing about GPT-4), we should use the common-sense definition of LLM, which includes pretty much all of these.

Also, the market creator earlier included Eleuther, and Eleuther’s model is one of the smallest out of all of these.

ShadowyZephyr is predicting YES at 36% (edited)

If you're curious, here are the parameter counts for each company's best language model:
Tongyi Qianwen - 10 trillion
PanGu-Σ - 1.085 trillion
GPT-4 - Unknown, est. 1 trillion
Megatron - 1 trillion max
PaLM - 540 billion max
Gopher - 280 billion
Ernie 3.0 Titan - 260 billion
Luminous-world - 200 billion
Jurassic-1 - 178 billion
OPT - 175 billion
YaLM-100b - 100 billion
command-xlarge-nightly - 52 billion
Claude v1.3 - 52 billion
BloombergGPT - 50 billion
Falcon - 40 billion
GPT-NeoX - 20 billion
Pythia-20b - 20 billion
AlexaTM - 20 billion
CerebrasGPT - 13 billion
OpenAssistant - 12 billion
mpt-7b - 7 billion (excluded from my original list because they publicly said it only cost 200k)
ACT-1 - Unknown

Parameter count does not equal training cost, though. I doubt Tongyi Qianwen cost 10x more than all the other models; in fact, Alibaba's last 10-trillion-parameter model, M6, used more than an order of magnitude less compute than GPT-3.
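The point that parameters are not compute can be made concrete with the common ~6*N*D training-FLOPs approximation: for a sparse (mixture-of-experts) model, only the parameters active per token enter the product. All figures below are hypothetical illustrations, not claims about M6 or Tongyi Qianwen:

```python
def dense_training_flops(params: float, tokens: float) -> float:
    """~6*N*D FLOPs for a dense transformer (all params active per token)."""
    return 6.0 * params * tokens

def moe_training_flops(active_params: float, tokens: float) -> float:
    """For a mixture-of-experts model, only the parameters routed to
    per token contribute to compute, not the total parameter count."""
    return 6.0 * active_params * tokens

dense_175b = dense_training_flops(175e9, 300e9)  # GPT-3-like dense run
# A hypothetical 10T-parameter MoE activating ~10B params per token:
sparse_10t = moe_training_flops(10e9, 300e9)
# The MoE has ~57x more total parameters yet ~17x less training compute.
```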

Maxime Riché is predicting NO at 36%

@ShadowyZephyr I can't prove it and don't have inside information on the compute used for GPT-4.

I would be surprised if GPT-4 is not using >30x more compute than GPT-3. But no data is available on that.

ShadowyZephyr is predicting YES at 36%

@MaximeRiche So where did you get 2-6 from, if you can't prove it? All these companies refer to their models as "Large Language Models".

Maxime Riché is predicting NO at 36%

@ShadowyZephyr The question is about how many orgs have done a training run within one OOM of the largest LM. If "largest" is about compute, and GPT-4 or Gemini counts, then my guess is 2-6 by end of year. If "largest" is about the parameter count of the largest model ever trained, counting sparse models, then the count could be pretty large and above 20.

ShadowyZephyr is predicting YES at 36% (edited)

@MaximeRiche Your guess is just speculation; not all companies say how much compute they used, and all the big players in the AI race probably won't. I think it's fair to resolve based on whatever is referred to as a "Large Language Model", which all of these are.

Vincent Luczkow

@ShadowyZephyr I gave a definition of an LLM in the market description, I am not going to modify it. Changing resolution criteria partway through is a bad idea, and I'm not interested in how many organizations declare themselves to have LLMs anyway. If you are interested in that question you should make another market.

Vincent Luczkow

@MaximeRiche It's not based on parameter count, that's in the market description.

ShadowyZephyr is predicting YES at 31% (edited)

@vluzko Okay, well, how many of the ones I've listed do you think count? There's no way to make an objective resolution based on compute, because data isn't published for like half of these.

So, you need to resolve N/A if you refuse to change the criteria.

Benjamin Cosman bought Ṁ50 of NO

20 seems like a lot - how many have done so thus far?

vluzko avatar
Vincent Luczkow

@BenjaminCosman Certainly Brain, DeepMind, and OpenAI. I think also Meta, Anthropic, Eleuther, and Adept, but I'm less sure about exact compute costs for them. So let's say 10. And yes, I picked a very high number deliberately, because I'm curious about a scenario where every large corp starts building/having their own LLMs.

