Will more than 20 organizations publicly train large language models by 2024?
resolved Jan 9
  • For this market, a large language model is a language model trained using an amount of compute within an order of magnitude of the compute used to train the largest language model.

  • It is not just based on parameter count.

  • I'll accept starting with a pretrained model and then doing additional training/finetuning, as long as the amount of compute for the latter component is large enough.

By publicly I just mean that it's well known that they trained the model. If, say, the Chinese government almost certainly has one but it's not definitively confirmed, then that doesn't count.
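The resolution rule above reduces to a single compute threshold; a minimal sketch, assuming the "within an order of magnitude" test means at least one tenth of the frontier run's compute (the FLOP numbers below are hypothetical, chosen only for illustration):

```python
def qualifies(run_flop: float, frontier_flop: float) -> bool:
    """True if a training run's compute is within one order of
    magnitude of the largest known run's compute."""
    return run_flop >= frontier_flop / 10

# With a hypothetical 1e25 FLOP frontier, a 2e24 FLOP run counts,
# but a 5e23 FLOP run falls more than an order of magnitude short.
print(qualifies(2e24, 1e25))  # True
print(qualifies(5e23, 1e25))  # False
```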


Time for a 2025 market?

@MartinRandall I'm thinking about what similar markets I'm interested in, but I don't think I'll duplicate this one exactly.

I've reviewed Epoch AI's dataset and their methodology (https://docs.google.com/spreadsheets/d/1AAIebjNsnJj_uKALHbXNfn3_YsT6sHXtCU0q7OIPuc4/edit) and I'm satisfied with it. They have Gemini Ultra as the most compute-intensive model at ~9e25 FLOP. With that as our estimate, we have only 3 orgs in range: Google, OpenAI (GPT-4), and Inflection (Inflection 2). If we are generous and say Gemini Ultra was 1e25 FLOP, the list expands to include Anthropic (Claude 2), HuggingFace (Falcon 180B), Alibaba (Qwen 72B), Microsoft and Nvidia (Megatron), Zhipu AI and Tsinghua KEG (ChatGLM3), and Baidu (ERNIE 3.0).
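The counting logic in that comment could be sketched like this. Only the ~9e25 FLOP Gemini Ultra figure comes from the comment; the other per-model values are illustrative placeholders, not Epoch AI's actual estimates:

```python
# Placeholder compute estimates (FLOP); only Gemini Ultra's ~9e25
# figure is taken from the comment above, the rest are illustrative.
ESTIMATES = {
    "Google (Gemini Ultra)": 9e25,
    "OpenAI (GPT-4)": 2e25,             # placeholder
    "Inflection (Inflection 2)": 1e25,  # placeholder
    "Anthropic (Claude 2)": 3e24,       # placeholder
}

def orgs_in_range(estimates: dict, frontier_flop: float) -> list:
    """Orgs whose training compute is within 1 OOM of the frontier."""
    return sorted(org for org, flop in estimates.items()
                  if flop >= frontier_flop / 10)

# Strict frontier estimate (9e25): three placeholder entries qualify.
print(orgs_in_range(ESTIMATES, 9e25))
# Generous frontier estimate (1e25): all four qualify.
print(orgs_in_range(ESTIMATES, 1e25))
```

Lowering the frontier estimate lowers the qualification threshold, which is why the generous 1e25 reading expands the list so much.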

@vluzko do the count!

@nikki Yeah I'm working on it.

I know this is a No, but can we get an example of any model that would have counted for the market? Just want to give it closure, I suppose.

Would FLM-101B have counted? It's got 100 billion+ parameters. What about the Llamas? Zidong Taichu (100 billion)? XVERSE (65 billion)? I know the market doesn't resolve on parameter count alone, but we do know at minimum the datasets these were trained on, which gives a rough estimate of the compute.

“For this market a large language model is a language model trained using an amount of compute that is within an order of magnitude of the compute used to train the largest language model”

Is the “largest” adjective evaluated as of when the question was asked or as of resolution?

Seems like some comments imply the latter which doesn’t make much sense to me…

@Tyler31 it's resolution. What doesn't make sense?

@vluzko A. At time of asking seems like the standard/intuitive way to interpret a phrase like this, imo.

B. It seems like a more straightforward, informative question to have wanted to ask.

C. At resolution seems less practical to evaluate.

D. How do I know when you will resolve? What if on Jan 1 it's unclear, so you wait a bit to assess, but by that time another model has been released…?


To the people betting YES: why? I do not bet on any of my AI markets, but if I did I would bet heavily on NO for this one. Most orgs training LLMs seem to be training at ~GPT-3 scale, which by most estimates is about 2 OOM less compute than GPT-4, so it is likely that none of them will qualify. Do you think GPT-4 didn't actually use that much compute? Do you think most orgs are actually training beyond GPT-3 scale? Do you think that even if that's not the case now, it will be within the next 6 months? (I am reasonably confident that any training runs starting now will not finish in time for market resolution, especially given the shortage of A100s, so that third possibility seems particularly unlikely.)
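As a back-of-the-envelope check on the "~2 OOM" claim, using commonly cited figures — GPT-3's ~3.1e23 FLOP is from its paper, while GPT-4's ~2e25 FLOP is an external estimate, since OpenAI has not published the number:

```python
import math

gpt3_flop = 3.1e23  # published figure from the GPT-3 paper, rounded
gpt4_flop = 2e25    # external estimate, not an OpenAI-published number

# Gap in orders of magnitude between the two training runs.
gap_oom = math.log10(gpt4_flop / gpt3_flop)
print(f"{gap_oom:.2f} orders of magnitude")  # 1.81 orders of magnitude
```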

@vluzko I thought this market ended at the end of 2024, not the 1st of January, but I'll ride it to 0 since I'm already at -32%.

@NiciusB truly, not checking the exact resolution date is Manifold's greatest villain

@vluzko close date doesn't always mean resolution date, there are exceptions. I think you should have specified in the description.

@ShadowyZephyr Technically “by 2024” does mean the 1st of Jan. I just didn’t think about it too much

@ShadowyZephyr ??? Close date and resolution date are assumed the same unless otherwise stated on... basically every market on Manifold. If you wanted explicit clarification you should have asked.

@vluzko some people do early close dates. It's rare, but it's always good to cover everything in the description.

@vluzko My best guess is that people are betting without reading the market definition of "large language models". It sure makes me nervous being the biggest NO holder here.

@vluzko Again, PLEASE do not resolve this market based on "estimates" of GPT-4's compute, to do so is invalidating both the title (which frankly was already useless) and even the description. You don't know anything about GPT-4's compute, you can only guess, it's a Fermi-level problem at best. I don't get what everyone's fixation on this is.

@NiciusB Actually, no, it depends on the market creator. "By x" is 50/50 between the end of year x and the beginning of year x.

BLOOMChat is a new one that seems relevant (trained by SambaNova)

@YoavTzfati BLOOMChat is a finetuned version of BLOOM and the training compute of the finetuning is likely very small.

@MaximeRiche BLOOM used about the same compute as GPT-3 and is thus more than 1 OOM below GPT-4.
