Will there be a period of 12 contiguous months during which no new compute-SOTA LM is released, by Jan 1, 2033?

This resolves YES if: There is a contiguous period of 12 months during which no new language model is credibly known to exist that sets a new record for most compute used during the entire training process from start to finish.
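
The core condition can be sketched as a check over the release dates of successive compute-record models. This is a hypothetical helper, not part of the resolution criteria; it approximates "12 months" as 365 days and assumes each record-setting release has a known date:

```python
from datetime import date, timedelta

def has_12_month_gap(record_dates: list[date], horizon: date) -> bool:
    """Return True if any contiguous ~12-month period up to `horizon`
    contains no new compute-record release."""
    dates = sorted(record_dates)
    # Check the gap between each pair of consecutive record-setting releases.
    for earlier, later in zip(dates, dates[1:]):
        if later - earlier > timedelta(days=365):
            return True
    # Also check the stretch from the last record to the horizon date.
    return horizon - dates[-1] > timedelta(days=365)

# Records in Jan 2030 and Mar 2031 are more than 12 months apart.
print(has_12_month_gap([date(2030, 1, 1), date(2031, 3, 1)], date(2032, 1, 1)))  # True
```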

  • The definition of LM is intended to include models that use other modalities or do other things in addition to language (images, RL, etc.).

  • This specifically does not take into account algorithmic innovations. A 10x effective compute improvement from better algorithms or utilization does not count as a 10x increase in compute usage. This includes low-level optimizations and innovations that use lower precision (i.e., I consider 2 FP16 FLOPs as equivalent to 1 FP32 FLOP).
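
The stated precision equivalence (2 FP16 FLOPs = 1 FP32 FLOP) amounts to a simple normalization. A minimal sketch, using only the factors given in the criterion above:

```python
# Normalize raw training compute to FP32-equivalent FLOPs,
# per the stated rule: 2 FP16 FLOPs count as 1 FP32 FLOP.
PRECISION_FACTOR = {"fp32": 1.0, "fp16": 0.5}

def fp32_equivalent_flops(raw_flops: float, precision: str) -> float:
    return raw_flops * PRECISION_FACTOR[precision]

# A 1e25-FLOP run done in FP16 counts as 5e24 FP32-equivalent FLOPs.
print(fp32_equivalent_flops(1e25, "fp16"))  # 5e+24
```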

  • This market is conditional on it being generally understood that SOTA LMs are still being publicized, and that their compute usage is at least roughly estimable (excluding, e.g., military models). Compute usage doesn't have to be exact or official as long as it can be credibly estimated from public information (e.g., power consumption, financial reports, satellite imagery of datacenters). This market resolves N/A if compute numbers stop being estimable in this way, such that it becomes controversial whether models are continually using more compute.
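
A rough estimate from public signals like power consumption might look like the sketch below. Every number here is an illustrative assumption of mine, not a figure from the criteria:

```python
# Rough training-compute estimate from public signals (all values hypothetical):
# total FLOPs ~= power draw (W) x utilization x FLOPs-per-watt x seconds
power_watts = 30e6          # assumed datacenter draw, e.g. from power records
flops_per_watt = 7e11       # assumed accelerator efficiency
utilization = 0.4           # assumed effective utilization of peak FLOPs
seconds = 90 * 24 * 3600    # assumed 90-day training run

est_flops = power_watts * utilization * flops_per_watt * seconds
print(f"{est_flops:.1e}")   # 6.5e+25
```

Estimates like this are only order-of-magnitude, which is why the criterion above asks for "roughly estimable" rather than exact figures.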

  • A fine-tune of an existing LM counts as the base LM's compute plus the fine-tuning compute; however, to qualify for this market it has to use at least 50% new compute over the last LM that qualified. This is intended to exclude a SOTA LM that is continually fine-tuned on new data with trivial amounts of compute from technically setting new SOTAs on this market.
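
My reading of the fine-tuning rule, as a sketch (the function name and threshold handling are illustrative):

```python
def qualifies(new_total_flops: float, last_qualifying_flops: float) -> bool:
    """A fine-tune counts the base model's compute plus fine-tuning compute,
    but must use at least 50% new compute over the last qualifying LM."""
    new_compute = new_total_flops - last_qualifying_flops
    sets_record = new_total_flops > last_qualifying_flops
    return sets_record and new_compute >= 0.5 * last_qualifying_flops

print(qualifies(1.6e25, 1e25))  # True: 60% new compute
print(qualifies(1.1e25, 1e25))  # False: only 10% new compute, a trivial fine-tune
```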

  • As a sanity check, the new LM should not be substantially worse than previous compute-SOTA models on most major benchmarks where the models are fairly comparable. This is intended to exclude models trained with much less efficient techniques or poorly chosen hyperparameters that waste much of the compute.
