Will it cost less than 100k USD to train and run a language model that outperforms GPT-3 175B on all benchmarks by the end of 2024?


resolved Apr 5

Resolved

YES


The final model does not have to cost under 100k. If a model outperforms GPT-3 before 100k has been spent on training, the market resolves YES, even if the model continues to be trained after that point.

Clarification: all benchmarks in the original GPT-3 paper.


Technical AI Timelines




I think the answer to the question is very likely yes, but I'm not sure if there's a specific model out there that resolves this to YES. Does anyone have suggestions?
Llama2 7B comes out at roughly $100k (184,320 A100 hours, and I'm pretty sure you can get A100s for $0.50/hour if you're buying them and amortizing the cost over their lifespan). Any objections to that estimate?
I also specified in the comments that it had to run on every GPT-3 benchmark (because I didn't want to litigate which benchmarks count, which in hindsight was a poor choice) - does anyone know if those experiments have been run? If not I will attempt to run them myself, but that might take a while.
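For anyone who wants to check the arithmetic behind that estimate, here is a minimal sketch. The GPU-hour count and the $0.50/hour rate are the figures quoted above; the function name and the break-even calculation are illustrative additions, and the hourly rate is the commenter's assumption rather than a published price.

```python
def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Training cost as GPU-hours times an hourly rate (ignores storage, networking, staff)."""
    return gpu_hours * usd_per_gpu_hour


BUDGET_USD = 100_000           # threshold in the market question
LLAMA2_7B_A100_HOURS = 184_320  # figure quoted in the comment above
ASSUMED_A100_RATE = 0.50        # USD per A100-hour; the commenter's assumption

estimated_cost = training_cost_usd(LLAMA2_7B_A100_HOURS, ASSUMED_A100_RATE)

# Highest hourly rate at which the same run still fits inside the budget.
break_even_rate = BUDGET_USD / LLAMA2_7B_A100_HOURS

print(f"Estimated cost: ${estimated_cost:,.0f}")
print(f"Break-even rate: ${break_even_rate:.3f} per A100-hour")
```

Under these assumptions the estimate lands a little below the threshold, and the conclusion is sensitive to the hourly rate: anything much above the break-even rate pushes the run over 100k.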

@vluzko I've checked all of the benchmarks I can and Llama2 consistently very slightly beats GPT-3. I of course do not have published numbers on the cost of training, but I think the estimate up there is reasonable. Does anyone have an argument to not resolve this YES?

MPT cost $200k and is roughly on par with GPT-3.

Is it necessary for the new model to have been tested on all the benchmarks published for the 175B model in the original GPT-3 paper for this to resolve YES?

@meefburger Yes. I will make exceptions for any benchmarks that are/become unavailable, or are otherwise very difficult to access (e.g. very onerous licensing). I may consider making an exception for a model that completely blows GPT-3 out of the water but skips some minor benchmarks. But since the market only resolves yes in the case where the model is quite cheap to use, it seems likely that actually testing it against all the benchmarks will be feasible.

100k nominal or inflation adjusted? If the latter, adjust from what starting point?

@TomCohen Nominal

@vluzko Sweet, thanks for clarifying!

predicted YES

I think it may already be possible. GPT-3 used 3e23 FLOPs. GPT-30B by MosaicML used about a third of that, 1e23 FLOPs, and cost $450k. Flan-T5-XXL used about a third of that again, 3.3e22 FLOPs, so naively it should cost around $150k, but probably <$100k because Google has access to cheaper hardware.
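The naive estimate here is cost scaling linearly with training compute. A small sketch of that calculation, using only the FLOP counts and the $450k reference cost quoted in this comment (the function name is made up for illustration, and linear scaling is itself an assumption that ignores hardware and utilization differences):

```python
def scale_cost_by_flops(ref_cost_usd: float, ref_flops: float, target_flops: float) -> float:
    """Naive estimate: cost is proportional to total training FLOPs."""
    return ref_cost_usd * target_flops / ref_flops


# Figures as quoted in the comment above.
MOSAIC_GPT30B_COST_USD = 450_000
MOSAIC_GPT30B_FLOPS = 1e23
FLAN_T5_XXL_FLOPS = 3.3e22
GPT3_FLOPS = 3e23

flan_estimate = scale_cost_by_flops(MOSAIC_GPT30B_COST_USD, MOSAIC_GPT30B_FLOPS, FLAN_T5_XXL_FLOPS)
gpt3_estimate = scale_cost_by_flops(MOSAIC_GPT30B_COST_USD, MOSAIC_GPT30B_FLOPS, GPT3_FLOPS)

print(f"Flan-T5-XXL at MosaicML prices: ~${flan_estimate:,.0f}")
print(f"GPT-3-scale compute at MosaicML prices: ~${gpt3_estimate:,.0f}")
```

Getting from the naive figure to under $100k requires hardware roughly a third cheaper per FLOP than the reference run, which is the gap the "cheaper hardware" argument has to cover.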

Does Flan-T5-XXL outperform GPT-3 on all benchmarks? I don't know. That is not even a reasonable criterion: you could construct a benchmark that specifically favors GPT-3 over all current models.

But it is significantly better on MMLU 5-shot (55 vs 44), which is a strong signal that it might actually be better in general.

I would give 95% that a model which is reasonably better than GPT-3 will be trained for <$100k by 2024, and maybe 85% that this market will resolve YES (the model may not be public, its cost may be hard to estimate, and it may be hard to say that it is clearly better than GPT-3).

@ValeryCherepanov It's specifically all benchmarks in the original GPT-3 paper, not "all benchmarks imaginable"

MosaicML trained a GPT-3-quality LLM for $450k about a month ago.

Stable Diffusion was trained for $600k, arguably ~$200k at aggressive spot pricing.

https://twitter.com/jackclarkSF/status/1563957173062758401

Big labs continue to be terrible at training efficiency (e.g. one paper beat AlphaGo with ~50x less compute through better sampling and architecture). With stability.ai in play AND their open-source approach, someone might pull this off.

It cost ~$10m to train in 2020. If costs halve every 18 months, it would cost ~$1m to train 4.5 years later. That leaves a ~10x improvement in approach needed to tie it, and much more to exceed it across the board. Note that it still costs ~$10m today to train PaLM/Megatron/Chinchilla, with no evidence of training (rather than inference) efficiency gains.
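The halving argument can be written out explicitly. This is a rough sketch using the figures from the comment ($10m starting cost, an 18-month halving period, a 4.5-year horizon); the function name is illustrative, and the halving rate is the commenter's assumption rather than a measured trend.

```python
def projected_cost_usd(initial_cost_usd: float, years: float, halving_months: float = 18.0) -> float:
    """Cost after `years` if it halves once every `halving_months` months."""
    halvings = years * 12.0 / halving_months
    return initial_cost_usd * 0.5 ** halvings


INITIAL_COST_USD = 10_000_000  # ~2020 training cost quoted in the comment
TARGET_COST_USD = 100_000      # threshold in the market question

cost_after_4_5_years = projected_cost_usd(INITIAL_COST_USD, years=4.5)

# Remaining factor that would have to come from better methods rather than cheaper compute.
remaining_gap = cost_after_4_5_years / TARGET_COST_USD

print(f"Projected cost after 4.5 years: ${cost_after_4_5_years:,.0f}")
print(f"Further improvement needed to reach $100k: {remaining_gap:.1f}x")
```

Three halvings bring the cost down by 8x, so the rest of the gap to $100k (a bit over 10x on these numbers) has to come from algorithmic or architectural gains.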