The final model does not have to cost $100k. If a model outperforms GPT-3 before $100k has been spent on training, the market resolves YES, even if the model continues to be trained after that point.
Clarification: all benchmarks in the original GPT-3 paper.
Is it necessary for the new model to have been tested on all the benchmarks published for the 175B model in the original GPT-3 paper for this to resolve YES?
@meefburger Yes. I will make exceptions for any benchmarks that are/become unavailable, or are otherwise very difficult to access (e.g. very onerous licensing). I may consider making an exception for a model that completely blows GPT-3 out of the water but skips some minor benchmarks. But since the market only resolves yes in the case where the model is quite cheap to use, it seems likely that actually testing it against all the benchmarks will be feasible.
$100k nominal or inflation-adjusted? If the latter, adjusted from what starting point?
I think it may already be possible. GPT-3 used 3e23 FLOPs of training compute. MosaicML's GPT-30B used 3x less (1e23 FLOPs) and cost $450k. Flan-T5-XXL used 3x less again (3.3e22 FLOPs), so naively it should cost around $150k, but probably <$100k because Google has access to cheaper hardware.
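The extrapolation above is just linear scaling of cost with compute, anchored on the MosaicML GPT-30B figure; a quick sketch (the FLOP counts and the $450k cost are the figures quoted in this comment, not independently verified):

```python
# Naive assumption: training cost scales linearly with total training
# compute (FLOPs), anchored on the reported $450k for GPT-30B.

GPT3_FLOPS = 3e23           # GPT-3 total training compute
GPT30B_FLOPS = 1e23         # MosaicML GPT-30B
GPT30B_COST_USD = 450_000   # reported training cost
FLAN_T5_XXL_FLOPS = 3.3e22  # Flan-T5-XXL

cost_per_flop = GPT30B_COST_USD / GPT30B_FLOPS

def naive_cost(flops: float) -> float:
    """Linear extrapolation of training cost from compute."""
    return flops * cost_per_flop

print(f"Flan-T5-XXL naive cost: ${naive_cost(FLAN_T5_XXL_FLOPS):,.0f}")
# ~ $148,500, i.e. the rough $150k figure above
```

Of course, cost per FLOP is not actually constant across labs or hardware generations, which is exactly why Google's cheaper hardware could push the real number under $100k.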
Does Flan-T5-XXL outperform GPT-3 on all benchmarks? I don't know. "All benchmarks" isn't even a reasonable criterion: you can construct a benchmark that specifically favors GPT-3 over all current models.
But it is significantly better on MMLU 5-shot (55 vs 44), which is a strong signal that it might actually be generally better.
I would give 95% that a model reasonably better than GPT-3 will be trained for <$100k by 2024, and maybe 85% that this market resolves YES (the model may not be public, the cost may be hard to estimate, and it may be hard to show it's clearly better than GPT-3).
Stable Diffusion was trained for ~$600k, arguably ~$200k at aggressive spot pricing.