The final model does not have to cost $100k in total. If a model outperforms GPT-3 before $100k has been spent on training, the market resolves yes, even if the model continues to be trained after that point.
Clarification: all benchmarks in the original GPT-3 paper.
@meefburger Yes. I will make exceptions for any benchmarks that are/become unavailable, or are otherwise very difficult to access (e.g. very onerous licensing). I may consider making an exception for a model that completely blows GPT-3 out of the water but skips some minor benchmarks. But since the market only resolves yes in the case where the model is quite cheap to use, it seems likely that actually testing it against all the benchmarks will be feasible.
I think it may already be possible. GPT-3 used ~3e23 FLOPs of training compute. MosaicML's GPT-30B used 3x less, ~1e23 FLOPs, and cost $450k. Flan-T5-XXL used 3x less again, ~3.3e22 FLOPs, so naively it should cost around $150k, but probably <$100k because Google has access to cheaper hardware.
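The extrapolation above can be sketched as a quick back-of-the-envelope calculation, assuming training cost scales linearly with FLOPs and anchoring on the GPT-30B figure (all numbers are rough public estimates, not exact):

```python
# Naive cost model: cost is proportional to training FLOPs,
# calibrated on MosaicML's GPT-30B ($450k for ~1e23 FLOPs).
GPT3_FLOPS = 3e23          # original GPT-3 training compute (approx.)
GPT30B_FLOPS = 1e23        # MosaicML GPT-30B (approx.)
GPT30B_COST = 450_000      # reported training cost, USD
FLAN_T5_XXL_FLOPS = 3.3e22 # Flan-T5-XXL (approx.)

COST_PER_FLOP = GPT30B_COST / GPT30B_FLOPS

def naive_cost(flops: float) -> float:
    """Extrapolate training cost assuming cost ∝ FLOPs."""
    return flops * COST_PER_FLOP

print(f"Flan-T5-XXL naive cost: ${naive_cost(FLAN_T5_XXL_FLOPS):,.0f}")
# ≈ $148,500 — the ~$150k figure above, before any cheaper-hardware discount
```

This obviously ignores hardware generation, utilization, and spot pricing; it only shows where the ~$150k number comes from.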
Does Flan-T5-XXL outperform GPT-3 on all benchmarks? I don't know. "All benchmarks" isn't even a fully robust criterion: you could construct a benchmark that specifically favors GPT-3 over all current models.
But it is significantly better on MMLU 5-shot (55 vs 44) which is a strong signal that it might actually be generally better.
I would give 95% that a model reasonably better than GPT-3 will be trained for <$100k by 2024, and maybe 85% that this market resolves yes (the model might not be public, it may be hard to estimate the cost, and it may be hard to say it's clearly better than GPT-3).
@ValeryCherepanov It's specifically all benchmarks in the original GPT-3 paper, not "all benchmarks imaginable"
Stable Diffusion trained for $600k, arguably ~$200k at aggressive spot pricing.
Big labs continue to be terrible at training efficiency (e.g. one paper beat AlphaGo with ~50x less compute via better sampling and architecture). With stability.ai in play **and** their open-source approach, someone might pull this off.