Will a new lab create a top-performing AI frontier model before 2028?

Criteria for Resolution:

1. Definition of "New Lab":

- A "new lab" refers to any company, lab, or other entity that is not OpenAI, Anthropic, DeepMind, Google, Meta, Mistral, xAI, Microsoft, Nvidia, or any subsidiary or parent company of them.

2. Top-Performing Generally Capable AI Frontier Model:

- The AI frontier model must achieve no less than a robust second place by performance. This includes:

- Unambiguous first place.

- Unambiguous second place.

- Ambiguous first place.

- Sharing first place.

- Sharing second place does not qualify.
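One way to read the placement rule above is as a small decision function. This is only a sketch: the function name `qualifies` and the three `clarity` labels are made up for illustration. The criteria do not say what happens with an ambiguous second place, so the sketch assumes it does not count as "robust".

```python
def qualifies(rank: int, clarity: str) -> bool:
    """Sketch of the 'no less than a robust second place' rule.

    rank:    1 for first place, 2 for second place, and so on.
    clarity: 'unambiguous', 'ambiguous', or 'shared' (hypothetical labels).
    """
    if clarity not in {"unambiguous", "ambiguous", "shared"}:
        raise ValueError(f"unknown clarity label: {clarity!r}")
    if rank == 1:
        # Unambiguous, ambiguous, and shared first place all qualify.
        return True
    if rank == 2:
        # Only an unambiguous second place qualifies. Shared second is
        # excluded by the criteria; ambiguous second is excluded here
        # only as an ASSUMPTION, since the criteria do not mention it.
        return clarity == "unambiguous"
    return False


for rank, clarity in [(1, "shared"), (2, "unambiguous"), (2, "shared"), (3, "unambiguous")]:
    print(rank, clarity, qualifies(rank, clarity))
```

How a model's rank and clarity are determined in practice is left to the judgment described under Performance Metrics.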

3. Performance Metrics:

- Performance will be judged by the most widely accepted metrics, together with user opinions and ratings, available at that time.

- For example, metrics may include benchmarks such as MMLU, HumanEval, and other relevant AI performance benchmarks.

Looking through the LMSYS leaderboard, you might be missing at least: 01 AI, Alibaba, Cohere, Nvidia, Reka AI, Zhipu AI.

Thanks. I will add Nvidia. With that, the list will remain fixed.

The compute cost of training a cutting-edge model is currently in the hundreds of millions of dollars. Epoch estimates that it will continue to go up by 0.2 OOM (orders of magnitude) each year, i.e. by a factor of about 1.6 per year.
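To make the 0.2 OOM per year figure concrete, here is a rough back-of-the-envelope sketch. The $300M starting cost is a hypothetical placeholder for "hundreds of millions", not a number from Epoch, and `projected_cost` is just an illustrative helper name.

```python
def projected_cost(base_cost_usd: float, years: float, oom_per_year: float = 0.2) -> float:
    """Compound a cost that grows by `oom_per_year` orders of magnitude per year."""
    return base_cost_usd * 10 ** (oom_per_year * years)


# ASSUMPTION: hypothetical starting point standing in for "hundreds of millions".
BASE_COST_USD = 300e6

# 0.2 OOM per year means multiplying by 10**0.2 each year.
print(f"yearly growth factor: {10 ** 0.2:.2f}x")

for years in range(5):
    cost_millions = projected_cost(BASE_COST_USD, years) / 1e6
    print(f"+{years} yr: ${cost_millions:,.0f}M")
```

At that rate the cost grows tenfold every five years, so under this assumed starting point a lab starting a few years from now would be looking at a compute bill in the billions.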

That's without accounting for the human capital costs. Training a cutting edge model is going to require a bunch of engineering schlep, which means hiring some world class people.

You need to have both deep pockets and a strong motivation to start an AI lab for this to make sense. So maybe a national government?

What about Microsoft?

Yes, I think it is reasonable to add Microsoft to the list.

Is this only about language models? Or do video generation models matter as well?

It must be considered a general-purpose model with general capabilities. A video generation model can in principle be in this class. If there is a capable video generation model that can be applied to various tasks and it demonstrates strong intelligence capabilities, it will qualify. If, for example, it is just the best model in the category of generators of the most aesthetically beautiful short videos, or the best advertisement producer, it will not qualify.

IMO action-conditioned video prediction models could be considered "general" forward dynamics models/world models. But I see how your definition is more (text-based) task-bound. Thanks for clarifying.

If it is a music production model, does it count, or does it have to be a general-purpose language or multimodal model?

It must be a general-purpose model.

Foundational models suck at benchmarks, I thought? Don't they need to be fine-tuned / RLHFed to reliably answer the questions they are asked? Not sure, though. I haven't seen OpenAI or Anthropic publish benchmarks for, or give access to, their foundational models.

I mean fine-tuned versions of the models, of course. By "foundational" I mean generally capable large frontier models, not necessarily non-RLHFed ones.

Replaced with frontier, maybe less confusing.