Will there be a more sample-efficient pretraining algorithm than next token prediction for NLP before 2027?
43% chance

Will a pretraining algorithm for language models which meaningfully improves on the sample efficiency of next token prediction be widely known before 2027?

Some details:

  • The technique must involve self-supervised learning on unlabeled data

  • The technique must have documented scaling behavior that meaningfully outperforms next token prediction in test perplexity as a function of training data, for whichever model architectures are popular by 2027

    • It's fine if there are tradeoffs with compute efficiency

    • It's fine if next token prediction outperforms the new technique early in training, or for small training runs, as long as scaling trends predict that the new technique would be better on runs using at least 10^26 FLOP and 15T tokens (roughly the budget of Llama 3 400B)

  • It must be accepted within the ML community that the technique is broadly superior to next token prediction (even if there are some tradeoffs) and has the potential to scale to outperform the best prior models trained using next token prediction

  • To validate the scaling potential of the method, it must be used to train a model which qualitatively matches or exceeds GPT-4 (if the above conditions hold before 2027, I will wait until July 2027 for such a model and will resolve YES if one is produced)
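To make the "scaling trends predict" criterion concrete, here is a minimal sketch of how such an extrapolation might be done. It assumes a Chinchilla-style power law in data, L(D) = E + B·D^(−β), with the irreducible loss E fixed for simplicity; all constants and losses below are illustrative, not measurements of any real technique.

```python
import math

def fit_power_law(data_tokens, losses, E=1.8):
    # Fit log(L - E) = log(B) - beta * log(D) by ordinary least squares,
    # with the irreducible loss E assumed known (an illustrative simplification).
    xs = [math.log(d) for d in data_tokens]
    ys = [math.log(l - E) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
           / sum((x - mx) ** 2 for x in xs)
    logB = my + beta * mx
    return math.exp(logB), beta

def predicted_loss(B, beta, D, E=1.8):
    # Power-law prediction of test loss at D training tokens.
    return E + B * D ** (-beta)

# Hypothetical small-scale measurements: a "baseline" objective (standing in
# for next token prediction) and a "candidate" objective that is worse early
# in training but has a steeper data exponent.
D_small = [1e9, 1e10, 1e11]
baseline_losses  = [predicted_loss(4.0, 0.10, d) for d in D_small]
candidate_losses = [predicted_loss(9.0, 0.13, d) for d in D_small]

Bb, bb = fit_power_law(D_small, baseline_losses)
Bc, bc = fit_power_law(D_small, candidate_losses)

# Extrapolate both fits to the 15T-token budget mentioned above.
D_big = 15e12
baseline_at_15T  = predicted_loss(Bb, bb, D_big)
candidate_at_15T = predicted_loss(Bc, bc, D_big)
```

With these made-up constants, the candidate has higher loss at every small-scale budget but a lower extrapolated loss at 15T tokens, which is exactly the pattern the resolution criteria allow: losing early in training while the fitted trend predicts a win at Llama-3-scale data budgets.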
