By 2025 will there be a competitive large language model with >50% of the total training data generated from a large language model?
closes 2025

Large meaning >= 20B parameters.
Competitive meaning the benchmark results are close to, or better than, those of a model trained only on human text.

Computer-generated text does not count; it has to be the output of a language model. For example, converting code to a compiler's intermediate representation and training only on that would not count.

Processed text is valid, as long as it's sourced from the language model.

Multiple stages of training are fine: even if there is a training period on only human text, this will resolve YES as long as the AI-generated training examples are > 50% of the total training examples over the entire training.
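As a rough illustration of the counting rule above, here is a minimal sketch of how the LLM-generated share could be tallied across training stages. The `Stage` class, the function names, and the example counts are all hypothetical, made up for illustration; the sketch covers only the size and data-share criteria, not the "competitive" benchmark criterion.

```python
from dataclasses import dataclass

# "Large meaning >= 20B parameters" (from the description above).
MIN_PARAMS = 20_000_000_000


@dataclass
class Stage:
    """One training stage; the counts are training examples (hypothetical structure)."""
    name: str
    human_examples: int
    llm_examples: int


def llm_share(stages):
    """Fraction of all training examples, over every stage, that came from an LLM."""
    total = sum(s.human_examples + s.llm_examples for s in stages)
    if total == 0:
        raise ValueError("no training examples")
    return sum(s.llm_examples for s in stages) / total


def meets_size_and_data_criteria(params, stages):
    """True if the model is large enough and strictly more than 50% of examples are LLM output."""
    return params >= MIN_PARAMS and llm_share(stages) > 0.5


# Made-up example: a human-only stage followed by a mostly synthetic stage.
stages = [
    Stage("human-text stage", human_examples=400, llm_examples=0),
    Stage("synthetic-data stage", human_examples=100, llm_examples=600),
]
print(f"LLM share: {llm_share(stages):.3f}")
print(meets_size_and_data_criteria(20_000_000_000, stages))
```

The share is computed over the whole run rather than per stage, so a human-only stage does not disqualify a model by itself.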

Market resolves on March 1st 2025 to account for announcement of models trained in the last half of 2024.

Martin Randall bought Ṁ40 of YES

A model extraction attack is enough to resolve this yes, right? Or any kind of distillation process where we train a model and use its output to train a model?

Does Constitutional AI count?

dmayhem93 predicts YES

@MartinRandall It would have to be on text, not on logits, so Alpaca and friends would be fine if they scaled it up to 750B tokens, but a traditional student/teacher setup would not.

Constitutional AI would count yeah, if it was >50% of the total.
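
To make the text-versus-logits distinction in this reply concrete, here is a minimal sketch of the two kinds of training signal for a single token position. The function names and the toy vocabulary are assumptions for illustration only; real setups differ in many details.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def text_loss(student_logits, teacher_token):
    """Cross-entropy against a token the teacher actually emitted.

    The student only sees the generated text, as in Alpaca-style training.
    """
    return -math.log(softmax(student_logits)[teacher_token])


def logit_loss(student_logits, teacher_logits):
    """KL(teacher || student) over the full distribution.

    This is the traditional student/teacher signal: it needs the teacher's
    logits, not just its text.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy 3-token vocabulary with made-up scores.
student = [0.2, 1.0, -0.5]
teacher = [0.1, 2.0, -1.0]
print(text_loss(student, teacher_token=1))  # needs only the sampled token
print(logit_loss(student, teacher))         # needs the teacher's full logits
```

Under the rule stated in the reply, only the first kind of signal would count toward the >50% threshold.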