By end of 2025, will it be generally agreed upon that LLM-produced text/code > human text/code for training LLMs?
Quality.
I am doubtful
Perhaps relevant: https://arxiv.org/abs/2305.15717
I just sold my NO position since I realised I need more clarification. I could imagine two ends of the spectrum of what would make this resolve YES:
1. LLMs are, in future, trained purely on the output of previous-generation LLMs, applied recursively, so that only some distant ancestor ever saw raw human data. This approach is found to be superior (on some benchmarks) to training on human data. (I would bet NO on this.)
2. LLMs are used to sanitise/summarise/filter etc. the training data, as a kind of preprocessing pipe, before it is used to train the next-generation LLM. (I would bet YES in this case, depending on the exact wording.)
In fact, you could water down the second definition further by sanitising only some of the data, possibly even a minority (see the sketch at the end of this comment).
Please can you clarify which of these definitions you had in mind, or provide another? Thanks!
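For concreteness, here is a minimal Python sketch of what I mean by the second interpretation. Everything here is hypothetical: `llm_rewrite` and `quality_filter` are placeholder stand-ins for real model calls, and the `fraction` parameter covers the watered-down variant where only part of the data is sanitised.

```python
# A minimal sketch of interpretation 2 (hypothetical, not any real pipeline):
# an LLM sanitises/filters raw human data before it trains the next model.

def llm_rewrite(text: str) -> str:
    """Hypothetical LLM call that sanitises/summarises one document."""
    return text.strip()  # placeholder: a real pipeline would call a model here

def quality_filter(text: str) -> bool:
    """Hypothetical quality gate, e.g. an LLM-as-judge score threshold."""
    return len(text.split()) >= 5  # placeholder heuristic

def preprocess(corpus: list[str], fraction: float = 1.0) -> list[str]:
    """Sanitise the first `fraction` of the corpus; pass the rest through raw.

    fraction < 1.0 is the watered-down variant where only some of the
    data (possibly a minority) goes through the LLM pipe. Either way,
    every document still originates from raw human data.
    """
    cutoff = int(len(corpus) * fraction)
    cleaned = [llm_rewrite(doc) for doc in corpus[:cutoff] if quality_filter(doc)]
    return cleaned + corpus[cutoff:]

if __name__ == "__main__":
    raw_human_data = ["  a long human-written document goes here  ", "short junk"]
    print(preprocess(raw_human_data, fraction=0.5))
```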
@Tomoffer The second thing could happen. The first thing would not improve the data, but would eventually make it completely meaningless (by detaching the text from contact with reality).
@DavidBolin Totally agree: I originally bet NO with the first interpretation in mind, but would bet YES if it's the second.