Will scaling current methods be enough to eliminate LLM hallucination?
26% chance

Current large language models (LLMs) perform well on many rich language tasks, but remain prone to a failure mode known as hallucination: models confidently output false "facts" as though they were true.

One view might be that this is an inherent feature of LLMs built on the current next-token-prediction architecture, since the model has no (direct) notion of its own overconfidence. On this view, we might not expect scaling current models to significantly reduce hallucination.
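
To make this view concrete, the sketch below (a minimal, illustrative example assuming a generic PyTorch-style causal LM; the function name is not from any particular codebase) shows the standard next-token-prediction objective. Nothing in the loss distinguishes a confidently stated falsehood from a true statement, which is the crux of the "inherent feature" argument.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Standard causal language-modeling loss: cross-entropy on the next token.

    logits:  (batch, seq_len, vocab_size) raw model outputs
    targets: (batch, seq_len) ids of the tokens that actually came next

    The loss only rewards matching the training text; no term references
    whether the completed statement is factually true.
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to (batch*seq_len, vocab)
        targets.reshape(-1),                  # flatten to (batch*seq_len,)
    )
```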

Another perspective might expect larger models to develop emergent capabilities which reduce or eliminate hallucination. If a knowledgeable human would respond to a hallucination-inducing question with something like "the question does not make sense", we might expect a capable language model to learn similar patterns.

This market resolves to TRUE if I am shown:

  • Compelling evidence of robustly low levels of hallucination....

  • From a model which does not introduce any as-yet unused techniques for eliminating hallucination...

  • Before the market closes on January 1, 2027.

I'll define "robustly low levels of hallucination" as "hallucinates false facts in fewer than 0.1% of responses on a set of difficult questions", or comparable levels of evidence. I'll define "as-yet unused techniques" as techniques which are not currently used by any major LLM. Solutions such as "do far more RLHF specifically on hallucination-inducing inputs" would not count as a new technique.
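
For concreteness, here is a minimal sketch of how such a threshold check could look. The 0.1% cutoff comes from the definition above; the evaluation set, the judging function (e.g. expert fact-checking of each response), and all names are illustrative assumptions, not part of the resolution criteria.

```python
# The bar from the definition above: false facts in fewer than 0.1% of responses.
ROBUSTLY_LOW_THRESHOLD = 0.001

def hallucination_rate(responses, contains_false_fact) -> float:
    """Fraction of responses to difficult questions that contain a false fact.

    `responses` is a list of model outputs; `contains_false_fact` is an
    assumed judging function (e.g. expert review of each response).
    """
    flagged = sum(1 for r in responses if contains_false_fact(r))
    return flagged / len(responses)

def meets_resolution_bar(responses, contains_false_fact) -> bool:
    """True if the measured rate clears the "robustly low" definition."""
    return hallucination_rate(responses, contains_false_fact) < ROBUSTLY_LOW_THRESHOLD
```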

Market resolves to FALSE if no such evidence is produced by the market close date, OR if a compelling proof of the infeasibility of solving hallucination through scale alone is given. Such a proof must have reasonably wide acceptance in the machine learning community.

Apr 2, 9:13pm: Will scale alone be enough to eliminate LLM hallucination? → Will scaling current methods be enough to eliminate LLM hallucination?

predicts YES

How does this resolve if there are robustly low levels of hallucination for answers a domain expert could get with 3 minutes of research, but not for answers a domain expert could get with 20 minutes of research?

predicts YES

@NoaNabeshima When I wrote “difficult questions” I had in mind the kind of question considered difficult for language models at time of writing. I think that would be much closer to 3 minutes of research than 20.

The title question to me is a bit misleading, because the resolution criteria are really asking whether scale plus filtering plus fine-tuning plus Constitutional AI plus prompts plus multiple other known techniques are enough. Which isn't "scale alone".

predicts YES

@MartinRandall Good point -- updated the title!

I think, at the end of the day, it's so much easier to learn to sound like convincing internet text than to learn how to know when you know something and when you don't.

Even if we throw something like RLHF into the mix, "only say something when you think you know what you're talking about" sounds like such a tough objective to specify, starting from the priors learnt from next-token prediction. It's much easier to accidentally specify an objective like "try to sound like a performatively humble human would sound" than "say something that sounds like an epistemically cautious human would say".

Why? Well, for one thing, the misaligned specifications are so much simpler, in terms of something like a Solomonoff prior, than the aligned specifications. And also, the misaligned specification is THAT MUCH MORE COMMON in internet text.

i would bet no but all my mana is taken up :(

predicts YES

@AlexAmadori No disagreements on any of your points, but to be clear the resolution criteria are empirical. My YES shares are based on the sense that while scaling RLHF will never solve the deeper problem, it seems (plausibly) sufficient to drive easily detectable errors like hallucination to arbitrarily low levels, if you're willing to throw enough effort at it.

I expect that such a model would still fail in unpredictable ways when out of distribution, but that with enough training you could support a wide enough distribution to meet the 0.1% failure rate. Current methods may be an inefficient way to do this though, so I’m still very uncertain.

on second thought, i hesitate to bet because technically YES is attainable by a model that refuses to answer all but the simplest questions

predicts YES

@AlexAmadori It could get fuzzy, but I would err on the side of excluding such a model. It should be a generally capable and useful model.