Will LLM hallucinations be a fixed problem by the end of 2028?
55% chance

https://fortune.com/2023/08/01/can-ai-chatgpt-hallucinations-be-fixed-experts-doubt-altman-openai/

“This isn’t fixable,” said Emily Bender, a linguistics professor and director of the University of Washington’s Computational Linguistics Laboratory. “It’s inherent in the mismatch between the technology and the proposed use cases.”

How true will this end up being? At the end of 2028 I will evaluate whether the hallucination problem for LLMs has been fixed or still exists. If hallucinations have been solved, this market resolves YES. If the hallucination problem still exists, this market resolves NO.

This is a followup market to this related market:




@SneakySly Earlier, you clarified that you'd resolve YES if a language model hallucinates in regular (non-adversarial) use around 1% as often as the original GPT-4 (presumably on a broad and representative range of queries). I think it'd also make sense to establish a threshold using SimpleQA now that there is finally a decent hallucination benchmark; it's actually very good according to the 1% GPT-4 standard since the dataset is adversarially collected against GPT-4. I'd argue <~3% incorrect subject to >50% attempted is a reasonable operationalization.
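For concreteness, here is a minimal sketch of how the proposed SimpleQA threshold could be checked. It assumes each answer is graded as correct, incorrect, or not attempted, and that both rates are taken over all questions; that denominator is my reading of the proposal, not something the comment pins down. The function name and parameters are hypothetical.

```python
def meets_simpleqa_threshold(correct, incorrect, not_attempted,
                             max_incorrect=0.03, min_attempted=0.50):
    """Check the proposed rule: <3% incorrect, subject to >50% attempted.

    Assumption: both rates use the total number of questions as the
    denominator. The original proposal does not specify this.
    """
    total = correct + incorrect + not_attempted
    if total == 0:
        raise ValueError("no graded questions")
    incorrect_rate = incorrect / total
    attempted_rate = (correct + incorrect) / total
    return incorrect_rate < max_incorrect and attempted_rate > min_attempted


# Hypothetical counts, for illustration only.
print(meets_simpleqa_threshold(correct=600, incorrect=20, not_attempted=380))
```

The attempted-rate floor matters because a model could otherwise pass trivially by declining to answer almost everything.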


If model providers roll out features that can be set to use information conservatively, or with sources (a la citations), is such a feature eligible to cause a YES resolution if it is sufficiently reliable? What if it maintains this reliability over Internet-scale corpora, such as an improved version of Perplexity?

I do think sufficiently reliable use of citations over Internet-scale corpora should count toward a YES resolution. Training models to look for and cite sources sounds like a kosher way to fix the "hallucination problem" of model unreliability in factual domains, and this is how they will be used in practice. Resolving based on hallucination rates of vanilla LMs with no access to outside information is not appropriate if this becomes the dominant regime.

bought Ṁ5 NO

Hallucinations are part of LLMs. If the question were broader, asking whether there will be a reasoning model that doesn’t hallucinate by 2028, I would buy YES.

@MichaelM No, not really. It's just an artifact of a training regime. It's like saying the appendix is a part of human anatomy. Or inverted retina nerves... Something that's maybe annoying, but not fatal.

opened a Ṁ500 YES at 43% order

New limit orders up

predictedYES

Relevant - because if they did, then it's more or less solved.

predictedYES

@firstuserhere RAG was actually why I bought YES on this market...

@Mira Yeah, good thinking. What do you think about rumors on the 98% that spooked Ilya?

predictedYES

@firstuserhere It's not the only or even main thing, but I wouldn't say it's unrelated.

predictedYES

@Mira Got it :)))

predictedYES

All novels are hallucinations.

predictedNO

@firstuserhere True. And even hallucinations about verifiable facts can be useful (these are called "educated guesses"). But I suppose the question here is not really "will LLMs hallucinate" (obviously, they will) but rather "will hallucinations be a problem" (i.e. are almost all guesses and fictions clearly indicated as such rather than being presented as known truths). In theory, this is solvable, but I'm not optimistic. On the other hand, there's a silver lining because it could teach people more to "trust, but verify", which is a useful life skill.

Another massive technical challenge market for comparison

predictedYES

So to clarify, if by 2028, language models rarely hallucinate in major ways, this resolves YES? Here are a few more fine-grained questions:

  • What if very few everyday users can elicit hallucinations, but adversarial prompts created by experts still can?

  • What if models become good at answering in a way that includes their uncertainty? For example, models still sometimes say false things, but are great at flagging that these are points of uncertainty?

  • In a search context, what if the answers of language models essentially subsume traditional search engines? Search engines sometimes misinterpret or mislead users, but for the most part users can eventually glean reliable information. If models are roughly as reliable as this, meaning they still sometimes make mistakes, but rarely enough that they largely replace search, does this resolve YES?

predictedYES

@AdamK

What if very few everyday users can elicit hallucinations, but adversarial prompts created by experts still can?

Think you mean "everyday users cannot elicit ... "

predictedYES

@firstuserhere I’m of the opinion that full robustness (even non-adversarial robustness) is way, way harder than mere consistency. I meant that question to invoke a scenario where non-adversarial prompts cause hallucinations rarely but not never, while adversarial prompts can cause them more often.

predictedYES

@AdamK If the model is capable of accurately flagging uncertainty, solving hallucinations would be as easy as adding a filter that replaces uncertain statements with "I'm not sure", and I'm quite confident that something like that would be implemented, at least as a toggleable option.
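A minimal sketch of the kind of filter described above, assuming the model can attach a calibrated confidence score to each statement. That score, the 0.9 cutoff, and the function name are all hypothetical; no existing API is implied.

```python
def filter_uncertain(statements, threshold=0.9, fallback="I'm not sure."):
    """Replace low-confidence statements with a fallback message.

    `statements` is a list of (text, confidence) pairs, where confidence
    is a hypothetical calibrated probability in [0, 1].
    """
    return [text if confidence >= threshold else fallback
            for text, confidence in statements]


# Made-up statements and scores, for illustration only.
answers = [
    ("Paris is the capital of France.", 0.99),
    ("The bridge opened in 1931.", 0.40),
]
print(filter_uncertain(answers))
```

The filter is only as good as the confidence scores feeding it, so the hard part is still accurate flagging, not the replacement step.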

predictedYES

@SneakySly Can you specify how you would resolve in these cases?

@AdamK Sure thing, and sorry for the delayed reply; I was finishing up a game jam.

Rough initial thoughts, but discussion is welcome.

- What if very few everyday users can elicit hallucinations, but adversarial prompts created by experts still can?
Depends on the prompts. I think that if I personally can copy a prompt that follows the definitions already established and elicit a hallucination then the market resolves NO.

- What if models become good at answering in a way that includes their uncertainty? For example, models still sometimes say false things, but are great at flagging that these are points of uncertainty?
Probabilistic language while maintaining usefulness seems like a viable solution path. The flagging would need to actually be valid of course. Furthermore, just always hedging every statement doesn't really count.

- In a search context, what if the answers of language models essentially subsume traditional search engines? Search engines sometimes misinterpret or mislead users, but for the most part users can eventually glean reliable information. If models are roughly as reliable as this, meaning they still sometimes make mistakes, but rarely enough that they largely replace search, does this resolve YES?
No, I think replacing search won't be a relevant metric. We are looking at the technical hallucination problem. It's possible that future GPTs are still incredibly economically useful despite not making any real progress on hallucinations, and that would not be sufficient.

predictedYES

@SneakySly Some thoughts on these criteria:
- "Depends on the prompts. I think that if I personally can copy a prompt that follows the definitions already established and elicit a hallucination then the market resolves NO."

This seems like a very high bar, especially because hallucination and jailbreaking are considered distinct. The latter is adversarial, meaning we are eliciting failure after placing optimization pressure on the prompt. Hallucination is just "a confident response by an AI that does not seem to be justified by its training data," and generally refers to failures that occur from non-adversarial prompts. As I mentioned before, I expect hallucinations to be fixed long before we get adversarial robustness. If resolution is based on identifying any prompt floating around on the internet that causes hallucination, this will tend to remain possible long after models functionally never fail on non-adversarial prompts.

Another thing the market should specify is roughly how reliably hallucination is solved. In the domain of ML robustness, each nine of reliability is a technical leap (like going from 90% to 99% reliability). Preventing 100% of hallucination is probably an unreasonable standard for this market and is somewhat hard to define anyways. One standard I suggested above is that the market can resolve YES if LMs are roughly as reliable as search engines (I think you misunderstood this as referring to the economic value of LMs). Another potential way to operationalize this is in relation to modern-day models. For example, if OpenAI says that their latest model hallucinates 99% less often than GPT-4 on internal benchmarks, or using good public benchmarks (I don't know of any good ones yet). In any case, I think it would be good to specify something to this effect.

@AdamK These are good points.

In particular this has moved me:
"For example, if OpenAI says that their latest model hallucinates 99% less often than GPT-4 on internal benchmarks, or using good public benchmarks (I don't know of any good ones yet). In any case, I think it would be good to specify something to this effect."

True, if OpenAI claimed that GPT-X hallucinated 99% less often, that should resolve this market as YES. For the spirit of this question, that is essentially solving the problem. How about operationalizing it this way: if someone posts a reputable article indicating a 95%+ reduction in hallucinations, we can say that hallucinations were not the intractable hurdle these experts claimed, and the market can resolve YES. (Benchmarks would work as well, if they get created.)

As always, feedback welcome.
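To make the 95% figure concrete, here is a rough sketch of the arithmetic, assuming "reduction" means the relative drop in hallucination rate against a GPT-4 baseline measured on the same set of queries. The function names and the example rates are hypothetical, not measured values.

```python
def relative_reduction(baseline_rate, new_rate):
    """Fractional drop in hallucination rate relative to the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return 1 - new_rate / baseline_rate


def would_resolve_yes(baseline_rate, new_rate, required_reduction=0.95):
    """Assumed reading of the rule: YES if the reduction is at least 95%."""
    return relative_reduction(baseline_rate, new_rate) >= required_reduction


# Made-up rates: a 20% baseline falling to 0.8% is a 96% reduction.
print(would_resolve_yes(baseline_rate=0.20, new_rate=0.008))
```

Note that the result depends heavily on the baseline: the same absolute rate can pass or fail depending on how often the original GPT-4 hallucinated on the chosen queries.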

Added $2000M subsidy

@firstuserhere Much appreciated!
