https://fortune.com/2023/08/01/can-ai-chatgpt-hallucinations-be-fixed-experts-doubt-altman-openai/
“This isn’t fixable,” said Emily Bender, a linguistics professor and director of the University of Washington’s Computational Linguistics Laboratory. “It’s inherent in the mismatch between the technology and the proposed use cases.”
How true will this end up being? At the end of 2028 I will evaluate whether the hallucination problem for LLMs has been fixed or still exists. If hallucinations have been solved, this market resolves YES. If the outstanding hallucination problem still exists, this market will resolve NO.
This is a follow-up market to this related market:
@MichaelM No, not really. It's just an artifact of a training regime. It's like saying the appendix is a part of human anatomy. Or inverted retina nerves... Something that's maybe annoying, but not fatal.
@firstuserhere It's not the only or even main thing, but I wouldn't say it's unrelated.
@firstuserhere True. And even hallucinations about verifiable facts can be useful (these are called "educated guesses"). But I suppose the question here is not really "will LLMs hallucinate" (obviously, they will) but rather "will hallucinations be a problem" (i.e. are almost all guesses and fictions clearly indicated as such rather than being presented as known truths). In theory, this is solvable, but I'm not optimistic. On the other hand, there's a silver lining because it could teach people more to "trust, but verify", which is a useful life skill.
So to clarify, if by 2028, language models rarely hallucinate in major ways, this resolves YES? Here are a few more fine-grained questions:
What if very few everyday users can elicit hallucinations, but adversarial prompts created by experts still can?
What if models become good at answering in a way that includes their uncertainty? For example, models still sometimes say false things, but are great at flagging that these are points of uncertainty?
In a search context, what if the answers of language models essentially subsume traditional search engines? Search engines sometimes misinterpret or mislead users, but for the most part users can eventually glean reliable information. If models are roughly as reliable as this, meaning they still sometimes make mistakes, but rarely enough that they largely replace search, does this resolve YES?
What if very few everyday users can elicit hallucinations, but adversarial prompts created by experts still can?
Think you mean "everyday users cannot elicit ... "
@firstuserhere I’m of the opinion that full robustness (even non-adversarial robustness) is way, way harder than mere consistency. I meant that question to invoke a scenario where non-adversarial prompts cause hallucinations rarely but not never, while adversarial prompts can cause them more often.
@AdamK If the model is capable of accurately flagging uncertainty, solving hallucinations would be as easy as adding a filter that replaces uncertain statements with "I'm not sure", and I'm quite confident that something like that would be implemented, at least as a toggleable option.
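A minimal sketch of the filter idea above, assuming each statement comes with a calibrated confidence score (a hypothetical interface; getting such calibrated scores is the hard part, and no current API exposes them per statement):

```python
# Hypothetical post-processing filter: pass through confident statements,
# and prefix low-confidence ones with an explicit uncertainty disclaimer.
def filter_uncertain(statements, threshold=0.9):
    """statements: list of (text, confidence) pairs with confidence in [0, 1]."""
    out = []
    for text, confidence in statements:
        if confidence >= threshold:
            out.append(text)
        else:
            out.append(f"I'm not sure, but possibly: {text}")
    return out
```

The filter itself is trivial; the whole difficulty is in whether the model's confidence scores are actually valid, which is the same caveat raised below about flagging needing to be valid.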
@AdamK Sure thing, sorry for the delayed reply; I was finishing up a game jam.
Rough initial thoughts, but welcome for discussion.
- What if very few everyday users can elicit hallucinations, but adversarial prompts created by experts still can?
Depends on the prompts. I think that if I personally can copy a prompt that follows the definitions already established and elicit a hallucination then the market resolves NO.
- What if models become good at answering in a way that includes their uncertainty? For example, models still sometimes say false things, but are great at flagging that these are points of uncertainty?
Probabilistic language while maintaining usefulness seems like a viable solution path. The flagging would need to actually be valid, of course. Furthermore, just always hedging every statement doesn't really count.
- In a search context, what if the answers of language models essentially subsume traditional search engines? Search engines sometimes misinterpret or mislead users, but for the most part users can eventually glean reliable information. If models are roughly as reliable as this, meaning they still sometimes make mistakes, but rarely enough that they largely replace search, does this resolve YES?
No, I think replacing search won't be a relevant metric. We are looking at the technical hallucination problem. It's possible that future GPTs are still incredibly economically useful despite not making any real progress on hallucinations, and that would not be sufficient.
@SneakySly Some thoughts on these criteria:
- "Depends on the prompts. I think that if I personally can copy a prompt that follows the definitions already established and elicit a hallucination then the market resolves NO."
This seems like a very high bar, especially because hallucination and jailbreaking are considered distinct. The latter is adversarial, meaning we are eliciting failure after placing optimization pressure on the prompt. Hallucination is just "a confident response by an AI that does not seem to be justified by its training data," and generally refers to failures that occur from non-adversarial prompts. As I mentioned before, I expect hallucinations to be fixed long before we get adversarial robustness. If resolution is based on identifying any prompt floating around on the internet that causes hallucination, this will tend to remain possible long after models functionally never fail on non-adversarial prompts.
Another thing the market should specify is roughly how reliably hallucination is solved. In the domain of ML robustness, each nine of reliability is a technical leap (like going from 90% to 99% reliability). Preventing 100% of hallucination is probably an unreasonable standard for this market and is somewhat hard to define anyways. One standard I suggested above is that the market can resolve YES if LMs are roughly as reliable as search engines (I think you misunderstood this as referring to the economic value of LMs). Another potential way to operationalize this is in relation to modern-day models. For example, if OpenAI says that their latest model hallucinates 99% less often than GPT-4 on internal benchmarks, or using good public benchmarks (I don't know of any good ones yet). In any case, I think it would be good to specify something to this effect.
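To make the "each nine is a technical leap" framing concrete, here is a small sketch (the numbers are illustrative, not measured hallucination rates):

```python
import math

def nines(reliability):
    """Convert a reliability fraction to 'nines': 0.9 -> 1, 0.99 -> 2, 0.999 -> 3."""
    return -math.log10(1 - reliability)

# Illustrative: if a model hallucinated on 10% of queries (0.90 reliable, one nine),
# a 99% reduction in hallucination rate leaves 0.1% (0.999 reliable, three nines) --
# that is, two separate technical leaps, not one.
```

This is why "99% less often than GPT-4" is a much stronger claim than it might sound: it spans multiple nines of reliability at once.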
@AdamK These are good points.
In particular this has moved me:
"For example, if OpenAI says that their latest model hallucinates 99% less often than GPT-4 on internal benchmarks, or using good public benchmarks (I don't know of any good ones yet). In any case, I think it would be good to specify something to this effect."
True, if OpenAI claimed that GPT-X hallucinated 99% less often, that should resolve this market as YES. For the spirit of this question, that is essentially solving the problem. How do we feel about operationalizing it like this: if someone posts a reputable article indicating a 95%+ reduction in hallucinations, we can say that hallucinations were not the intractable hurdle these experts claimed, and the market can resolve YES. (Benchmarks would work as well if they get created.)
As always, feedback welcome.
Added $2000M subsidy
"hallucinations", to me, seems like a vague term for a fundamental fact of intelligence. in language models, it will only get pushed further and further from human view: either models will refuse to answer novel use cases in domains where they lack knowledge/permission, or the knowledge encapsulated in language models will come to exceed that of most humans, or they will respond in platitudes so vague that they are unfalsifiable and couldn't be categorized as hallucinations.
intelligence requires reducing the complexity of the observed world into more compact regular rules to be computationally tractable (aka a world model). humans do this, and this heuristic reduction of observed reality into a world model is necessarily lossy; it generates beliefs and actions that do not always cohere with reality. i believe this is inherent to intelligent systems in general, and something that cannot fundamentally be "fixed".
however, it seems entirely possible that vagueness, bigger models, guard rails, etc. could convince you that "hallucinations" have been "fixed" in future models, while errors between the AI's world model and the universe will necessarily always exist.
so i'm not really sure what this question is asking.
"will I be able to get a language model to tell me something untrue in 2029?" sure, probably yeah
"will a language model have a better general understanding of reality than humans in 2029?" could be
"will AI systems be 100% perfect in their recollection of all human knowledge in 2029" surely not
"will AI systems have so many guardrails that they only speak in liability-minimizing corporately-approved platitudes in 2029?" seems likely
@brubsby I honestly imagine that it will be pretty clear cut whether or not progress has been made in this domain.
I found this definition and it's fairly sensible:
"Hallucinations, in this context, signify instances when the AI model “imagines” or “fabricates” information that does not directly correspond to the provided input."
It will have nothing to do with intentionally trying to get the model to tell you something untrue. Think of the LLM making up untrue facts in response to a factual prompt.
I honestly imagine that it will be pretty clear cut whether or not progress has been made in this domain.
Yes it will be clear if progress has been made (and it will be), but that's not what you're asking. You're asking whether it will be "fixed". That requires a hard black/white line drawn between what is and is not considered to be a hallucination.
I promise you that there are many researchers and thought leaders who will insist that LLM hallucinations are a pervasive and catastrophic issue with their dying breath, even if the population at large considers them to work just fine.
So no, you really do need to operationalize this market better.
@jonsimon In this case I will be going off of my own judgement unless someone proposes a clear metric I could use.
@SneakySly What does your judgement say currently about the severity of LLM hallucinations? Do you consider it to be a huge/big/medium/small/inconsequential problem for say, GPT-4?
@SneakySly Can you give an example of one of the worse ones you've experienced recently?
@SneakySly does this market resolve early if you think they've been fixed at some point before market close?
@brubsby Yes. For example, if GPT-6 or whatever comes out and we all find that hallucinations are a non-issue now, this could resolve YES.
- I would err on the side of going slow in this case, and would post my thoughts for discussion. I would not suddenly surprise-close this market out of nowhere.
- I am not betting in this market.
@jonsimon Sometimes with obscure APIs it will make up stuff that doesn't exist. Pretty rare in my own personal experience though, TBH.
@SneakySly In that case I'm going to flip my answer to YES, since I think there will be enough progress by that point to satisfy you. Many people are hard at work on the problem.