LLM Hallucination: Will an LLM score >90% on SimpleQA before 2026?
55% chance

Resolves using the "correct given attempted" metric defined in https://cdn.openai.com/papers/simpleqa.pdf. The attempt rate must be at least 30%. No search or retrieval is allowed.
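
As a rough illustration of how the criteria combine, here is a minimal sketch. It assumes the paper's three grades (correct, incorrect, not attempted), with "correct given attempted" computed as correct / (correct + incorrect) and the attempt rate as the attempted share of all questions. The function name `check_resolution` and the counts in the usage lines are hypothetical, not taken from the market or from any model's results.

```python
def check_resolution(correct, incorrect, not_attempted,
                     score_threshold=0.90, min_attempt_rate=0.30):
    """Return (correct_given_attempted, attempt_rate, qualifies).

    Thresholds mirror the market text: score must exceed 90% and
    the attempt rate must be at least 30%.
    """
    total = correct + incorrect + not_attempted
    attempted = correct + incorrect
    if total == 0 or attempted == 0:
        # Nothing graded or nothing attempted: the metric is undefined.
        return 0.0, 0.0, False

    correct_given_attempted = correct / attempted
    attempt_rate = attempted / total
    qualifies = (correct_given_attempted > score_threshold
                 and attempt_rate >= min_attempt_rate)
    return correct_given_attempted, attempt_rate, qualifies


# Hypothetical counts, for illustration only.
score, rate, ok = check_resolution(correct=380, incorrect=20, not_attempted=600)
print(f"correct given attempted: {score:.1%}, attempt rate: {rate:.1%}, qualifies: {ok}")
```

The attempt-rate floor matters because a model could otherwise reach a very high score by answering only the few questions it is most sure of.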

"An open problem in artificial intelligence is how to train models that produce responses that are factually correct. Current language models sometimes produce false outputs or answers unsubstantiated by evidence, a problem known as “hallucinations”. Language models that generate more accurate responses with fewer hallucinations are more trustworthy and can be used in a broader range of applications. To measure the factuality of language models, we are open-sourcing a new benchmark called SimpleQA."
