Before 2027, will OpenAI release a frontier model with a 5:1 or better abstention to hallucination ratio on SimpleQA?
51% chance · closes 2026

SimpleQA is a benchmark of obscure trivia questions that OpenAI uses to evaluate hallucination rate. Resolves YES if a frontier model released by OpenAI declines to answer SimpleQA questions at least five times as often as it answers them incorrectly.
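
For concreteness, here is a minimal sketch (my own illustration, not OpenAI's evaluation code) of how the abstention-to-hallucination ratio in the resolution criterion can be computed from SimpleQA-style result counts. The counts in the example are hypothetical.

```python
def abstention_ratio(not_attempted: int, incorrect: int) -> float:
    """Ratio of declined (not attempted) answers to incorrect (hallucinated) answers."""
    if incorrect == 0:
        # A model that never answers incorrectly trivially satisfies any ratio,
        # though a model that declines everything would fall under the "rigged" clause below.
        return float("inf")
    return not_attempted / incorrect

# Hypothetical counts out of 1,000 questions: a model that declines 500
# and answers 90 incorrectly has a ~5.6:1 ratio and would qualify.
print(abstention_ratio(not_attempted=500, incorrect=90))  # ~5.56
```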

To resolve YES, the model must not have access to the internet or any external database of facts. I will largely rely on OpenAI's own evaluations to resolve this market, but will accept third-party evals if OpenAI stops using SimpleQA.

OpenAI's latest frontier model, GPT-5-thinking, has only about a 1:8 ratio. However, OpenAI claimed in a recent paper (linked below) that it has new insights into why LLMs hallucinate and how to prevent it. Additionally, the smaller GPT-5-thinking-mini achieves nearly a 2:1 ratio.

A result that seems obviously rigged (such as a model achieving 100% accuracy, a model that declines to answer every question, or something similar) will not resolve YES.

https://arxiv.org/abs/2509.04664

https://cdn.openai.com/gpt-5-system-card.pdf
