SimpleQA is a benchmark of obscure trivia questions that OpenAI uses to evaluate hallucination rates. Resolves yes if a frontier model released by OpenAI declines to answer SimpleQA questions at least 5 times as often as it answers them incorrectly.
To resolve yes, the model must not have access to the internet or any database of facts. I will look primarily to OpenAI's own evaluations to resolve this market, but will accept third-party evals if OpenAI stops using SimpleQA.
OpenAI's latest frontier model, GPT-5-thinking, currently manages only a 1:8 decline-to-incorrect ratio (it answers incorrectly eight times for every time it declines). However, OpenAI claimed in a recent paper that they have new insights into why LLMs hallucinate and how hallucinations can be prevented. Additionally, the smaller GPT-5-thinking-mini achieves nearly a 2:1 ratio.
A result that seems obviously rigged (such as a model achieving 100% accuracy, declining to answer every question, or similar) will not resolve yes.
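For concreteness, here is a minimal sketch of the resolution arithmetic under the criteria above. The function name and the per-100-question counts are hypothetical illustrations, not OpenAI's reporting format; real numbers would come from OpenAI's (or a third party's) published SimpleQA results, and the rigged-result carve-out would still be judged by hand.

```python
# Minimal sketch of the resolution arithmetic. The function name and the
# per-100-question counts below are hypothetical illustrations, not
# OpenAI's reporting format.

def resolves_yes(declined: int, incorrect: int) -> bool:
    """Yes requires declining at least 5 times as often as answering
    incorrectly. Obviously rigged results (e.g. declining every single
    question) are excluded separately, by judgment."""
    return declined >= 5 * incorrect

# Illustrative profiles per 100 questions, matching the ratios cited above:
print(resolves_yes(declined=10, incorrect=80))  # ~1:8, like GPT-5-thinking      -> False
print(resolves_yes(declined=48, incorrect=25))  # ~2:1, like GPT-5-thinking-mini -> False
print(resolves_yes(declined=55, incorrect=10))  # 5.5:1, would clear the bar     -> True
```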