Before 2026, will frontier AI models get much better at expressing calibrated uncertainty in their answers?
Closes 2026 · 65% chance

Current LLMs are consistently overconfident in their answers and show very poor calibration: their stated probability that an answer is correct bears little relation to actual correctness.

https://openai.com/index/introducing-simpleqa/
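For reference, "calibration" here means that when a model says it is 90% confident, it should be right about 90% of the time. One common way to quantify the gap is Expected Calibration Error (ECE). The sketch below is illustrative only and assumes you already have (stated confidence, correctness) pairs; the function name, binning scheme, and example data are my own, not SimpleQA's actual evaluation code:

```python
# Minimal ECE sketch: bin answers by stated confidence, then compare each
# bin's average confidence with its empirical accuracy. Binning choices
# here are illustrative assumptions.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |stated confidence - accuracy| across bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece

# Example: a model that claims 90% confidence but is right only half the
# time shows a large gap between stated confidence and accuracy.
stated = [0.9, 0.9, 0.9, 0.9]
was_correct = [1, 0, 1, 0]
print(expected_calibration_error(stated, was_correct))  # 0.4
```

A well-calibrated model would score near 0 on this metric; the overconfidence described above shows up as stated confidence consistently exceeding accuracy.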

Resolves YES if by 2026 frontier models become much better at expressing uncertainty in their answers. I will base resolution on both benchmarks and my own subjective judgment.

It is not enough for a model to express uncertainty only when asked; it must be proactive about it. For example, if I ask the model to return a list of every NBA player older than 30 and the list excludes a number of players, it should say something like "I'm not sure I got everyone" before returning its answer.

  • Update 2025-02-05 (PST) (AI summary of creator comment): Deep Research models clarified:

    • Inclusion: Deep Research models are considered part of the eligible frontier AI models.

    • Resolution: Their performance in expressing calibrated uncertainty will be evaluated using the same benchmarks outlined in the market.


Which models are we talking about here? Only LLMs/multimodal models, or would the Deep Research models count?

@WilliamGunn Deep Research would count if it met my criteria, though based on what I've seen, I highly doubt it does.
