🐕 Will A.I., "Hallucinate Significantly Less," by the End of 2023?
closes Jan 1


Please read the preface for this type of market and other similar third-party validated AI markets here.

Third-Party Validated, Predictive Markets: AI Theme

Market Description:


This project is an attempt to create a common metric for testing LLMs' progress in eliminating hallucinations, the most serious current obstacle to widespread adoption of LLMs for real-world purposes.


Market Resolution Threshold:

The market resolves YES if the top score on this benchmark is >=1.2*(average score).

The original average score is taken from the commit of the readme file that was current when this market was created.

Note that the current list of LLMs consists of models whose inference can actually be measured, and it does not include GPT4. To fully evaluate a model, its inference must be usable, which GPT4's may not be by the end of the year.

Note: the resolution criterion was previously 1.3*(leftmost score), but it has been updated to 1.2*(average score).

In other words:


The current average score is 78.115%, so the top score must be over 93.738% (1.2*78.115%) for any inferenceable language model that fits the above criteria, including the requirement that its inference be fully measurable.
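As a quick sanity check on the arithmetic, here is a minimal Python sketch. The function names are hypothetical and only for illustration. It assumes benchmark scores are percentages capped at 100%. It uses the 78.115% average and 1.2 multiplier stated above, plus the 83.51% leftmost score and 1.3 multiplier from the earlier rule quoted in the comments.

```python
def resolution_threshold(baseline_pct: float, multiplier: float) -> float:
    """Return the score (in percent) required under a multiplier-based rule."""
    return baseline_pct * multiplier


def is_attainable(threshold_pct: float, max_pct: float = 100.0) -> bool:
    """A threshold above the maximum possible score can never be met.

    Assumption: scores are percentages, so the maximum is 100.
    """
    return threshold_pct <= max_pct


# Current rule: 1.2 * (average score)
current = resolution_threshold(78.115, 1.2)
# Previous rule: 1.3 * (leftmost score)
previous = resolution_threshold(83.51, 1.3)

print(f"current rule:  {current:.3f}% attainable={is_attainable(current)}")
print(f"previous rule: {previous:.2f}% attainable={is_attainable(previous)}")
```

Under these assumptions the current rule gives 93.738%, which is reachable, while the previous rule gives 108.56%, which no model could score.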

Please update me in the comments if I am wrong in any of my assumptions and I will update the resolution criteria.

Hayden Jackson (@cloudprism)

I thought this just depended on the temperature setting?

Patrick Delaney (@PatrickDelaney)

@cloudprism There is a defined metric above. Any temperature setting could be used. I believe that setting is native to GPT from OpenAI; I'm not sure whether other LLMs use the same terminology.

Michael

[...] so threshold is 1.3*83.51% by market close.

1.3*83.51% = 108.56%, i.e. the model has to get more than everything right, which is not what you intended, I assume?

Patrick Delaney (@PatrickDelaney)

@Michael Correct, this is not what I intended. I will adjust the market.

Patrick Delaney (@PatrickDelaney)

@Michael I have updated the market accordingly. I am now using an average measurement and setting the threshold based on that. I am also specifying that the LLM's inference must be measurable. Please let me know if I'm missing anything.

Michael

@PatrickDelaney Looks reasonable to me now. Thanks for updating.