🐕 Will A.I., "Hallucinate Significantly Less," by the End of 2023?
18 · closes Jan 1 · 25% chance

Preface:

Please read the preface for this type of market and other similar third-party validated AI markets here:

Third-Party Validated, Predictive Markets: AI Theme

Market Description:

HALTT4LLM

This project is an attempt to create a common metric for testing LLMs' progress in eliminating hallucinations, the most serious current obstacle to widespread adoption of LLMs for real-world purposes.

https://github.com/manyoso/haltt4llm

Market Resolution Threshold:

The market resolves YES if the top score on this benchmark reaches >= 1.2 * (average score).

The original average score is taken from the README file as of the commit at the time this market was created.

Note that the current list of LLMs covers only models whose inference can actually be measured, and it does not include GPT-4. To fully evaluate a model, its inference must be usable, which GPT-4's may not be at the end of the year.

Note: the resolution criterion was previously 1.3*(leftmost score), but it has been updated to 1.2*(average score).

In other words:

C=1.2

The current average score is 78.115%, so the top score must reach at least 93.738% (1.2 × 78.115%) for any of the above inferenceable language models that fit the criteria, including the fully measurable inference requirement.
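For concreteness, here is a minimal sketch of that threshold arithmetic in Python. The 78.115% average is the figure quoted above; the example score passed to `resolves_yes` is a placeholder, not an actual HALTT4LLM result:

```python
# Sketch of this market's resolution arithmetic (not part of HALTT4LLM itself).
C = 1.2                  # threshold multiplier
average_score = 78.115   # benchmark average (%) at market creation, per the README

threshold = C * average_score  # 93.738%

def resolves_yes(top_score: float) -> bool:
    """YES if the best fully measurable (inferenceable) model reaches the threshold."""
    return top_score >= threshold

print(f"Threshold: {threshold:.3f}%")  # Threshold: 93.738%
print(resolves_yes(94.0))              # True -- placeholder score, for illustration only
```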

Please update me in the comments if I am wrong in any of my assumptions and I will update the resolution criteria.


Comments:
Hayden Jackson

I thought this just depended on the temperature setting?

Patrick Delaney

@cloudprism There is a defined metric above, and any temperature setting could be used. I believe that setting is native to GPT from OpenAI; I'm not sure if other LLMs use the same terminology.

Michael

@PatrickDelaney

[...] so threshold is 1.3*83.51% by market close.

1.3*83.51% = 108.56%, i.e. the model would have to get more than everything right, which is not what you intended, I assume?

Patrick Delaney

@Michael Correct, this is not what I intended. I will adjust the market.

Patrick Delaney

@Michael I have updated the market accordingly: I am now using an average measurement and setting the threshold based upon that. I am also specifying that the LLM must be measurable as an inference. Please let me know if I'm missing anything.

Michael

@PatrickDelaney Looks reasonable to me now. Thanks for updating.