Preface:
Please read the preface for this type of market and other similar third-party validated AI markets here.
Third-Party Validated, Predictive Markets: AI Theme
Market Description
ARC
The AI2 Reasoning Challenge (ARC) aims to promote research in advanced question answering, particularly on questions that require reasoning, commonsense knowledge, and other methods for deeper text comprehension. The ARC Challenge set consists of the questions that are hard to answer with simple baselines.
Example ARC Question
Which property of a mineral can be determined just by looking at it?
(A) luster
(B) mass
(C) weight
(D) hardness
https://leaderboard.allenai.org/arc/submissions/public
https://paperswithcode.com/sota/common-sense-reasoning-on-arc-challenge
Market Resolution
As of market creation in July 2023, the top submission is GPT-4 with 96.3% accuracy.
Resolution Criteria
For the purposes of this question, we define "Superintelligence" as "achieving 99% accuracy on the test in question."
Will any entry on either of the two leaderboards linked above reach 99% accuracy? If so, this market resolves YES; otherwise, it resolves NO.
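As a rough illustration of the rule above, the check could be sketched as follows. This is only a sketch under stated assumptions: the function name `resolve_market` and the dictionary layout are hypothetical, scores are assumed to be percentages entered by hand from the two leaderboards, and "99%" is read as "at least 99.0". The only real data point used is the GPT-4 score quoted in this description.

```python
# Hypothetical helper: not an official resolution tool for this market.
THRESHOLD = 99.0  # assumption: "99% accuracy" means a score of at least 99.0


def resolve_market(scores, threshold=THRESHOLD):
    """Return "YES" if any leaderboard entry meets the threshold, else "NO".

    `scores` maps a submission name to its accuracy in percent.
    """
    return "YES" if any(s >= threshold for s in scores.values()) else "NO"


# The only score stated in the market description (July 2023).
current = {"GPT-4": 96.3}
print(resolve_market(current))
```

With only the GPT-4 entry at 96.3, the check falls short of the threshold by 2.7 percentage points.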
2023-07-27: Changed title from "Superintelligence" to "Higher than Human Level".
@EliezerYudkowsky Can you suggest a better term? Happy to change it. I'm trying to express the idea that in this particular domain or set of tests, the measurement of human performance has already been surpassed by AI.
@PatrickDelaney I'd write the title as follows:
"Will any AI get a significantly better score than humans on common sense reasoning questions by the end of 2023?"