🐕 Will Any AI Effectively Achieve Higher Than Human Level at Answering Multiple Choice, Grounded Situations?
Resolved NO on Jan 8

Preface:

Please read the preface for this type of market and other similar third-party validated AI markets here.

Third-Party Validated, Predictive Markets: AI Theme

Market Description

HellaSwag

HellaSwag is a dataset for studying grounded commonsense inference. It consists of 70k multiple-choice questions about grounded situations: each question comes from one of two domains, ActivityNet or WikiHow, and has four answer choices for what might happen next in the scene. The correct answer is the real sentence describing the next event; the three incorrect answers are adversarially generated and human-verified, so as to fool machines but not humans.
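As an illustration only (not the official evaluation harness, which scores model log-likelihoods over each candidate ending), a HellaSwag-style item reduces to picking one of four endings, and accuracy is the fraction of items where the chosen index matches the label:

```python
# Minimal sketch of HellaSwag-style multiple-choice scoring.
# Illustrative only: the item below mirrors the example in this description,
# and the "model" is a toy stand-in, not a real system.

def accuracy(predictions, labels):
    """Fraction of items where the predicted choice index matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# A single item in the style of the example question (context + 4 endings).
item = {
    "context": "A woman is outside with a bucket and a dog. "
               "The dog is running around trying to avoid a bath. She",
    "endings": [
        "rinses the bucket off with soap and blow dries the dog's head.",
        "uses a hose to keep it from getting soapy.",
        "gets the dog wet, then it runs away again.",
        "gets into the bath tub with the dog.",
    ],
    "label": 2,  # choice c)
}

# Toy "model" that always guesses the first ending, to exercise accuracy().
predictions = [0]
print(accuracy(predictions, [item["label"]]))  # 0.0 on this one-item set
```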

Example HellaSwag Question

A woman is outside with a bucket and a dog. The dog is running around trying to avoid a bath. She

  • a) rinses the bucket off with soap and blow dries the dog's head.

  • b) uses a hose to keep it from getting soapy.

  • c) gets the dog wet, then it runs away again.

  • d) gets into the bath tub with the dog.

Answer: C.

Two Leaderboards:

https://paperswithcode.com/sota/sentence-completion-on-hellaswag

https://leaderboard.allenai.org/hellaswag/submissions/public

  • Human performance is measured at 0.9560

Resolution Criteria

  • For the purposes of this question, we define Superintelligence as "achieving 99% accuracy on the test in question."

  • Will any entry on either of the two leaderboards linked above reach a 99% accuracy rating? If so, this market resolves YES; otherwise NO.

2023-07-27: Changed "Superintelligence" in the title to "Higher than Human Level".


🏅 Top traders

  • #1: Ṁ198

  • #2: Ṁ56

  • #3: Ṁ8

  • #4: Ṁ0