๐Ÿ• Will A.I. Be Able to Meet Just Below Human Performance In Being Able to "Track Changes in State," By the End of 2023?
Resolved NO on Jan 8

Preface:

Please read the preface for this type of market and other similar third-party validated AI markets here.

Third-Party Validated, Predictive Markets: AI Theme


Attempting to improve upon this market, and make it a bit more interesting.

  • Some have pointed out in the below market that the threshold may be too high because it's above human performance.

  • This is instead an attempt to match human performance within a margin of error. Please read the threshold description at the bottom.

Preface / Inspiration:

  • There are a lot of questions on Manifold about whether or not we'll see sentience, general A.I., and a lot of other nonsense and faith-based questions which rely on the market maker's interpretation and often close at some far distant point in the future when a lot of us will be dead. This is an effort to create meaningful bets on important A.I. questions which are referenced by a third party.

Market Description

ProPara

ProPara aims to promote the research in natural language understanding in the context of procedural text. This requires identifying the actions described in the paragraph and tracking state changes happening to the entities involved.

Example Question

Given this five-sentence procedural paragraph (id 1167 from the training partition):

① The gravity of the sun pulls its mass inward. ② There is a lot of pressure on the Sun. ③ The pressure forces atoms of hydrogen to fuse together in nuclear reactions. ④ The energy from the reactions gives off different kinds of light. ⑤ The light travels to the Earth.

Consider the two participant entities:

  • atoms of hydrogen

  • sunlight or light

Predict answers to these four questions:

  1. What are the Inputs?

    • That is, which participants existed before the procedure began, and don't exist after the procedure ended? Or, what participants were consumed?

    • Answer: The inputs are atoms of hydrogen.

  2. What are the Outputs?

    • That is, which participants existed after the procedure ended, but didn't exist before the procedure began? Or, what participants were produced?

    • Answer: The outputs are light (or sunlight).

  3. What are the Conversions?

    • That is, which participants were converted to which other participants?

    • Answer: The participant atoms of hydrogen is converted into light (or sunlight) in sentence 3.

  4. What are the Moves?

    • That is, which participants moved from one location to another?

    • Answer: The participant light (or sunlight) moves from sun to earth in sentence 5.
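The four questions above can be read as simple queries over a per-sentence state table. The following is a minimal sketch, not the official ProPara format or evaluator: the `grid` values, the `"-"` marker for "does not exist", and the helper names are all assumptions made for illustration, chosen so that the results line up with the answers given above.

```python
from typing import Dict, List, Tuple

ABSENT = "-"  # assumed marker: the participant does not exist at this point

# Hypothetical state table: one location before sentence 1, then one after
# each of the five sentences. These values are illustrative, not dataset
# annotations.
grid: Dict[str, List[str]] = {
    "atoms of hydrogen": ["sun", "sun", "sun", ABSENT, ABSENT, ABSENT],
    "light": [ABSENT, ABSENT, ABSENT, "sun", "sun", "earth"],
}


def inputs(states: Dict[str, List[str]]) -> List[str]:
    """Participants that exist at the start but not at the end (consumed)."""
    return sorted(p for p, s in states.items()
                  if s[0] != ABSENT and s[-1] == ABSENT)


def outputs(states: Dict[str, List[str]]) -> List[str]:
    """Participants that exist at the end but not at the start (produced)."""
    return sorted(p for p, s in states.items()
                  if s[0] == ABSENT and s[-1] != ABSENT)


def moves(states: Dict[str, List[str]]) -> List[Tuple[str, str, str, int]]:
    """(participant, from, to, sentence) for every change of location."""
    found = []
    for participant, s in states.items():
        for step in range(1, len(s)):
            before, after = s[step - 1], s[step]
            # Creation and destruction are not moves, so skip ABSENT states.
            if ABSENT not in (before, after) and before != after:
                found.append((participant, before, after, step))
    return found


print(inputs(grid))   # participants consumed by the procedure
print(outputs(grid))  # participants produced by the procedure
print(moves(grid))    # location changes with their sentence number
```

Conversions are left out of the sketch; one simplified reading would pair a participant destroyed in a sentence with one created in that same sentence.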

Market Resolution Criteria

https://leaderboard.allenai.org/propara/submissions/public

  • Top score on F1 is 0.731.

  • Human Performance on F1 is shown as 0.839

  • A margin of error could be accepted as 2% for the purposes of this market, to make it a bit more interesting.

  • 0.839 * 2% = 0.01678, rounded to 0.017

  • 0.839 - 0.017 = 0.822

Target to Beat = 0.822

Based upon a +/- 2% margin of error, we would need to see any public submission reach 0.822 or greater by the end of the year for this to resolve YES, otherwise NO.
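The threshold arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation described above; the constant and function names are made up for illustration, and the scores are the ones quoted in this description.

```python
HUMAN_F1 = 0.839  # human performance shown on the leaderboard
TOP_F1 = 0.731    # top public submission when the market was created
MARGIN = 0.02     # 2% margin of error accepted for this market

# 2% of the human score, then the resulting target rounded to 3 decimals.
allowance = HUMAN_F1 * MARGIN
target = round(HUMAN_F1 - allowance, 3)


def resolves_yes(best_public_f1: float) -> bool:
    """YES if any public submission reaches the target or better."""
    return best_public_f1 >= target


print(f"allowance = {allowance:.5f}")
print(f"target    = {target:.3f}")
print(f"gap from top score = {target - TOP_F1:.3f}")
print("resolves YES at current top score:", resolves_yes(TOP_F1))
```

At the quoted top score of 0.731, a submission would need to gain roughly 0.09 F1 to reach the target.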


Searched for ProPara, could not find any evidence of other leaderboards on this metric, so closing out at NO for this year. Thanks for betting!

I have no special knowledge, just taking the opposite bet of the current position in the market for the sake of discussion.
