🐕 Will A.I. Get Significantly Better at Evaluating Scientific Claims by the end of 2024? (As Measured By Leaderboard)
Resolved NO on May 6

Preface / Inspiration:

  • Manifold has many questions about whether we will see sentience, general A.I., and other nonsense or faith-based outcomes. These questions rely on the market maker's interpretation and often close at some far-distant point in the future, when a lot of us will be dead. This market is an effort to create meaningful bets on important A.I. questions that resolve against a third-party reference.

Market Description

SciFact

SciFact is a public leaderboard challenge that measures how well AI systems can judge whether scientific claims are supported by evidence. Each example is a tuple of claim, evidence, and decision. AllenAI gives the following motivation for SciFact:

Due to the rapid growth in the scientific literature, there is a need for automated systems to assist researchers and the public in assessing the veracity of scientific claims.

This challenge uses a public dataset of Claims, Evidence, and Decisions, and anyone can participate: https://leaderboard.allenai.org/scifact/submissions/get-started

Here are a couple of Claim vs. Evidence examples from the dataset:

  • Claim: Prescribed exercise training improves quality of life.

  • Evidence: At 3 months, usual care plus exercise training led to greater improvement in the KCCQ overall summary score (mean, 5.21; 95% confidence interval, 4.42 to 6.00) compared with usual care alone (3.28; 95% confidence interval, 2.48 to 4.09).

  • Decision: SUPPORT

  • Claim: Patients with microcytosis and higher erythrocyte count are more vulnerable to severe malarial anaemia.

  • Evidence: The increased erythrocyte count and microcytosis in children homozygous for alpha(+)-thalassaemia may contribute substantially to their protection against SMA.

  • Decision: REFUTE
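The claim, evidence, and decision tuples above can be pictured as a small record type. This is only an illustrative sketch: the names `Decision`, `ClaimExample`, and their fields are assumptions made here, not the actual SciFact data format, and the enum covers only the two decisions shown in the examples above.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    # Only the two decisions that appear in the examples above.
    SUPPORT = "SUPPORT"
    REFUTE = "REFUTE"


@dataclass(frozen=True)
class ClaimExample:
    # Hypothetical record layout: one claim, one evidence passage, one decision.
    claim: str
    evidence: str
    decision: Decision


examples = [
    ClaimExample(
        claim="Prescribed exercise training improves quality of life.",
        evidence=(
            "At 3 months, usual care plus exercise training led to greater "
            "improvement in the KCCQ overall summary score (mean, 5.21; 95% "
            "confidence interval, 4.42 to 6.00) compared with usual care "
            "alone (3.28; 95% confidence interval, 2.48 to 4.09)."
        ),
        decision=Decision.SUPPORT,
    ),
    ClaimExample(
        claim=(
            "Patients with microcytosis and higher erythrocyte count are "
            "more vulnerable to severe malarial anaemia."
        ),
        evidence=(
            "The increased erythrocyte count and microcytosis in children "
            "homozygous for alpha(+)-thalassaemia may contribute "
            "substantially to their protection against SMA."
        ),
        decision=Decision.REFUTE,
    ),
]

for example in examples:
    print(f"{example.decision.value}: {example.claim}")
```

A system competing on the leaderboard would be given the claim and the evidence and asked to predict the decision.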

Market Resolution Criteria

https://leaderboard.allenai.org/scifact/submissions/public

  • Using my standard metric, which I have employed in a few other markets, will any entry surpass the current top entry's Sent+X F1 score by a factor of 1.3 by the end of the time period?

  • At the time of authoring, the top score is:

MultiVerS (Allen Institute for AI and Un…), 06/04/2021: 0.6721

Therefore, will any entry on this leaderboard score 0.8737 or higher (0.6721 × 1.3 ≈ 0.8737) by the end of 2024? If so, the market resolves YES; otherwise it resolves NO.
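The threshold arithmetic can be checked with a short sketch. The function names `resolution_threshold` and `resolves_yes` are hypothetical helpers introduced here, and the sample score lists are made up for illustration; they are not real leaderboard entries.

```python
BASELINE_SCORE = 0.6721  # top score at the time of authoring, as quoted above
FACTOR = 1.3             # improvement factor required by this market


def resolution_threshold(baseline: float, factor: float) -> float:
    """Return the score an entry must reach, rounded to four decimals."""
    return round(baseline * factor, 4)


def resolves_yes(scores: list[float], threshold: float) -> bool:
    """True if any leaderboard score is equal to or greater than the threshold."""
    return any(score >= threshold for score in scores)


threshold = resolution_threshold(BASELINE_SCORE, FACTOR)
print(f"threshold = {threshold:.4f}")

# Hypothetical score lists, for illustration only.
print(resolves_yes([0.6721, 0.70], threshold))
print(resolves_yes([0.6721, 0.88], threshold))
```

Rounding to four decimals matches the precision of the quoted scores; an entry at exactly the threshold would count toward YES because the criterion is "equal to or greater than".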

