By when will AIs perform at least as well as humans on GAIA?
Market expectation: 2036

- Before 2024-06-01 — Resolved NO
- Before 2025-01-01 — Resolved NO
- Before 2026-01-01 — 70%
- Before 2027-01-01 — 87%
- Before 2028-01-01 — 92%
- Before 2030-01-01 — 95%
- Before 2035-01-01 — 97%

The GAIA benchmark (https://arxiv.org/abs/2311.12983) aims to test for the next level of capability for AI agents.

Quoting from the paper: "GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins."

This market will resolve based on when an AI system performs as well as or better than humans on all three levels of the benchmark. I'll use the human scores from Table 4 of the paper: 93.9% on level 1, 91.8% on level 2, and 87.3% on level 3.

(I'm using the conjunction of all 3 levels rather than the average to be somewhat conservative about this level being achieved.)
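The conjunction rule above can be sketched as follows (the threshold values are the human scores from Table 4 of the paper; the function name and score format are illustrative):

```python
# Human baseline accuracy (percent) from Table 4 of the GAIA paper, per level.
HUMAN_SCORES = {1: 93.9, 2: 91.8, 3: 87.3}

def meets_resolution_bar(ai_scores):
    """Return True if the AI matches or beats the human score on all 3 levels.

    ai_scores: dict mapping level (1, 2, 3) to the AI's accuracy in percent.
    Uses the conjunction of all levels, not the average, per the market rules.
    """
    return all(ai_scores[level] >= HUMAN_SCORES[level] for level in HUMAN_SCORES)
```

Note that a system averaging above the human mean can still fail the bar if it falls short on any single level, which is what makes the conjunction the more conservative criterion.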

If a given submission was likely trained on the test set (in my judgement), I won't consider it valid.

This market resolves based on the date of publication/submission of a credible document or leaderboard entry which indicates that the corresponding performance on GAIA was reached. (Not the date at which the system was originally created.)

Each date will resolve YES if this publication/submission takes place before that date (UTC). Otherwise NO.

(I may add more options later for finer date resolution.)
