Will AIs beat human experts in question-answering on the GPQA benchmark before January 1st, 2027? | Manifold

95% chance

From the abstract of the GPQA paper:

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy.

This question resolves to YES if a credible paper, blog post, or other document indicates that at least one AI system scored greater than 74.0% on the GPQA dataset before January 1st, 2027, and NO otherwise. I will exclude results that appear to come from cheating, for example results obtained by training on the test set.
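As a rough illustration of the resolution rule, here is a minimal Python sketch. The function and constant names are hypothetical, and the `credible` and `suspected_cheating` flags stand in for the market creator's own judgment, which code cannot replace. It also assumes the score is measured on the full 448-question set described in the abstract; a result reported on a subset would need a different total.

```python
from datetime import date
import math

# Values taken from the market description; names are illustrative only.
THRESHOLD_PCT = 74.0           # expert accuracy after discounting clear mistakes
DEADLINE = date(2027, 1, 1)    # result must be obtained strictly before this date
TOTAL_QUESTIONS = 448          # assumption: scoring on the full GPQA set


def resolves_yes(score_pct, result_date, credible=True, suspected_cheating=False):
    """Return True if a reported result would satisfy the stated criteria."""
    return (
        score_pct > THRESHOLD_PCT      # strictly greater than 74.0%
        and result_date < DEADLINE     # before January 1st, 2027
        and credible                   # judged credible by the market creator
        and not suspected_cheating     # e.g. not trained on the test set
    )


def min_correct_to_exceed(total=TOTAL_QUESTIONS, threshold_pct=THRESHOLD_PCT):
    """Smallest number of correct answers whose accuracy is strictly above the threshold."""
    return math.floor(total * threshold_pct / 100) + 1


print(min_correct_to_exceed())                  # correct answers needed out of 448
print(resolves_yes(74.0, date(2026, 6, 1)))     # exactly 74.0% does not exceed the bar
```

Under these assumptions, a score of exactly 74.0% would not be enough, since the criterion requires a score greater than 74.0%.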

Technical AI Timelines

People are also trading

Will an AI score 1st place on International Math Olympiad (IMO) 2025?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

Will an AI achieve >80% performance on the FrontierMath benchmark before 2027?

Will an AI system beat humans in the GAIA benchmark before the end of 2025?

Will AI top level capabilities generally be judged by question and answer benchmarks in 2029?

In what year will AI achieve a score of 95% or higher on the GPQA benchmark?

What will be the best AI performance on Humanity's Last Exam by December 31st 2025?

Will an AI model achieve superhuman ELO on Codeforces by the 31 December 2025?

Will AI pass the Winograd schema challenge by the end of 2025?

Will Quora questions be auto-answered by a more sophisticated bot (at level of GPT3.5 or higher) by EOY 2025?
