What will be the best score on the GAIA benchmark before 2025?

Resolved Jan 29 as 65%

This question will resolve as the state-of-the-art average score on the GAIA benchmark (on the test set, not validation set) by an AI system, including any post-training enhancements but excluding any human assistance. This will be based on credible publicly available results prior to January 1st 2025. The primary credible source will be the official leaderboard, but other sources, including but not limited to arXiv preprints and papers may also be considered.
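The resolution rule above amounts to filtering the reported results and taking the maximum. The following is a minimal sketch of that logic, not the market creator's actual procedure. The `Result` type, its field names, and the `resolve` function are hypothetical, and every entry other than the 32.33% figure quoted in the background section is made-up illustration data. The split and date attached to the 32.33% entry are also assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Result:
    """One reported GAIA result (hypothetical structure for illustration)."""
    system: str
    score_pct: float      # average score, in percent
    published: date       # date the result became publicly available
    split: str            # "test" or "validation"
    human_assisted: bool  # True if a human helped the system


# Results must be public prior to January 1st 2025.
CUTOFF = date(2025, 1, 1)


def resolve(results: list[Result]) -> Optional[float]:
    """Return the best eligible score, or None if nothing qualifies."""
    eligible = [
        r for r in results
        if r.split == "test"          # test set only, not validation
        and not r.human_assisted      # no human assistance
        and r.published < CUTOFF      # strictly before the cutoff
    ]
    if not eligible:
        return None
    return max(r.score_pct for r in eligible)


results = [
    # Score quoted in the background section; split and date are assumed.
    Result("GPT-4-turbo based system", 32.33, date(2024, 3, 15), "test", False),
    # The entries below are invented to exercise each exclusion rule.
    Result("hypothetical-A", 40.0, date(2024, 9, 1), "validation", False),
    Result("hypothetical-B", 50.0, date(2025, 2, 1), "test", False),
    Result("hypothetical-C", 45.0, date(2024, 6, 1), "test", True),
]

print(resolve(results))  # prints 32.33
```

Each invented entry is excluded for a different reason (wrong split, published after the cutoff, human assistance), so only the first entry counts. Judging whether a source is "credible" is left out of the sketch, since the question leaves that to the resolver.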

Background Information:

See GAIA:
GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). (See our paper for more details.) GAIA is made up of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It is therefore divided into 3 levels, where level 1 should be breakable by very good LLMs, and level 3 indicates a strong jump in model capabilities. Each level is divided into a fully public dev set for validation, and a test set with private answers and metadata.
As of March 15th, 2024, the best score was 32.33%, achieved by a GPT-4-turbo-based system.

Part of the AI Benchmarks series by the AI Safety Student Team at Harvard on evaluations of AI models against technical benchmarks. Full list of questions: Technical AI Timelines

Related questions

ARC-AGI-2 Top Score >=50% in 2025?

Will an AI score over 80% on FrontierMath Benchmark in 2025?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

In what year will AI achieve a score of 85% or higher on the SimpleBench leaderboard?

Will an AI achieve >80% performance on the FrontierMath benchmark before 2027?

Will any AI model score >80% on Epoch's Frontier Math Benchmark in 2025?

Will an AI system beat humans in the GAIA benchmark before the end of 2025?

What will be the best score (5/5 reliability) on ZeroBench by December 31st 2025?

What will be the best score on Cybench by December 31st 2025?

What will be the best AI performance on Humanity's Last Exam by December 31st 2025?