Will an LLM beat human experts on GPQA by Jan 1, 2025?

1kṀ42k

Resolved YES on Dec 20

GPQA dataset here: https://arxiv.org/abs/2311.12022

"Human expert" means 74% accuracy on GPQA.

Currently, GPT-4 gets 39%.

The LLM is allowed to use external tools (e.g. Google, Wolfram Alpha).

Technical AI Timelines

LLMs

🏅 Top traders

#	Name	Total profit
1		Ṁ1,198
2		Ṁ880
3		Ṁ731
4		Ṁ621
5		Ṁ605

Comments

55 Holders

201 Trades

What is Manifold?

Manifold is the world's largest social prediction market.

Get accurate real-time odds on politics, tech, sports, and more.

Or create your own play-money betting market on any question you care about.

Are our predictions accurate?

Yes! Manifold is very well calibrated, with forecasts on average within 4 percentage points of the true probability. Our probabilities are created by users buying and selling shares of a market.

In the 2022 US midterm elections, we outperformed all other prediction market platforms and were in line with FiveThirtyEight’s performance. Many people who don't like betting still use Manifold to get reliable news.

Why use play money?

Mana (Ṁ) is the play-money currency used to bet on Manifold. It cannot be converted to cash. All users start with Ṁ1,000 for free.

Play money means it's much easier for anyone anywhere in the world to get started and try out forecasting without any risk. It also means there's more freedom to create and bet on any type of question.

People are also trading

Will an LLM beat a Super GM Bot on chess.com by 2028?

51% chance (-5% 1d)

Will there be an LLM which scores above what a human can do in 2 hours on METR's eval suite before 2026?

67% chance

LLM Hallucination: Will an LLM score >90% on SimpleQA before 2026?

60% chance

What organization will top the LLM leaderboards on LMArena at end of 2025? 🤖📊

Will an LLM (a GPT-like text AI) defeat the World Champion at Chess before 2035?

72% chance

Will the best public LLM at the end of 2025 solve more than 5 of the first 10 Project Euler problems published in 2026?

75% chance

Will there be an LLM (as good as GPT-4) that was trained with 1/100th the energy consumed to train GPT-4, by 2026?

83% chance

Will an LLM improve its own ability along some important metric well beyond the best trained LLMs before 2026?

50% chance

Will the most interesting AI in 2027 be a LLM?

70% chance

Will there be any simple text-based task that most humans can solve, but top LLMs can't? By the end of 2026

64% chance
