Will a LLM beat human experts on GPQA by Jan 1, 2025?
Resolved YES on Dec 20.
GPQA dataset here: https://arxiv.org/abs/2311.12022
"Human expert" means 74% accuracy.
Currently, GPT-4 gets 39%.
The LLM is allowed to use external tools (e.g. Google, Wolfram Alpha).
This question is managed and resolved by Manifold.
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ1,198 |
2 | | Ṁ880 |
3 | | Ṁ731 |
4 | | Ṁ621 |
5 | | Ṁ605 |
What is this?
What is Manifold?
Manifold is the world's largest social prediction market.
Get accurate real-time odds on politics, tech, sports, and more.
Or create your own play-money betting market on any question you care about.
Are our predictions accurate?
Yes! Manifold is very well calibrated, with forecasts on average within 4 percentage points of the true probability. Our probabilities are created by users buying and selling shares of a market.
In the 2022 US midterm elections, we outperformed all other prediction market platforms and were in line with FiveThirtyEight’s performance. Many people who don't like betting still use Manifold to get reliable news.
Why use play money?
Mana (Ṁ) is the play-money currency used to bet on Manifold. It cannot be converted to cash. All users start with Ṁ1,000 for free.
Play money means it's much easier for anyone anywhere in the world to get started and try out forecasting without any risk. It also means there's more freedom to create and bet on any type of question.
Related questions
Will an LLM beat a Super GM Bot on chess.com by 2028?
51% chance
Will there be an LLM which scores above what a human can do in 2 hours on METR's eval suite before 2026?
67% chance
LLM Hallucination: Will an LLM score >90% on SimpleQA before 2026?
60% chance
What organization will top the LLM leaderboards on LMArena at end of 2025? 🤖📊
Will an LLM (a GPT-like text AI) defeat the World Champion at Chess before 2035?
72% chance
Will the best public LLM at the end of 2025 solve more than 5 of the first 10 Project Euler problems published in 2026?
75% chance
Will there be an LLM (as good as GPT-4) that was trained with 1/100th the energy consumed to train GPT-4, by 2026?
83% chance
Will an LLM improve its own ability along some important metric well beyond the best trained LLMs before 2026?
50% chance
Will the most interesting AI in 2027 be a LLM?
70% chance
Will there be any simple text-based task that most humans can solve, but top LLMs can't? By the end of 2026
64% chance