Will OpenAI's next major LLM (after GPT-4) surpass 74% accuracy on the GPQA benchmark?

Resolved YES on Aug 28

Background: The GPQA (Graduate-Level Google-Proof Q&A Benchmark) is designed to evaluate the capabilities of Large Language Models (LLMs) in answering complex, expert-level multiple-choice questions across disciplines such as biology, physics, and chemistry. This benchmark challenges models with questions that require deep understanding and cannot be solved through simple web searches, reflecting real graduate-level knowledge.

Question: Will the next major release of an OpenAI LLM surpass 74% accuracy on the GPQA benchmark?

Resolution Criteria: For the purpose of this question, the "next major release of an OpenAI LLM" is the next model from OpenAI that satisfies at least one of the following criteria:

It is consistently called "GPT-4.5" or "GPT-5" by OpenAI staff members.
It is estimated to have been trained using more than 10^26 FLOP according to a credible source.
It is considered to be the successor to GPT-4 according to more than 70% of my Twitter followers, as revealed by a Twitter poll (if one is taken).

This question will resolve to "YES" if the next major release of an OpenAI LLM released by OpenAI achieves an accuracy rate exceeding 74.0% on the GPQA benchmark using any method, as documented in the first credible public release or publication from OpenAI documenting the model's performance statistics.
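
For anyone who wants the rule spelled out, here is a rough sketch of the resolution logic as I read it. It is not an official resolver: the `Candidate` class, its field names, and the functions are hypothetical and exist only for illustration. It treats "exceeding 74.0%" as a strict inequality and "at least one of the following criteria" as a logical OR.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD_PCT = 74.0    # accuracy must exceed this (from the criteria)
FLOP_THRESHOLD = 1e26   # "more than 10^26 FLOP"
POLL_THRESHOLD = 0.70   # "more than 70%" of poll respondents


@dataclass
class Candidate:
    """Hypothetical record describing a candidate OpenAI model."""
    called_gpt45_or_gpt5: bool            # naming used consistently by staff
    training_flop: Optional[float]        # credible estimate, if any
    poll_successor_share: Optional[float] # Twitter poll share, if a poll ran


def is_next_major_release(m: Candidate) -> bool:
    """True if at least one of the three listed criteria holds."""
    by_name = m.called_gpt45_or_gpt5
    by_compute = m.training_flop is not None and m.training_flop > FLOP_THRESHOLD
    by_poll = (m.poll_successor_share is not None
               and m.poll_successor_share > POLL_THRESHOLD)
    return by_name or by_compute or by_poll


def resolves_yes(m: Candidate, gpqa_accuracy_pct: float) -> bool:
    """YES only if the model qualifies and accuracy strictly exceeds 74.0%."""
    return is_next_major_release(m) and gpqa_accuracy_pct > THRESHOLD_PCT


# Made-up inputs, chosen only to show the strict threshold.
model = Candidate(called_gpt45_or_gpt5=True, training_flop=None,
                  poll_successor_share=None)
at_threshold = resolves_yes(model, 74.0)     # exactly 74.0 does not exceed it
above_threshold = resolves_yes(model, 74.1)
```

The accuracy passed in would be the figure from OpenAI's first credible public release documenting the model, achieved "using any method".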

More details:

The GPQA consists of 448 expert-crafted questions on which domain experts reach 65% accuracy (74% after discounting clear errors the experts identified in retrospect). Highly skilled non-expert validators, even with unrestricted web access, reach only 34% accuracy, which highlights how difficult the questions are.
GPT-4 achieved only 39% accuracy in the original study, although Claude 3 Opus reached 59.5% when using Maj@32 averaged over 10 iterations.
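
Since "using any method" includes things like Maj@32, here is a minimal sketch of majority-vote scoring. It assumes Maj@k means sampling k answers per question and scoring the most common one; the exact protocol behind the 59.5% figure (prompting, tie-breaking, how the 10 iterations were averaged) is not described here, so the function names and toy data below are illustrative only.

```python
from collections import Counter


def majority_vote(samples):
    """Return the most common answer among the sampled answers.

    Ties go to the answer that appeared first (Counter keeps insertion order).
    """
    return Counter(samples).most_common(1)[0][0]


def maj_at_k_accuracy(sampled_answers, gold_answers):
    """Fraction of questions whose majority-voted answer matches the gold label."""
    correct = sum(
        majority_vote(samples) == gold
        for samples, gold in zip(sampled_answers, gold_answers)
    )
    return correct / len(gold_answers)


# Toy data: 3 questions with 5 samples each (k=5 instead of 32, for brevity).
sampled = [
    ["A", "A", "B", "A", "C"],  # majority A, gold A -> correct
    ["D", "B", "B", "D", "B"],  # majority B, gold B -> correct
    ["C", "A", "A", "C", "A"],  # majority A, gold C -> wrong
]
gold = ["A", "B", "C"]
accuracy = maj_at_k_accuracy(sampled, gold)  # 2 of 3 questions correct
```

Averaging over several iterations would mean repeating the whole procedure with fresh samples and taking the mean of the resulting accuracies.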

I think this should resolve YES given GPT-5's performance.

@MatthewBarnett

@MatthewBarnett Resolves YES?

@mods Resolves YES; the question author's account is inactive.
https://openai.com/index/introducing-gpt-5-for-developers/

This similar market should also resolve YES:

https://manifold.markets/MatthewBarnett/will-openais-next-major-llm-after-g?r=SmFzb25i

Does o1 meet your criteria @MatthewBarnett?

This is the next model at time of release, correct?

@JacobPfau Yes, quoting from the criteria:

This question will resolve to "YES" if the next major release of an OpenAI LLM released by OpenAI achieves an accuracy rate exceeding 74.0% on the GPQA benchmark using any method, as documented in the first credible public release or publication from OpenAI documenting the model's performance statistics.
