Will OpenAI's next major LLM (after GPT-4) surpass 70% accuracy on the GPQA benchmark?

220Ṁ1367

resolved Aug 28

Resolved

YES


Background: The GPQA (Graduate-Level Google-Proof Q&A Benchmark) is designed to evaluate the capabilities of Large Language Models (LLMs) in answering complex, expert-level multiple-choice questions across disciplines such as biology, physics, and chemistry. This benchmark challenges models with questions that require deep understanding and cannot be solved through simple web searches, reflecting real graduate-level knowledge.

Question: Will the next major release of an OpenAI LLM surpass 70% accuracy on the GPQA benchmark?

Resolution Criteria: For the purpose of this question, the "next major release of an OpenAI LLM" is the next model from OpenAI that satisfies at least one of the following criteria:

It is consistently called "GPT-4.5" or "GPT-5" by OpenAI staff members.
It is estimated to have been trained using more than 10^26 FLOP according to a credible source.
It is considered to be the successor to GPT-4 according to more than 70% of my Twitter followers, as revealed by a Twitter poll (if one is taken).

This question will resolve to "YES" if the next major release of an OpenAI LLM achieves an accuracy exceeding 70% on the GPQA benchmark using any method, as reported in the first credible public release or publication from OpenAI documenting the model's performance statistics.
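The resolution rule above can be read as a two-step check: first decide whether a model qualifies as the "next major release" (any one of the three criteria is enough), then compare its reported GPQA accuracy with the 70% threshold. The following is a minimal sketch of that reading, not an official resolution tool. The `Candidate` record, its field names, and the `resolution` function are hypothetical names introduced for illustration, and returning "NO" for a qualifying model at or below 70% is an assumption, since the question text only spells out the "YES" case.

```python
from dataclasses import dataclass
from typing import Optional

FLOP_THRESHOLD = 1e26   # criterion 2: "more than 10^26 FLOP"
POLL_THRESHOLD = 0.70   # criterion 3: "more than 70%" of poll respondents
GPQA_THRESHOLD = 70.0   # accuracy must exceed 70%


@dataclass
class Candidate:
    """Hypothetical summary of what is known about one OpenAI model."""
    called_gpt45_or_gpt5: bool               # criterion 1
    training_flop_estimate: Optional[float]  # None if no credible estimate
    poll_successor_share: Optional[float]    # None if no poll was taken
    gpqa_accuracy_pct: float                 # first OpenAI-reported score


def is_next_major_release(m: Candidate) -> bool:
    """True if at least one of the three listed criteria holds."""
    by_name = m.called_gpt45_or_gpt5
    by_compute = (m.training_flop_estimate is not None
                  and m.training_flop_estimate > FLOP_THRESHOLD)
    by_poll = (m.poll_successor_share is not None
               and m.poll_successor_share > POLL_THRESHOLD)
    return by_name or by_compute or by_poll


def resolution(m: Candidate) -> Optional[str]:
    """Return None if the model does not trigger resolution at all."""
    if not is_next_major_release(m):
        return None
    # Assumption: a qualifying model that does not exceed 70% resolves NO.
    return "YES" if m.gpqa_accuracy_pct > GPQA_THRESHOLD else "NO"


# Illustrative inputs only; the flag values are assumptions, not reported facts.
print(resolution(Candidate(False, None, None, 78.3)))  # fails all three criteria
print(resolution(Candidate(True, None, None, 70.0)))   # qualifies, but 70.0 does not exceed 70
```

Both thresholds are strict inequalities in this sketch because the question says "more than" and "exceeding", so a score of exactly 70% would not be enough.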

More details:

The GPQA benchmark consists of 448 expert-written questions on which domain experts reach 65% accuracy (74% adjusted for clear errors). Highly skilled non-expert validators, even with unrestricted web access, reach only 34% accuracy, which highlights how difficult the questions are.
GPT-4 achieved only 39% accuracy in the original study, whereas Claude 3 Opus reached 59.5% using Maj@32 (a majority vote over 32 sampled answers per question), averaged over 10 iterations.
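Since the question allows "any method", it may help to see what a Maj@k score measures. The sketch below assumes the usual meaning: sample k answers per question, take the most common answer as the model's vote, score that vote against the answer key, and average the resulting accuracy over several independent runs. The function names and the toy data are made up for illustration and are not taken from any published evaluation harness.

```python
from collections import Counter
from statistics import mean


def majority_vote(samples):
    """Most common answer among the samples (ties go to the first seen)."""
    return Counter(samples).most_common(1)[0][0]


def maj_at_k_accuracy(sampled_answers, gold_answers):
    """Fraction of questions whose majority-vote answer matches the key.

    sampled_answers[i] holds the k sampled answers for question i.
    """
    votes = [majority_vote(s) for s in sampled_answers]
    correct = sum(v == g for v, g in zip(votes, gold_answers))
    return correct / len(gold_answers)


# Toy data: 3 multiple-choice questions, k = 5 samples each, 2 runs.
gold = ["A", "B", "C"]
run_1 = [list("AABAC"), list("BBBDD"), list("CDDDA")]  # 2 of 3 votes correct
run_2 = [list("ABBBA"), list("BBDDD"), list("CCCDA")]  # 1 of 3 votes correct

per_run = [maj_at_k_accuracy(run, gold) for run in (run_1, run_2)]
print(per_run, mean(per_run))
```

A Maj@k score is typically at least as high as single-sample accuracy for the same model, which is why the evaluation method matters when comparing the 39% and 59.5% figures.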

Technical AI Timelines

OpenAI

LLMs

GPT-5


🏅 Top traders

#	Name	Total profit
1		Ṁ72
2		Ṁ56
3		Ṁ30
4		Ṁ28
5		Ṁ16


This should resolve YES from GPT-5

@mods could you please resolve this YES? The question author's account is inactive.
https://openai.com/index/introducing-gpt-5-for-developers/

(See also https://manifold.markets/MatthewBarnett/will-openais-next-major-llm-after-g-a5fa8b913137)

o1 gets 78.3 on the GPQA. This is resolved.

@PhillipBallardsoftclone It does not count as a major release by the standard defined here. I think GPT-4o might if there were a Twitter poll just asking whether it's the successor to GPT-4, though I'm not certain, and I don't think it meets the spirit of the criteria either, tbh.

People are also trading

Will OpenAI's next major LLM (after GPT-4) solve more than 2 of the first 5 new Project Euler problems?

63% chance

Will OpenAI's next major LLM (after GPT-4) achieve over 50% resolution rate on the SWE-bench benchmark?

99% chance

Will xAI develop a more capable LLM than GPT-5 before 2026

12% chance

Will OpenAI's next major LLM (after GPT-4) feature natural and convenient speech-to-speech capabilities?

81% chance

In what year will AI achieve a score of 95% or higher on the GPQA benchmark?

5/25/27

How much time will pass between an LLM being released that beats GPT4 and the next OpenAI LLM being released? (+ANSWERS)

Will OpenAI release another open source LLM before end of 2026?

79% chance
