Will OpenAI announce GPT-5 (or a model better than o3) on July 17, 2025?
Resolved N/A on Jul 17

Background

OpenAI o3 (released 16 Apr 2025) is the current “frontier” reasoning model. OpenAI will hold a livestream on July 17, 2025, to announce an as-yet-unknown release. Because OpenAI sometimes retires the “GPT-n” branding (e.g., the o-series), a strength-based fallback is needed in case the name “GPT-5” is not used.

Benchmarks for o3:

• MMLU (5-shot): 86.9%

• GPQA (Diamond): 83.3%

• MMMU (0-shot, multimodal): 82.9%

• SWE-bench Verified: 69.1%

• ARC-AGI-Pub (high-compute): 88%

Resolution Criteria

If either Condition A or Condition B is satisfied, the market resolves YES.

Condition A — Naming test (simple)

YES if, during the July 17, 2025 window, an official OpenAI communication (blog post, livestream, or press release) clearly calls the newly announced model “GPT-5”, “ChatGPT-5”, or “GPT 5”. Benchmarks will NOT be considered for Condition A.

Condition B — Strength test (if a different name is chosen)

YES if all of the following are true:

  1. Announcement timing: The model is publicly announced on 17 July 2025.

  2. Benchmark disclosure: Within 7 days of the announcement, OpenAI publishes official scores (blog, system card, or eval sheet), under the same protocol as the o3 numbers above, for all five benchmarks listed in the Background section.

  3. Performance threshold: For each of those benchmarks, the new model’s score is equal to or greater than the o3 score shown above.

  4. Benchmark substitution: If any one of the benchmarks listed above is omitted, I will select another reasonable benchmark to substitute for it. The aim of Condition B is to evaluate whether the new model, if released, is at least as intelligent as o3 (a minimal sketch of this check appears below, after the update note).

  • Update 2025-07-17 (PST) (AI summary of creator comment): In response to a question about how a potential 'ChatGPT agent' would be handled, the creator has indicated that such a product will be evaluated against Condition B. The determining factor will be whether its performance on the 5 listed benchmarks meets or exceeds O3's scores, not its specific classification as a 'model' versus an 'agent'.
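
To make Condition B concrete, here is a minimal sketch in Python of the pass/fail check it describes. The o3 baseline numbers come from the Background section above; the benchmark labels, the `meets_condition_b` helper, and the example scores are illustrative assumptions, not figures published by OpenAI or the market creator.

```python
# Minimal sketch of the Condition B pass/fail check.
# The o3 baseline values come from the Background section above; the
# benchmark labels, helper name, and example scores are illustrative only.

O3_BASELINE = {
    "MMLU (5-shot)": 86.9,
    "GPQA Diamond": 83.3,
    "MMMU (0-shot, multimodal)": 82.9,
    "SWE-bench Verified": 69.1,
    "ARC-AGI-Pub (high-compute)": 88.0,
}

def meets_condition_b(new_model_scores: dict[str, float]) -> bool:
    """Return True only if the new model matches or beats o3 on every listed benchmark.

    A missing score counts as a failure, mirroring criterion 2 (all five
    benchmarks must be disclosed) unless a substitute is agreed per criterion 4.
    """
    for benchmark, o3_score in O3_BASELINE.items():
        new_score = new_model_scores.get(benchmark)
        if new_score is None or new_score < o3_score:
            return False
    return True

# Hypothetical example: one score falls below the o3 baseline, so Condition B fails.
example_scores = {
    "MMLU (5-shot)": 88.0,
    "GPQA Diamond": 85.0,
    "MMMU (0-shot, multimodal)": 84.0,
    "SWE-bench Verified": 65.0,  # below o3's 69.1
    "ARC-AGI-Pub (high-compute)": 90.0,
}
print(meets_condition_b(example_scores))  # prints: False
```

Under this reading, a tie on every benchmark still resolves YES, since the threshold is “equal to or greater than”.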

Comments

I am resolving this market as Not Applicable because it is not clear from the data whether Agent is smarter or dumber than o3, which is what Condition B evaluates. According to internet reports, Agent is built on top of o3. OpenAI has not released Agent’s scores on the five benchmarks listed above. Criterion 4 allows me to use substitute benchmarks in the evaluation, and OpenAI has released scores showing that Agent beats o3 on Humanity’s Last Exam and FrontierMath. So, on the one hand, if we use the five listed benchmarks, it looks like o3 is smarter than Agent; on the other hand, if we use those other benchmarks, Agent is smarter than o3. Given the lack of clarity on whether Agent is smarter than o3, I will resolve this market as “Not Applicable”.

@AlanTuring This clearly should resolve NO in my view. The original question and title were about whether today's livestreamed announcement would be the anticipated release of GPT-5 regardless of its name, which it clearly wasn't. The secondary conditions were only added in case they released GPT-5 but called it something else ("if a different name is chosen"), which is clearly not what happened.

Agent doesn't have slightly better scores because it's a truly new and better model; it has slightly better scores on a few things because of improved tool use. It's nothing like a GPT-5 release.

"Because OpenAI sometimes retires the “GPT-n” branding (e.g., the o-series), a strength-based fallback is needed in case the name “GPT-5” is not used."

This didn't happen, at all. It makes no sense to resolve N/A because they released a new product that contains a slightly updated o3 that is not even available separately as its own chat model. It's not GPT-5 in name OR in spirit, which is what this question was about.

@cvja I appreciate your feedback. I prefer to resolve markets in a way where the winners clearly won and the losers clearly lost. In the absence of that, it seems fair to return the original money to the participants.

Agent scored 41% on Humanity’s Last Exam while o3 scored 20%, and Agent scored 27% on FrontierMath while o3 scored 10%. These two benchmarks are very difficult, and a jump of that size is very significant; one could argue it is evidence of Agent being smarter than o3.

@AlanTuring The original question wasn't about whether OpenAI would release something that could be called smarter than o3; it was about whether they would release the anticipated GPT-5 model, whether by that name or by another name, which they didn't. There's no question that GPT-5 is still coming. There's no ambiguity here.

"Condition B — Strength test (if a different name is chosen)"

This refers to a different name being chosen for GPT-5, which didn't happen. Condition B does not apply whatsoever. It wasn't meant to cover a product announcement unrelated to their work on GPT-5. Again, it's not even a model release, so it doesn't have a system card or a full suite of benchmarks.

How are you going to resolve ChatGPT Agent, given it's not technically a new model but also beats o3 on a bunch of benchmarks?

@Samaritan Can you show me its scores on the five benchmarks listed?


The title seems misleading given the description; I highly recommend adding "or any model better than O3".

@TheAllMemeingEye I generally assume that people will read the resolution criteria and prefer to make questions easy to read, but I will include that small clarification.
