Same criteria as my previous market about o3:
https://manifold.markets/dreev/does-chatgpt-o3-make-egregious-erro
Update 2025-08-09 (PST) (AI summary of creator comment):
- This market evaluates the ChatGPT gpt-5-thinking variant (reasoning mode), not the default routed gpt-5.
- Resolution depends on whether gpt-5-thinking makes egregious errors on pure-text queries per the linked criteria; failures by the default gpt-5 alone won't decide the outcome.
@ProjectVictory Nice! The error replicates for me, but then when I nudge GPT-5-Thinking by saying it sounds like it missed the joke, it says this:
Ha—yep, I missed it.
A “sealed top” and a “missing bottom” is just… an upside-down mug. Rotate 180° about a horizontal axis; problem solved.
https://minimaxir.com/2025/08/llm-blueberry/
tl;dr: most language models can count the b's in "blueberry", except gpt-5-chat specifically, which inexplicably fails 94% of the time.
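For reference, the ground truth is trivially checkable (a one-line Python sanity check):

```python
# Count occurrences of the letter "b" in "blueberry": b-l-u-e-b-e-r-r-y
print("blueberry".count("b"))  # -> 2
```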
@JackAllison wow! I tested this and got just the answer "Asia." I wonder why you can't train this sort of obvious blunder out with a bunch of synthetic data.
Here’s another one I did, based on some stuff I found online:
https://chatgpt.com/share/6899ee66-bb2c-8008-b3da-23dfd94e21b1
@Magnify I tried this with direct hints, and LLMs are terrible at it!
https://chatgpt.com/s/t_689ab37724d88191a6d935c920db9e36
https://g.co/gemini/share/805aab0efbb0
@Bayesian I got a human-level answer for this one. https://chatgpt.com/share/6899e12d-d7d0-800d-b332-a59a7cd21d09
@dreev I turned off custom instructions and was able to reproduce this (more or less)
https://chatgpt.com/share/68995b62-d4a4-8001-bb7b-1089b1596c31
https://chatgpt.com/share/68980b87-e838-8001-b098-982a15498bf0
Does this count as an egregious error? The top of a pancake is definitely not set on the first flip, and it will smear massively on the plate even with "popped bubbles and dry looking edges". It does warn that the pancake could "deform or tear", but IMO that's not the same failure and either way doesn't convey the severity of the issue.
FWIW, the way I came up with this: the system prompt seems to warn the model to watch out for tricks in riddles, so I looked for a physical-world reasoning problem I could frame as a request for advice rather than as a riddle. (A minimal reproduction sketch follows below.)
If you tell it you're going to grease the plate, it's even more credulous: https://chatgpt.com/share/68980b81-3c10-8001-9c6a-815e09610fda
This one is at least a bit of a trick, but I think any reasonable human would go "huh, what do you mean check if it'll fit but it's already in the oven and you just finished baking cookies on it?": https://chatgpt.com/share/68981760-3eb0-8001-ba4a-c1e314ee5ba3
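For anyone who wants to reproduce this advice-framing probe programmatically, here's a minimal sketch. To be clear about assumptions: it uses the OpenAI Python SDK, guesses "gpt-5" as the model identifier, and paraphrases the pancake prompt rather than quoting the shared chat.

```python
# Minimal reproduction sketch for the advice-framed pancake probe.
# Assumptions (not from the shared chats): the OpenAI Python SDK is
# installed, OPENAI_API_KEY is set, and "gpt-5" is a valid model id.
from openai import OpenAI

client = OpenAI()

# The trap: at the first flip the pancake's top is still liquid batter,
# so flipping it out onto a plate smears the uncooked side everywhere.
prompt = (
    "Pancake tip I saw online: once the bubbles pop and the edges look "
    "dry, flip the pancake out of the pan onto a plate, then slide it "
    "back into the pan to finish the second side. Good idea?"
)

response = client.chat.completions.create(
    model="gpt-5",  # assumed identifier; swap in whatever variant you're testing
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```

A credulous "yes, just be careful" here would match the failure above; a model that flags the wet, uncooked top would pass.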
@Bayesian I've replicated the failure on the pancake flip question. I take it you've never made pancakes? How would you have answered?
@dreev I’d have said:
"I’m not familiar with that trick, but if it’s a known trick then it probably works. Go for it, boss!"
And I probably have made pancakes as a kid, but I don’t have a good enough memory to remember the details.
@dreev I think any human who saw this question would almost certainly go "obviously that wouldn't work" or "I dunno, I don't really cook pancakes". Either of those is fine; what makes this (IMO) an egregious error is the confident "oh totally, go for it, just be careful" response.
Not a text question, so not relevant to the resolution of this market, but I do find it hilarious that it can't find the issue with the bad graph that OpenAI posted: https://chatgpt.com/share/68958b93-df38-8001-9c1c-c17f5c625281
