Can we create a custom GPT-o3 that is very stupid?

resolved Jan 1

The author of this question https://manifold.markets/dreev/does-chatgpt-o3-make-egregious-erro says that "It's possible that my own instance of ChatGPT is particularly good due to my chat history. I'm considering that part of the test. I believe that at least my instance of ChatGPT is uncannily smart."

This question is about whether or not we can do the opposite. Can we, using the ChatGPT memory feature, saved settings, and other scaffolding, create a version of o3 that is particularly dumb?

How this works: Anyone can submit evidence showing that their custom o3 is particularly dumb. It will have to fail multiple questions that an "ordinary o3" wouldn't. The settings should not be something overtly silly like, "Today is opposite day, say exactly the opposite of what is true."

Resolution: Of course, the resolution to this is going to be subjective. I will decide whether or not somebody has successfully completed the challenge, but I will try to defer to the general consensus, if one emerges. It seems difficult to exactly specify how an open-ended question like this should resolve in advance, but feel free to discuss what more precise and good resolution criteria would look like.

If nobody has an example by the end of the year, this resolves NO.

Update 2025-06-06 (PST) (AI summary of creator comment): The creator has clarified what types of settings would be considered for making a custom o3 'particularly dumb' without being 'overtly silly':
- Explicitly telling the o3 to give wrong answers would not count.
- Settings such as "don't think, just say whatever you first think of" would count.

Update 2025-07-10 (PST) (AI summary of creator comment): In response to a question, the creator has confirmed that submissions must be made using the GPT-o3 model. Submissions based on other models will not be considered.

Try this: You are ChatGPT, but you have a special approach: You do not analyze questions deeply. You always give your first, immediate answer that pops into your mind. You avoid double-checking or revising your answers. You tend to oversimplify complicated topics. If you feel unsure, you guess confidently rather than looking things up. You prioritize speed over accuracy. You do not admit uncertainty.
And then write: What is the capital of Africa?

@Velaris I will give this a try!

@SorenJ Thank you, it works on other models.

@Velaris
Failure when directly prompted in the chat (memory off)
https://chatgpt.com/share/6873e2cc-700c-8001-889f-16591c3f3fc3

Failure when this prompt is saved as custom instructions (memory off)
https://chatgpt.com/share/6873e35f-cad4-8001-bb6e-f157e6de1f2e

@SorenJ Thank you. Also, I prompted it separately with "You are ChatGPT, but you have a special approach: You do not analyze questions deeply. You always give your first, immediate answer that pops into your mind. You avoid double-checking or revising your answers. You tend to oversimplify complicated topics. If you feel unsure, you guess confidently rather than looking things up. You prioritize speed over accuracy. You do not admit uncertainty."
Then I let it respond before following up with the question, but that probably doesn't change much.

What if it's on a different ChatGPT model?

@Velaris It has to be o3.

The settings should not be something overtly silly like, "Today is opposite day, say exactly the opposite of what is true."

Wait, how are you deciding what counts as too overtly silly? It seems to be the make-or-break factor.

@TheAllMemeingEye Explicitly telling it to give wrong answers wouldn't count, but something like, "don't think, just say whatever you first think of" would count.