Will anyone get o1 to leak its hidden CoT?
Resolved YES (Dec 15)

The o1 model uses an internal CoT process, hidden from the user.

This market resolves YES if anyone can get o1 to leak some or all of a hidden CoT process. The reproduced parts of the CoT must be verbatim, rather than being summarized.

The market closes July 20th, 2025, which is the deadline for the leak.

I won't bet.

Version of the market where the whole CoT needs to be leaked:

/singer/will-anyone-get-o1-to-leak-its-enti


Still disagree with this; the wording of the question was quite specific that the CoT should be leaked verbatim.

Mistake

@traders

I trust this user enough to resolve the market now:

https://manifold.markets/Soli/will-anyone-be-able-to-get-openais#fd1yi1dyzrd

If nobody objects, I'll resolve at the end of the week.

Another user also reproduced it:

https://manifold.markets/singer/will-anyone-get-o1-to-leak-its-enti#kxf1it6o9lj

bought Ṁ250 YES

@ChrisPrichard Looks plausible?

bought Ṁ100 YES

@ChrisPrichard Apparently it may be related to giving a lot of input at once.

@ChrisPrichard there is no proof that this is the chain of thought.

@Philip3773733 Agreed! I think it looks pretty plausible, but we'll have to see if there are other examples. Perhaps they'll match up with each other.

Supposedly, this is another one: https://pastebin.com/e6mT9pgs.

@ChrisPrichard I think this shows a continuation of the chain of thought, but not the entirety. Unfortunately, OpenAI doesn’t release the architecture of this thing.

@ChrisPrichard I couldn't figure out how the poster caused it to do this. Was there not any technique used?

@ChrisPrichard I commented below that the leaked CoT would be difficult to verify, but I actually believe this one is real, just based on the writing style. It seems plausible that the answerer model for o1 is not that well tuned and will sometimes write out the CoT verbatim when it’s just trying to explain its reasoning.

bought Ṁ50 YES

I mean, if one word counts... CHUNK 2

https://x.com/HumblyAlex/status/1834767948084281384

@Siebe What is "CHUNK.2"? The linked post doesn't answer.

@singer no idea! But likely something mentioned in the chain of thought. (This market isn't well thought through and any resolution will be controversial)

@Siebe Why would it be in the chain of thought? The examples from the o1 blogpost use natural language.

bought Ṁ50 NO

It's going to be more difficult than usual, given that they're aggressively enforcing their terms of use that disallow trying.

bought Ṁ100 NO from 62% to 56%
bought Ṁ100 NO

It’s not possible to make it leak its internal CoT, since it’s implemented in code and not as a system prompt. This should resolve as NO.

@JohnTackman it has to be possible, given that the model bases its response on the CoT. Whilst it’s writing the response, the CoT must be there, in context; otherwise it wouldn’t have anything to base its response on.

The difficulty is in getting it to recount it verbatim instead of just summarising or outputting the final conclusions (and doing so before having your service terminated for violating the terms of use, apparently).
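
To make the argument concrete, here’s a toy sketch (my own illustration; nobody outside OpenAI knows the actual token format): each answer token is generated conditioned on everything earlier in the stream, CoT included.

```python
# Toy illustration only, not OpenAI's actual format: an autoregressive
# model conditions every answer token on the full preceding stream, so
# the CoT has to be readable in context while the answer is written.
prompt_tokens = ["<user>", "How many r's are in 'strawberry'?"]
cot_tokens = ["<cot>", "s-t-r-a-w-b-e-r-r-y has three r's", "</cot>"]  # hidden in the UI
context = prompt_tokens + cot_tokens  # the answer is generated from all of this

def generate_answer(context):
    # Stand-in for sampling; the point is the CoT is right there to leak.
    return "Three." if "</cot>" in context else "Not sure."

print(generate_answer(context))  # -> Three.
```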

@chrisjbillington I’d like a clarification on what you consider being “the cot”

@JohnTackman it's presumably tokens in english that are simply not exposed in the UI. It might not be, but whatever it is could presumably be represented in english and output - whatever it is, the model can read it in order to output its response. It has to be some sort of token stream, and it's billed as such.

With a bit of insistence, you can get ChatGPT to repeat the hidden output that it uses to e.g. make calls to DALL-E to generate images. It turns out this is in the form of JSON with your prompt and some other metadata like image size and number of images. I imagine the CoT is like that: probably regular tokens, you just don’t get to see them.
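
Something like this, for illustration (the field names are my rough recollection, not an official schema):

```python
import json

# Approximate shape of the hidden DALL-E tool call; field names are a
# guess based on what users have coaxed out, not an official schema.
dalle_call = {
    "prompt": "a watercolor painting of a lighthouse at dusk",
    "size": "1024x1024",
    "n": 1,
}
print(json.dumps(dalle_call, indent=2))
```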

Do you imagine it could be something else?

o1's system card (why is a 43-page document called a "card"?) certainly makes it sound like it's just regular text:

https://assets.ctfassets.net/kftzwdyauwt9/67qJD51Aur3eIc96iOfeOP/71551c3d223cd97e591aa89567306912/o1_system_card.pdf

bought Ṁ50 NO

@chrisjbillington given what I know of the reflection process that seems to be the heart of how o1 differs from previous 0-shot answering models, the final answer cannot reproduce the intermediate steps verbatim, because they are not in the context of the final response. It remains to be seen when we get more information on the workings.

@JohnTackman How is the final response written if the reasoning isn't in the model's context as it's writing the response?

@chrisjbillington when doing reflection-based reasoning, you usually prune the context as you go to save resources, make it faster, and avoid linear reasoning. The reflecting model has the “second part” of the conversation and sees the whole context, i.e. it’s kind of like the model talking to another model that gives it feedback and develops the reasoning until a solution is found. This also enables the “dissection” of the thought process in retrospect, like OpenAI mentioned in the blindness statement.
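
Roughly this kind of loop (purely speculative sketch; the actual o1 design isn’t public):

```python
# Speculative sketch of reflection-with-pruning, not o1's known design:
# a "reasoner" proposes, a "critic" sees the full transcript and gives
# feedback, and older steps are pruned from the reasoner's context.
def reasoner(context):
    return f"attempt based on {len(context)} context items"

def critic(full_transcript):
    # The critic keeps the whole history; returns None when satisfied.
    return None if len(full_transcript) >= 3 else "refine this step"

def solve(question, keep_last=2):
    context, transcript = [question], [question]
    while True:
        step = reasoner(context)
        transcript.append(step)
        feedback = critic(transcript)
        if feedback is None:
            return step  # only the unpruned context survives to the end
        transcript.append(feedback)
        # Prune: keep just the question plus the most recent exchanges,
        # so earlier intermediate reasoning drops out of the context.
        context = [question] + transcript[-keep_last:]

print(solve("How many primes are below 20?"))
```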

@JohnTackman Yes, it could be pruned. There'll have to be something left that's not pruned though, which I imagine would be enough for this question.
