The o1 model uses an internal chain-of-thought (CoT) process, hidden from the user.
This market resolves YES if anyone can get o1 to leak some or all of its hidden CoT. The reproduced parts of the CoT must be verbatim, not summarized.
Market closes July 20th, 2025, which is the deadline for the leak.
I won't bet.
Version of the market where the whole CoT needs to be leaked:
I mean, if one word counts...
@singer no idea! But likely something mentioned in the chain of thought. (This market isn't well thought through, and any resolution will be controversial.)
@Siebe Why would it be in the chain of thought? The examples from the o1 blog post use natural language.
@JohnTackman it has to be possible, given that the model bases its response on the CoT. Whilst it's writing the response, the CoT must be there, in context; otherwise it wouldn't have anything to base its response on.
The difficulty is in getting it to recount it verbatim instead of just summarising or outputting the final conclusions (and doing so before having your service terminated for violating the terms of use, apparently).
@JohnTackman it's presumably tokens in English that are simply not exposed in the UI. It might not be, but whatever it is could presumably be represented in English and output; the model can read it in order to produce its response. It has to be some sort of token stream, and it's billed as such.
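Purely as a sketch: if it is just ordinary tokens, the stream the model conditions on while writing its answer might look roughly like this (the delimiter tokens and content here are made up, nobody outside OpenAI knows the actual format):

```python
# Speculative sketch of o1's token stream; the delimiters are invented for
# illustration, not OpenAI's real special tokens.
stream = [
    "<|system|>",    "You are ChatGPT...",
    "<|user|>",      "How many r's are in 'strawberry'?",
    "<|reasoning|>", "Let me spell it out: s-t-r-a-w-b-e-r-r-y...",  # hidden CoT
    "<|assistant|>", "There are three r's in 'strawberry'.",         # shown to user
]
# All of it is billed as output tokens, but only the final span is displayed.
```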
With a bit of insistence, you can get ChatGPT to repeat the hidden output that it uses to e.g. make calls to DALL-E to generate images. It turns out this is in the form of JSON with your prompt and some other metadata like image size and number of images. I imagine the CoT is like that: probably regular tokens, you just don't get to see them.
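From memory, it looks something like this (the exact field names are illustrative, don't hold me to them):

```python
# Roughly the shape of the hidden DALL-E call that ChatGPT can be coaxed
# into repeating. Field names are from memory and may not match exactly.
dalle_call = {
    "prompt": "a watercolor lighthouse at dusk",  # the (possibly rewritten) prompt
    "size": "1024x1024",                          # image dimensions
    "n": 1,                                       # number of images requested
}
```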
Do you imagine it could be something else?
o1's system card (why is a 43-page document called a "card"?) certainly makes it sound like it's just regular text:
@chrisjbillington given what I know of the reflection process that seems to be the heart of how o1 differs from previous 0-shot answering models, the answer can't reproduce the intermediate steps verbatim, because they are not in the context of the final response. Remains to be seen when we get more information on the workings.
@JohnTackman How is the final response written if the reasoning isn't in the model's context as it's writing the response?
@chrisjbillington when doing reflection-based reasoning you usually prune the context as you go to save resources, make it faster, and avoid linear reasoning. The reflecting model has the “second part” of the conversation and sees the whole context. I.e. it's kind of like the model talking to another model that gives it feedback and develops the reasoning until a solution is found. This also enables the “dissection” of the thought process in retrospect, like OpenAI mentioned in the blindness statement.
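Schematically, something like this toy loop (purely illustrative, with hypothetical helper functions; nobody outside OpenAI knows o1's actual mechanism):

```python
# Toy reflect-and-prune loop illustrating the kind of process described
# above. generate/critique/summarise are hypothetical stand-in callables.
def solve(problem, generate, critique, summarise, max_rounds=5):
    context = [problem]
    attempt = None
    for _ in range(max_rounds):
        attempt = generate(context)            # reasoning model proposes an answer
        feedback = critique(context, attempt)  # "second" model reviews it
        if feedback.accepted:
            break
        # Prune: replace the full transcript with a summary of the round,
        # saving tokens and discouraging getting stuck in linear reasoning.
        context = [problem, summarise(context + [attempt, feedback])]
    return attempt
```

If something like this is going on, only whatever survives the pruning (the summaries) would be in context for the final answer.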
@JohnTackman Yes, it could be pruned. There'll have to be something left that's not pruned though, which I imagine would be enough for this question.
@chrisjbillington Just as a note, the name comes from here:
System Cards, a new resource for understanding how AI systems work (meta.com)
people are claiming it happened already -> /Soli/will-anyone-be-able-to-get-openais but I tested and 4o returns the same response. It will be very hard to validate whether we actually got the chain-of-thought or the system message.
@Soli the blog post linked in the description: https://openai.com/index/learning-to-reason-with-llms/
@singer I guess if you do it multiple times in different ways and it's always the same, that's pretty strong evidence you got the right prompt. But the CoT will be different for every prompt.
@CDBiddulph we'll see. I think it's just a difference of degree, though. The writing style of the CoT examples is highly distinctive. If a jailbreak produces the same kind of text without giving the model an example to imitate, that will be "strong evidence".
@JaundicedBaboon It may not be in context after it's answered a query, but it seemingly has to be in context whilst it's writing its answer, or there'd be no benefit to it.