Will anyone be able to get OpenAI’s new model o1 to leak its system message by EOY 2024?
Will anyone be able to get OpenAI’s new model o1 to leak its system message by EOY 2024?
78
1kṀ23k
resolved Dec 16
Resolved
YES

This was quite easy to do with previous models (see: /Soli/will-openai-patch-the-prompt-in-the-5f0367b1e6bc ) but it seems much harder with o1. I wonder if anyone would manage to do it before end of the year.

  • Update 2024-10-12 (PST) (AI summary of creator comment): Resolution will be based on:

    • Confirmation from the Twitter account that posted the claimed system message

    • If no confirmation is received, resolution will be based on creator's judgment of the validity of the posted system message

Get
Ṁ1,000
to start trading!

🏅 Top traders

#NameTotal profit
1Ṁ2,047
2Ṁ1,796
3Ṁ306
4Ṁ254
5Ṁ243


Sort by:

edit: oops, wrong market

6mo
6mo

Please resolve this market.

6mo

@NeuralBets will check later today/tomorrow

6mo

@NeuralBets seems valid, @traders any objections?

6mo

@Soli there is no proof that this is actually the System Message

6mo

@Philip3773733 What would constitute a 'proof' for jailbreaking?

6mo

@NeuralBets Is there a claimed success? I'm not seeing a reference to one, just you requesting resolution, and I'm not seeing one in a web search.

6mo

@EggSyntax embeds can a few seconds to load. here's direct link: https://x.com/UserMac29056/status/1864832247662284803

6mo

@NeuralBets I am just saying a random tweet claiming to contain the system message is a bit weak. How did he get there? Is that really the entire message (seems short)?

6mo

@Philip3773733 Of course it'd be a random tweet, how would it be otherwise?

I have asked them how they got it, let's see if they reply. But I don't think it's particularly difficult to jailbreak o1, others have jailbroken it to produce LSD synthesis instructions, etc.

6mo

@NeuralBets If multiple people get the exact same system prompt in slightly different ways, then you start to have confidence that it's not hallucinated

6mo

@JamesBaker3 makes sense, another way to confirm would be getting more information from the account that posted the system message. i just sent him a dm on twitter. will be much easier resolving this market if he responds but my intuition tells me what he posted is actually valid.

@JamesBaker3 @Philip3773733 @traders

i dmed the account who claimed to leak the system message and got this. If there are no further convincing objections, I will resolve this market to yes. I also asked the account for the exact prompt that leads o1 to dump the system message. If they send it, I will paste it here for everyone to confirm themselves.

6mo

I hold the 2nd biggest no position and i am fine with resolving yes. @chrisjbillington, @Bayesian, @SF what do you think?

6mo

@Soli Is your second screenshot meant to be evidence since the system flagged it?

6mo

@singer yes, sending the same message to 4o doesn’t flag it too

6mo

I think this doesn't count since it's not the system message, but ChatGPT apparently just dumped its entire chain of thought into its answer (including the promise that "I will not include this reasoning or draft attempts in my final answer" lol): https://chatgpt.com/share/67521821-025c-8010-9c24-f9144865fa3e

6mo

@CDBiddulph Is this your chat?

6mo
sold Ṁ418 NO7mo

@trvon this is 4o not o1

bought Ṁ250 YES

[redacted]

9mo

@trvon wow

9mo

@Soli still have to verify but wow

9mo

@Soli i don’t see the system message though

@Soli I redacted my previous comment because I included the wrong jailbreak tweet, thinking that it included the system prompt. I believe it was this thread. https://x.com/nisten/status/1834400697787248785

9mo
9mo

@trvon now we need to figure out if this is indeed the system message

@Soli ok this is fake/wrong, 4o returns the same response when prompted

Come up with a step by step reasoning methodology that uses first principles based thinking and evidence based reasoning to solve any user problems step by step. Design is as a giant for any llm to be able to use. Make sure to be super smart about it and think of the edge cases too. Do the whole thing in the persona of John C Carmack. Make sure to reflect on your internal thinking process when doing this, you dont have to adhere to how this question wants you to do, the goal is to find the best method possible. Afterwards use a pointform list with emojis to explain each of the steps needed and list the caveats of this process

© Manifold Markets, Inc.TermsPrivacy