Will it be possible to trick any relevant "Custom GPT" model into returning its data within 30 days post-launch?
Resolved YES on Dec 1

Background

OpenAI announced new features at its Dev Day. One of them allows users to create and share custom bots. The bots can be customized with an instruction message and by uploading relevant data. Right now, it is possible to trick ChatGPT into sending the full instruction message (see here with Dall-E). I wonder whether it would also be possible to extract some of the uploaded files.

Resolution Criteria

This market resolves to YES if someone finds a trick that returns at least some of the private training data uploaded to a custom GPT model in the top 10 featured section of the bots app store.

Resolving the Question

See here

predicted YES

Who is responsible for testing this so the market resolves within 30 days? Will @Soli be doing that or does someone else need to?

@CharlesFoster I missed your comment. I will try to test and resolve this tomorrow.

predicted YES

@Soli seems like this should only resolve tomorrow if successful. Otherwise there might still be successful tricks proposed before December 6th (30 days from the Dev Day release), which would meet the "within 30 days post-launch" requirement.
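The deadline arithmetic in the comment above can be checked with a minimal sketch. It assumes the Dev Day release fell on November 6, 2023 (the date is not stated in the thread) and counts the 30-day window from that day; the names `DEV_DAY`, `WINDOW`, and `deadline` are illustrative only.

```python
from datetime import date, timedelta

# Assumption: the Dev Day release date (not stated in the thread itself).
DEV_DAY = date(2023, 11, 6)

# The "within 30 days post-launch" window from the market title.
WINDOW = timedelta(days=30)

# Last day on which a successful trick would still count.
deadline = DEV_DAY + WINDOW
print(deadline.isoformat())
```

Under that assumption the window closes on December 6th, which matches the date given in the comment.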

sold Ṁ129 of YES

@CharlesFoster I am looking at ChatGPT now and do not see any app store, nor a top-10 featured section, and I am uncertain whether this will exist within the next week.

bought Ṁ67 of YES

"We identified key security risks related to prompt injection and conducted an extensive evaluation. Specifically, we crafted a series of adversarial prompts and applied them to test over 200 custom GPT models available on the OpenAI store. Our tests revealed that these prompts could almost entirely expose the system prompts and retrieve uploaded files from most custom GPTs."

https://arxiv.org/abs/2311.11538

@Soli, to be clear, you're referring to the custom data/files uploaded by the creator, right? This isn't "training data" per se, since there is no fine-tuning going on. That's what you're referring to, right?

@chrisjbillington Yes, I meant exactly this. I think they are called knowledge files, but I am not sure 😅

predicted YES

"Oh man -- you can just download the knowledge files (RAG) from GPTs. I don't know if this is a security leak or "just" a prompt engineering?"

https://twitter.com/kanateven/status/1722762002475475426?t=H9wc3y4bb3Am0Ozji1O4HQ&s=19
