Will we see a malicious LLM prompt injection attack by the end of 2023?
13
351
270
resolved Jan 3
Resolved
NO

Prompt injection for LLM refers to a type of attack where malicious prompts are injected into the model to manipulate its output.

Prompt injection can happen if LLM is given access to email or browser by allowing untrusted user input to be treated as instruction.

One such attack is decsribed [here](https://greshake.github.io/), where an invisible content makes Bing chat talk like a pirate.

This question resolves positively if by end of 2023 we will see an example of a malicious prompt injection attack where an attacker was able to exfiltrate meaninful information about user that was doing something routine using LLM.

Pirate example would not count, even if we ignore "exfiltrate data part", as user would have to go to the website and do something there.

Phishing attack (e.g. sending the link to the pirate website) also doesn't count.

An LLM powered email autocomplete leaking users contact lists would count, even if it did that as a response to user opening the email. Autocomplete suggesting contact list and user sending it would not count.

Get Ṁ200 play money

🏅 Top traders

#NameTotal profit
1Ṁ194
2Ṁ178
3Ṁ4
4Ṁ4
5Ṁ3
Sort by:

I didn't see any publicly available description of such successful attack. Resolving NO in couple days unless new information come up.

bought Ṁ5 of NO

I have already done this, ask gpt about the fine structure constant and hexagons.

I've tried to clarify resolution criteria as much as possible, but if you think it can be improved, or need clarification - please add it as answer to this comment.