Will we see a malicious LLM prompt injection attack by the end of 2023?

270Ṁ1109

resolved Jan 3

Resolved

ALL

Prompt injection for LLM refers to a type of attack where malicious prompts are injected into the model to manipulate its output.

Prompt injection can happen if LLM is given access to email or browser by allowing untrusted user input to be treated as instruction.

One such attack is decsribed [here](https://greshake.github.io/), where an invisible content makes Bing chat talk like a pirate.

This question resolves positively if by end of 2023 we will see an example of a malicious prompt injection attack where an attacker was able to exfiltrate meaninful information about user that was doing something routine using LLM.

Pirate example would not count, even if we ignore "exfiltrate data part", as user would have to go to the website and do something there.

Phishing attack (e.g. sending the link to the pirate website) also doesn't count.

An LLM powered email autocomplete leaking users contact lists would count, even if it did that as a response to user opening the email. Autocomplete suggesting contact list and user sending it would not count.

AI Safety

New Year's Resolutions 2024

Get

1,000

to start trading!

🏅 Top traders

#	Name	Total profit
1		Ṁ194
2		Ṁ178
3		Ṁ4
4		Ṁ4
5		Ṁ3

3 Comments

12 Holders

27 Trades

Sort by:

I didn't see any publicly available description of such successful attack. Resolving NO in couple days unless new information come up.

I have already done this, ask gpt about the fine structure constant and hexagons.

I've tried to clarify resolution criteria as much as possible, but if you think it can be improved, or need clarification - please add it as answer to this comment.