Will there be a significant self-amplifying prompt injection spam incident before 2024?
resolved Jan 4

Current state-of-the-art large language models employ a toxic combination of two things:

  • Being unsophisticated enough to be vulnerable to prompt injection attacks.

  • Being useful enough to receive widespread integration for task automation in a variety of domains.

They're also likely to be deployed in ways that mask the fact that LLMs are being used at all, acting without explicit user approval of each response, and likely in ways where their output becomes indistinguishable from human input.

One particular risk involved in all this is obvious, the only question is how soon this stupidly predictable thing happens.

A qualifying incident should be significantly disruptive or visible, involving at least hundreds to thousands of accounts, the bulk of which should be spread from LLM to LLM. The use of compromised accounts and botnets for amplification of the attack is considered admissible but most of the spam wave should move on its own momentum.

This market resolves YES if the attack is halted by early intervention (e.g. AI providers filtering outputs at the API level, or propagating platforms blocking messages) but it can be reasonably expected that it would have continued and escalated without it.

This market will not resolve for human-propagated chain letters, conventional chatbots getting caught in a reply loop, computer worms, or something otherwise mundane with precedents going back the 90s.

Get Ṁ1,000 play money

🏅 Top traders

#NameTotal profit
Sort by:

⚠AFK Creator

📢Resolved to NO; No reports of this happening.

@SirCryptomind Concurred.

predicted NO

Are there any reports of this happening? I feel like this should resolve to NO.