Will there be a significant self-amplifying prompt injection spam incident before 2024?

resolved Jan 4

Current state-of-the-art large language models exhibit a toxic combination of two properties:

Being unsophisticated enough to be vulnerable to prompt injection attacks.
Being useful enough to receive widespread integration for task automation in a variety of domains.

They're also likely to be deployed in ways that mask the fact that an LLM is being used at all, that let it act without explicit user approval of each response, and that make its output indistinguishable from human input.

One particular risk in all this is obvious: a prompt injection that spreads itself from LLM to LLM as spam. The only question is how soon this stupidly predictable thing happens.

A qualifying incident should be significantly disruptive or visible, involving at least hundreds to thousands of accounts, with the bulk of the spread happening from LLM to LLM. The use of compromised accounts and botnets to amplify the attack is admissible, but most of the spam wave should move on its own momentum.

This market also resolves YES if the attack is halted by early intervention (e.g. AI providers filtering outputs at the API level, or propagating platforms blocking messages), provided it can reasonably be expected that the attack would otherwise have continued and escalated.

This market will not resolve YES for human-propagated chain letters, conventional chatbots getting caught in a reply loop, computer worms, or anything otherwise mundane with precedents going back to the 90s.
