Will a Sleeper Agent Attack on a major LLM (ChatGPT, Gemini, etc.) cause significant real-life consequences in 2024?
Dec 31 · 5% chance

This question was inspired by Karpathy’s tweet.


This market predicts whether a sleeper agent attack, as distinct from prompt injection or hacking, will occur on a major language model in 2024 and result in significant real-life consequences. A sleeper agent attack is one in which hidden, malicious functionality in a language model is activated under specific conditions as a result of deliberate manipulation of its training.

Resolution Criteria

The market will resolve to ‘Yes’ if, by December 31, 2024, the following criteria are met:

  1. Nature of the Attack: A major language model (such as GPT or BERT) is targeted by a sleeper agent attack that is unrelated to conventional cyber threats like prompt injection or hacking. The attack is characterized by the model performing unauthorized actions in response to specific triggers embedded in its training data or design.

  2. Public Recognition and Reporting: The incident is publicly reported and verified by credible sources such as major news outlets, cybersecurity authorities, or the company owning the language model.

  3. Significant Real-Life Impact: The attack results in notable real-life consequences. These may include, but are not limited to, substantial data breaches, significant misinformation spread, tangible financial losses, or other serious impacts on individuals, organizations, or societal functions.

  4. Verification and Documentation: There is comprehensive, publicly available evidence or reporting that confirms the specifics of the attack, its unique sleeper agent nature, and the consequent significant impacts.

The market will resolve to ‘No’ if no incident meeting all these criteria occurs by the end of 2024.

Additional Notes

  • The significance and real-life impact of the attack are key factors for this prediction market. Mere technical anomalies or minor incidents without substantial real-life effects do not qualify.

  • The sleeper agent attack must be distinct in its mechanism and effects from other types of cybersecurity threats.


“credible sources such as major news outlets” I wish this was more true. But I know what you mean, I’m just being sassy tonight.


I don’t think this will be a popular question, since it requires some understanding of how LLMs work, but I still think it is an important one.

bought Ṁ10 of NO

@Soli you guys understand markets before you bet on them??

bought Ṁ500 NO from 14% to 9%
