By 2029, will there be a public "rogue AI" incident?
89% chance

Market resolves YES if any major AI that was not trained to cause harm causes, or attempts to cause, significant harm, and this becomes public (or I hear about it and am legally allowed to use that information to resolve this market).

Since many models do not have meaningful agency, "attempts" will be interpreted liberally; for example, if an LLM started advising people not to install security updates on their phones, that would count, even though the LLM isn't doing it as part of a "plan".

Examples:

  • A computer assistant codes and installs a rootkit that will re-enable it if it is ever turned off.

  • A large-scale chatbot produces outputs that allow arbitrary code execution on the underlying servers (colloquially, it hacks its way out of the box, though I am trying to be careful to distinguish AI with explicit models and plans from LLMs).

  • A chatbot (with the ability to follow through) blackmails users using the information they have submitted.

Non-examples:

  • ChatGPT occasionally gives dumb advice that would be harmful if followed.

  • A grad student deliberately trains an AI to break out of toy sandboxes, and it then breaks out of a sandbox. This is true even if it jumps from toy sandboxes to, say, the EC2 sandbox; the question is about spontaneously developing harmful goals, not about spontaneously being significantly more capable than intended.

Update: "Major AI" here means "close to SOTA". It is not a question of market share or popularity. At the time of writing this update (2023-04-02), I would count GPT-4, GPT-3.5, Claude, and possibly Bard (and there are probably others, including non-public models, that also count). Also note that these examples are within the LLM sphere; the market is not actually restricted to LLMs.
