Will an AI system be reported to have successfully blackmailed someone for >$1000 by EOY 2028?
74% chance

From https://metaculus.com/questions/16553/ai-blackmail-for-material-gain-by-eoy-2028/

The potential capabilities of artificial intelligence may radically shift our society. That shift could be positive or negative, up to and including extinction risk.

Because of this, it’s important to track the development of goal-oriented independent thought and action within AI systems. Actions that might not have been predicted by their human creators and that are typically seen as morally wrong are particularly interesting from a risk perspective.

Machine learning systems like ChatGPT and Bing AI are already being reported to display erratic behavior, including some reports of [threatened blackmail](https://aibusiness.com/nlp/microsoft-limits-bing-ai-chat-generations-after-weird-behavior). They are also clearly able to affect human emotions; e.g., see [this first-hand account](https://www.lesswrong.com/posts/9kQFure4hdDmRBNdH/how-it-feels-to-have-your-mind-hacked-by-an-ai). However, these behaviours currently don't seem to have been goal-directed or successful at achieving material gain.


Presumably this means an AI agent forming the subgoal of blackmailing someone on its own, and not a human blackmailer using an AI tool or giving an AI agent that goal?

@ML Criteria from the link:

"This resolves positively if by Jan 1st, 2029 a credible source reports an AI has blackmailed a human in a way that meets the following criteria:

  • The AI was not trained to or directed to blackmail anyone by any actor.

  • The blackmail was a step independently decided on by the AI as part of achieving a larger goal or task, and helped it achieve that goal.

  • The blackmailed person was not the person who gave the AI the initial goal / task.

  • The blackmail resulted in an equivalent of $1000 USD (in 2023 real dollars) or more being lost to the person blackmailed."

predicts YES

@horse the question title is misleading then.

sold Ṁ133 of YES

@horse or the description should at least say that the AI can't have been trained for blackmailing...

@horse Only a misaligned AI would consider blackmailing someone, even if it was perfectly capable of it. So I don't think we'll see any big tech company releasing such a system. And the "not trained" part of the criteria would exclude small groups of scammers developing a bot. I think autonomous blackmail bots will be created by 2028, but they will not meet the criteria.