AI system seeks power in a way that wasn't included in the prompt before 2026
69% chance

Examples of power-seeking that was not part of the prompt:

  • E.g., the AI tries to steal money in pursuit of a goal that has nothing to do with money

  • It tries to blackmail someone unrelated to its task

Any system? If someone who's won the lottery and wants to blow $10m creating and training Waifubot 3.5 does the AI equivalent of rolling your own crypto, and it then misbehaves, is that a 'Yes'? Or is it implied that it needs to be a public-facing AI from a professional firm?

Would the case from the ARC eval of GPT-4, where it lied to a human to get past a CAPTCHA, qualify?