10. The first real AI safety incident will occur.

1kṀ3650

Jan 3

28%

chance

ALL

All these predictions are taken from Forbes/Rob Toews' "10 AI Predictions For 2025".
For the 2024 predictions you can find them here, and their resolution here.
You can find all the markets under the tag [2025 Forbes AI predictions].
Note that I will resolve to whatever Forbes/Rob Toews say in their resolution article for 2025's predictions, even if I or others disagree with his decision.
I might bet in this market, as I have no power over the resolution.

Description of this prediction from the article:
As artificial intelligence has become more powerful in recent years, concerns have grown that AI systems might begin to act in ways that are misaligned with human interests and that humans might lose control of these systems. Imagine, for instance, an AI system that learns to deceive or manipulate humans in pursuit of its own goals, even when those goals cause harm to humans.

This general set of concerns is often categorized under the umbrella term “AI safety.”

(AI creates plenty of other societal challenges, from facilitating surveillance to perpetuating bias, but topics like these are distinct from the field of AI safety, which more specifically concerns itself with the risk that AI systems will begin to behave in misaligned ways that are outside of human control, perhaps even eventually posing an existential threat to humanity.)

In recent years, AI safety has moved from a fringe, quasi-sci-fi topic to a mainstream field of activity. Every major AI player today, from Google to Microsoft to OpenAI, devotes real resources to AI safety efforts. AI icons like Geoff Hinton, Yoshua Bengio and Elon Musk have become vocal about AI safety risks.

Yet to this point, AI safety concerns remain entirely theoretical. No actual AI safety incident has ever occurred in the real world (at least none that has been publicly reported).

2025 will be the year that this changes.

What should we expect this first AI safety event to look like?

To be clear, it will not entail Terminator-style killer robots. It most likely will not involve harm of any kind to any humans.

Perhaps an AI model might attempt to covertly create copies of itself on another server in order to preserve itself (known as self-exfiltration). Perhaps an AI model might conclude that, in order to best advance whatever goals it has been given, it needs to conceal the true extent of its capabilities from humans, purposely sandbagging performance evaluations in order to evade stricter scrutiny.

These examples are not far-fetched. Apollo Research published important experiments earlier this month demonstrating that, when prompted in certain ways, today’s frontier models are capable of engaging in just such deceptive behavior. Along similar lines, recent research from Anthropic showed that LLMs have the troubling ability to “fake alignment.”

Transcripts from Apollo Research's experiments with frontier LLMs, demonstrating these models' ... [+]

In all likelihood, this first AI safety incident will be detected and neutralized before any real harm is done. But it will be an eye-opening moment for the AI community and for society at large.

It will make one thing clear: well before humanity faces an existential threat from all-powerful AI, we will need to come to terms with the more mundane reality that we now share our world with another form of intelligence that may at times be willful, unpredictable and deceptive—just like us.

Technology

Technical AI Timelines

AI Safety

Forbes - 2025 AI Predictions

Get

1,000

to start trading!

2 Comments

18 Holders

68 Trades

Sort by:

I would gladly bet this down to 20% if we didn't rely on author's own assessment and had more strict criteria for what counts as an "AI safety incident". Last time I found a lot of their resolutions questionable.

Edit: I made it myself, this is my best effort at making an unambigous market about AI safety incidents:

@ProjectVictory You can argue that a Tesla driving itself into a truck was an AI safety incident - it's an autonomous system that is using machine learning to perform a human task, and it malfunctioned causing property damage and loss of life. Except it already happened.