Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2023
Resolved NO on Jan 1

See Evan Hubinger's post:

https://www.alignmentforum.org/posts/Km9sHjHTsBdbgwKyi/monitoring-for-deceptive-alignment

Close date updated to 2025-12-31 3:59 pm

Close date updated to 2025-12-31 11:59 pm

Dec 5, 1:32am: Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models → Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2025

Dec 5, 1:32am: Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2025 → Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2023

Close date updated to 2023-12-31 11:59 pm


🏅 Top traders

#  Name  Total profit
1  Ṁ360
2  Ṁ116
3  Ṁ71
4  Ṁ61
5  Ṁ33

A Manifold trader recently reached out to me, asking if the following statement qualifies as a “commitment” for the purpose of this market.

The answer is no. (But I’m open to being convinced, as always.)

“Many of the research directions we are pursuing are aimed at gaining a better understanding of AI systems and developing techniques that could help us detect concerning behaviors such as power-seeking or deception by advanced AI systems.”

bought Ṁ15 of YES

The market didn't seem to update in response to Evan Hubinger announcing that he'd be joining Anthropic.

Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2023, 8k, beautiful, illustration, trending on art station, picture of the day, epic composition