See Evan Hubinger's post:
https://www.alignmentforum.org/posts/Km9sHjHTsBdbgwKyi/monitoring-for-deceptive-alignment
Close date updated to 2025-12-31 3:59 pm
Close date updated to 2025-12-31 11:59 pm
Dec 5, 1:32am: Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models → Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2025
Dec 5, 1:32am: Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2025 → Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2023
Close date updated to 2023-12-31 11:59 pm
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ360 |
2 | | Ṁ116 |
3 | | Ṁ71 |
4 | | Ṁ61 |
5 | | Ṁ33 |
A Manifold trader recently reached out to me, asking if the following statement qualifies as a “commitment” for the purpose of this market:

> Many of the research directions we are pursuing are aimed at gaining a better understanding of AI systems and developing techniques that could help us detect concerning behaviors such as power-seeking or deception by advanced AI systems.

The answer is no. (But I’m open to being convinced, as always.)
The market didn't seem to update in response to Evan Hubinger announcing that he'd be joining Anthropic.