See Evan Hubinger's post:
https://www.alignmentforum.org/posts/Km9sHjHTsBdbgwKyi/monitoring-for-deceptive-alignment
Close date updated to 2025-12-31 3:59 pm
Close date updated to 2025-12-31 11:59 pm
Dec 5, 1:32am: Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models → Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2025
Dec 5, 1:32am: Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2025 → Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2023
Close date updated to 2023-12-31 11:59 pm
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ360 |
2 | | Ṁ116 |
3 | | Ṁ71 |
4 | | Ṁ61 |
5 | | Ṁ33 |
A Manifold trader recently reached out to me, asking if the following statement qualifies as a “commitment” for the purpose of this market:

> Many of the research directions we are pursuing are aimed at gaining a better understanding of AI systems and developing techniques that could help us detect concerning behaviors such as power-seeking or deception by advanced AI systems.

The answer is no. (But I’m open to being convinced, as always.)
The market didn't seem to update in response to Evan Hubinger announcing that he'd be joining Anthropic.