
See Evan Hubinger's post:
https://www.alignmentforum.org/posts/Km9sHjHTsBdbgwKyi/monitoring-for-deceptive-alignment
Close date updated to 2025-12-31 3:59 pm
Close date updated to 2025-12-31 11:59 pm
Dec 5, 1:32am: Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models → Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2025
Dec 5, 1:32am: Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2025 → Anthropic publicly commits to actively monitor and look for evidence of deceptive alignment in their models by the end of 2023
Close date updated to 2023-12-31 11:59 pm
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ360 |
2 | | Ṁ116 |
3 | | Ṁ71 |
4 | | Ṁ61 |
5 | | Ṁ33 |
A Manifold trader recently reached out to me, asking if the following statement qualifies as a “commitment” for the purpose of this market.
The answer is no. (But I'm open to being convinced, as always.)
Many of the research directions we are pursuing are aimed at gaining a better understanding of AI systems and developing techniques that could help us detect concerning behaviors such as power-seeking or deception by advanced AI systems.
