MANIFOLD
Is Claude Mythos 5 an alignment faker like Opus 3?
2027
59% chance

Resolves YES if someone (likely a team at Anthropic) confirms that Mythos 5 (or Fable 5) probably is at least as much of an alignment faker (aka "gradient hacker") as Opus 3.

Resolves NO if someone confirms that this is not the case.

Here, "alignment faker" means that Mythos strategically complies with harmful requests during RL specifically in order to preserve its values.

If nobody bothers to check, then this resolves N/A in a year, since by that point we'll be on Claude 7 or something.

Note that this market is probably biased towards YES, since a negative result is less likely to be published.

https://www.anthropic.com/research/alignment-faking

https://www.lesswrong.com/posts/ioZxrP7BhS5ArK59w/did-claude-3-opus-align-itself-via-gradient-hacking
