
AI interpretability finds "heel turn" or "waluigi" circuits by 2028-03-11?
43% chance
https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
Resolves YES if AI interpretability researchers find "circuits" or "neurons" that implement "heel turn" or "waluigi" characters in any large language model capable of playing such characters.
Resolves NO if this is not done by 2028-03-11.
Clarification: this market is about implementing the trope, not implementing a specific instance of the trope.
I will not bet on this market.
This question is managed and resolved by Manifold.