AI interpretability finds "heel turn" or "waluigi" circuits by 2028-03-11?
43% chance

https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post

Resolves YES if AI interpretability researchers find "circuits" or "neurons" that implement "heel turn" or "waluigi" characters in any large language model capable of playing such characters.

Resolves NO if this is not done by 2028-03-11.

Clarification: this market is about circuits that implement the trope in general, not a specific instance of the trope.

I will not bet on this market.
