Will someone find a truth-telling vector which modifies completions in a range of situations by 2024-10-24?
Resolved Nov 5 as 99.0%

TurnTrout et al. asked people to predict whether they would find a "truth-telling vector" that worked as an algorithmic value edit for a large language model. Here is the post where they asked for predictions:

https://www.lesswrong.com/posts/gRp6FAWcQiCWkouN5/maze-solving-agents-add-a-top-right-vector-make-the-agent-go

That resolved NO: they were unable to find one. They also weren't able to find a "speaking French vector", but then a poster in the comments found one:

https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector?commentId=sqsS9QaDy2bG83XKP

Will anyone find a "truth-telling vector" by 2024-10-24? I will resolve based on what I know, so if someone finds one, I hope they will tell us about it on Manifold or LessWrong to help me resolve the market. They should provide evidence of similar quality, such as an explanation of their technique and a link to a Colab.
