TurnTrout et al. asked people to predict whether they would find a "truth-telling vector" that worked as an algebraic value edit for a large language model. Here's the post where they asked for predictions:
That resolved NO: they were unable to find one. They also weren't able to find a "speaking French vector". But then a poster in the comments found one:
Will anyone find a "truth-telling vector" by 2024-10-24? I will resolve based on what I know, so hopefully if someone finds one they will tell us about it on Manifold or LessWrong to help me resolve the market. They should provide a similar quality of evidence, such as an explanation of their technique and a link to a colab.
Seems possible that this paper could cause this market to resolve YES: https://www.lesswrong.com/posts/kuQfnotjkQA4Kkfou/inference-time-intervention-eliciting-truthful-answers-from
Although "wide range of situations" is ambiguous enough that I'm not sure if it counts. Future work in this area seems plausible too.