
Will a >100 karma LW post claim "How to Catch a Liar" suffers from similar problems to CCS?
31% chance
A recent post by the DeepMind alignment team argues that Contrast-Consistent Search struggles to find a feature that represents "knowledge" among many possible proxy features in a model.
"How to Catch an AI Liar" uses black-box methods to try to tell whether a model is lying. I want to know if it suffers from similar problems to CCS.
As a proxy, I'm using a LW post with >100 karma. The post does not have to be solely about this claim, but it should be materially about it. For example, a general criticism of a bunch of methods with a section on this would count. A popular post with an unrelated postscript claiming this wouldn't count.
This question is managed and resolved by Manifold.