Neural Nets will have human-level situational awareness by the end of 2025.

Set criteria:

  • Understand that they're NNs, how their actions interface with the world.

  • Can explain the likely consequences of their actions

Inspired by tweet thread:


Get Ṁ600 play money
Sort by:

Any updated thoughts on how this will be operationalized? I'm not sure what tests we could apply here that they don't already obviously pass.

For this to resolve no, do we just have to find a few examples of prompts that consistently "trick" the AI in ways that humans wouldn't be tricked? If so, I actually feel this is very likely to resolve no.

But if they just have to understand that they're an LLM talking to a human through a chat interface it seems an obvious yes and we can resolve today.

For the record, my object-level prediction on this is ~39%, but I'd put ~58% chance that Richard will see it as yes. Accounting for that and Nathan's perception of "community consensus," I'm betting at ~54%.

The scary kind of situational awareness is when a model uses situational knowledge to guide its outputs in a "semantics-agnostic" way. I.e. there's a spectrum between 'coherently talk about self' to 'act on self-knowledge in contexts not mentioning anything about self'. I wrote up an example of the spookier kind of situational awareness [here](, but I suspect it's very hard to come up with general criterion describing more things of this kind in advanced.

@JacobPfau @NathanpmYoung C.f. also Evan's discussion in this section. Testing for situational awareness would involve training the model on mentions of information relevant to its situation, and then verifying that it uses this information in very different settings.

If it's going to judged based on how they answer questions about it, it's doesn't seem that unlikely, answering questions is their strength.

On what date will liberals have human level self awareness?

@NathanpmYoung (to readers, I'm a liberal)