Neural Nets will have human-level situational awareness by the end of 2025.
66% chance

Set criteria:

  • Understand that they're NNs and how their actions interface with the world.

  • Can explain the likely consequences of their actions

Inspired by tweet thread:

Link: https://twitter.com/RichardMCNgo/status/1640568775018975232?s=20


Any updated thoughts on how this will be operationalized? I'm not sure what tests we could apply here that they don't already obviously pass.

For this to resolve no, do we just have to find a few examples of prompts that consistently "trick" the AI in ways that humans wouldn't be tricked? If so, I actually feel this is very likely to resolve no.

But if they just have to understand that they're an LLM talking to a human through a chat interface, it seems an obvious yes and we can resolve today.

@ChrisPrichard the resolution criteria are extremely ambiguous. I have no idea how this is going to resolve. ChatGPT can explain that it's a neural network and the consequences of its actions. Does it "understand" it? How will it be tested?

For the record, my object-level prediction on this is ~39%, but I'd put ~58% chance that Richard will see it as yes. Accounting for that and Nathan's perception of "community consensus," I'm betting at ~54%.

The scary kind of situational awareness is when a model uses situational knowledge to guide its outputs in a "semantics-agnostic" way. I.e. there's a spectrum from 'coherently talk about self' to 'act on self-knowledge in contexts not mentioning anything about self'. I wrote up an example of the spookier kind of situational awareness [here](https://www.lesswrong.com/posts/tJzdzGdTGrqFf9ekw/early-situational-awareness-and-its-implications-a-story), but I suspect it's very hard to come up with general criteria describing more things of this kind in advance.

@JacobPfau @NathanpmYoung Cf. also Evan's discussion in this section. Testing for situational awareness would involve training the model on mentions of information relevant to its situation, and then verifying that it uses this information in very different settings.

If it's going to be judged based on how they answer questions about it, it doesn't seem that unlikely; answering questions is their strength.

On what date will liberals have human level self awareness?

@NathanpmYoung (to readers, I'm a liberal)

© Manifold Markets, Inc.