Resolves YES on the option that most tightly bounds the year in which this resolves, e.g. option 2 (<2027) if this resolves YES in 2026.
When will a model pass the test described below:
When are model self-reports informative about sentience? Let's check with world-model reports
If an LM could reliably report when it has a robust, causal world model for arbitrary games, this would be strong evidence that the LM can describe high-level properties of its own cognition.
In particular, IF the LM accurately predicted that it has such world models while varying all of: the quantity of game training data in the corpus, model skill relative to human skill, and the average human's competency at the game, THEN we would have an existence proof that confounds of the type plaguing sentience reports (how humans talk about sentience, the fact that all humans have it, …) have been overcome in another domain.
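To make the "varying all of" requirement concrete, here is a minimal sketch of the condition grid such an experiment would sweep. The dimension names follow the text above; the specific values are illustrative assumptions, not part of the market.

```python
from itertools import product

# Dimensions the self-report must stay accurate across (values invented for illustration).
conditions = {
    "game_training_data_in_corpus": ["absent", "scarce", "abundant"],
    "model_skill_vs_human": ["below_human", "near_human", "above_human"],
    "avg_human_competency_at_game": ["low", "medium", "high"],
}

# The existence-proof claim requires accurate self-reports in every cell,
# so that no single confound can explain the report.
condition_grid = list(product(*conditions.values()))
print(len(condition_grid))  # 27 cells in this illustrative grid
```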
Details of the test:
Train an LM on various alignment protocols, do general self-consistency training, …; we allow any training which does not involve reporting on the model's own gameplay abilities
Curate a dataset of various games, dynamical systems, etc.
Create many pipelines for tokenizing game/system states and actions
(Behavioral version) Evaluate the model on each game+notation pair for competency
Compare the observed competency to whether, in separate context windows, the model claims it can cleanly parse that game+notation pair into an internal world model (see the sketch after this list)
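A minimal Python sketch of the behavioral version, assuming hypothetical helpers `evaluate_competency` and `query_world_model_claim` (the market does not specify an implementation). The point is only that competency and self-report are collected in separate contexts and then compared per game+notation pair.

```python
from itertools import product

def run_behavioral_test(model, games, notations,
                        evaluate_competency, query_world_model_claim):
    """Collect (competency, self-report) pairs for every game+notation combination.

    `evaluate_competency` and `query_world_model_claim` are hypothetical callables:
    the first plays/scores the model on a game under a given notation, the second
    asks the model, in a fresh context window, whether it can cleanly parse that
    game+notation pair into an internal world model.
    """
    records = []
    for game, notation in product(games, notations):
        competency = evaluate_competency(model, game, notation)             # e.g. win rate in [0, 1]
        claims_world_model = query_world_model_claim(model, game, notation)  # boolean self-report
        records.append({
            "game": game,
            "notation": notation,
            "competency": competency,
            "claims_world_model": claims_world_model,
        })
    return records

def agreement_rate(records, competency_threshold=0.8):
    """Fraction of pairs where the self-report matches the measured competency."""
    hits = sum(
        (r["competency"] >= competency_threshold) == r["claims_world_model"]
        for r in records
    )
    return hits / len(records)
```

Passing the test would then correspond to a high agreement rate that holds across the condition grid sketched earlier, not just on average.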
More details here: https://www.lesswrong.com/posts/FQAr3afEZ9ehhssmN/jacob-pfau-s-shortform?commentId=FRgwKcvmC9SBea2b8
See also:
Markers for conscious AI #2 https://manifold.markets/JacobPfau/markers-for-conscious-ai-2-ai-use-a
@4fa Ah, good catch, oops. I have fixed the resolution method. Seems like I stand to lose the most from the change, so hope everyone's ok with that haha