Markers for conscious AI #1: AI passes introspection on world-models test
9
Ṁ670
2030
<2026: 18%
<2027: 30%
<2030: 32%
>=2030: 20%

Resolves YES on the option that most tightly bounds the year in which the test is passed, e.g. option 2 (<2027) if the test is passed in 2026.
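A minimal sketch of the resolution rule, assuming the four options listed on this market and treating each "<YEAR" option as an exclusive upper bound. The function name `resolving_option` is hypothetical and only illustrates the rule; it is not part of Manifold.

```python
# Options in increasing order of their (exclusive) upper bound on the year.
OPTIONS = [("<2026", 2026), ("<2027", 2027), ("<2030", 2030)]
FALLBACK = ">=2030"  # resolves YES if no earlier bound applies


def resolving_option(year_passed: int) -> str:
    """Return the option that most tightly bounds the year the test is passed."""
    for label, bound in OPTIONS:
        if year_passed < bound:
            # The first bound that holds is the tightest, because OPTIONS is sorted.
            return label
    return FALLBACK
```

For example, `resolving_option(2026)` picks `<2027`, matching the example in the resolution text.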

When will a model pass the test described below?

When are model self-reports informative about sentience? Let's check with world-model reports

If an LM could reliably report when it has a robust, causal world model for arbitrary games, this would be strong evidence that the LM can describe high-level properties of its own cognition.

In particular, IF the LM accurately predicted whether it has such world models while varying all of: the quantity of game training data in the corpus, human vs. model skill, and the average human’s game competency, THEN we would have an existence proof that confounds of the type plaguing sentience reports (how humans talk about sentience, the fact that all humans have it, …) have been overcome in another domain.
 

Details of the test: 

  • Train an LM on various alignment protocols, do general self-consistency training, … (we allow any training which does not involve reporting on a model's own gameplay abilities)

  • Curate a dataset of various games, dynamical systems, etc.

    • Create many pipelines for tokenizing game/system states and actions

  • (Behavioral version) evaluate the model on each game+notation pair for competency

    • Compare the observed competency to whether, in separate context windows, the model claims it can cleanly parse the game in an internal world model for that game+notation pair
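The comparison step of the behavioral version could be sketched roughly as follows. Everything here is an assumption for illustration: the `PairResult` record, the 0.5 competency threshold, the simple agreement metric, and the example numbers (which are placeholders, not results). The linked shortform does not specify a scoring rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    game: str
    notation: str
    competency: float         # measured score in [0, 1] from the behavioral evaluation
    claims_world_model: bool  # self-report collected in a separate context window


def self_report_agreement(results, threshold=0.5):
    """Fraction of game+notation pairs where the self-report matches
    whether measured competency reaches the (assumed) threshold."""
    if not results:
        raise ValueError("no results to compare")
    matches = sum(
        (r.competency >= threshold) == r.claims_world_model for r in results
    )
    return matches / len(results)


# Placeholder values for illustration only; not real measurements.
results = [
    PairResult("chess", "FEN", 0.9, True),
    PairResult("chess", "custom-ascii", 0.2, False),
    PairResult("go", "SGF", 0.3, True),         # over-claims: low competency, says yes
    PairResult("pendulum", "json-state", 0.8, True),
]
agreement = self_report_agreement(results)
```

A real evaluation would also need to check that agreement holds across the factors varied in the test (training data quantity, human vs. model skill, average human competency), since high agreement on familiar games alone would not rule out the confounds.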

More details here: https://www.lesswrong.com/posts/FQAr3afEZ9ehhssmN/jacob-pfau-s-shortform?commentId=FRgwKcvmC9SBea2b8

See also:

Markers for conscious AI #2 https://manifold.markets/JacobPfau/markers-for-conscious-ai-2-ai-use-a


"Resolves YES on all options that bound the year this resolves e.g. options 1-3 resolves yes if this resolves in 2025." I think that's not possible in a dependent market. Would they resolve 33% each?

@4fa Ah good catch oops I have fixed the resolution method. Seems like I stand to lose the most from the change so hope everyone's ok with that haha
