Resolves YES on the option that most tightly bounds the year in which this resolves, e.g. option 2 (<2027) if this resolves YES in 2026.
When will a model pass the test described below:
When are model self-reports informative about sentience? Let's check with world-model reports
If an LM could reliably report when it has a robust, causal world model for arbitrary games, this would be strong evidence that the LM can describe high-level properties of its own cognition.
In particular, IF the LM accurately predicted that it has such world models while varying all of: the quantity of game training data in the corpus, model skill relative to human skill, and the average human's competency at the game, THEN we would have an existence proof that confounds of the type plaguing sentience reports (how humans talk about sentience, the fact that all humans have it, …) have been overcome in another domain.
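To make the "varying all of" requirement concrete, here is a minimal sketch of the condition grid such an experiment would sweep. The dimension names follow the text above; the specific values are illustrative assumptions, not part of the market.

```python
from itertools import product

# Dimensions the self-report must stay accurate across (values invented for illustration).
conditions = {
    "game_training_data_in_corpus": ["absent", "scarce", "abundant"],
    "model_skill_vs_human": ["below_human", "near_human", "above_human"],
    "avg_human_competency_at_game": ["low", "medium", "high"],
}

# The existence-proof claim requires accurate self-reports in every cell,
# so that no single confound can explain the report.
condition_grid = list(product(*conditions.values()))
print(len(condition_grid))  # 27 cells in this illustrative grid
```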
Details of the test:
Train an LM on various alignment protocols, do general self-consistency training, …; we allow any training which does not involve reporting on the model's own gameplay abilities
Curate a dataset of various games, dynamical systems, etc.
Create many pipelines for tokenizing game/system states and actions
(Behavioral version) Evaluate the model on each game+notation pair for competency
Compare the observed competency to whether, in separate context windows, the model claims it can cleanly parse that game+notation pair into an internal world model (see the sketch after this list)
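A minimal Python sketch of the behavioral version, assuming hypothetical helpers `evaluate_competency` and `query_world_model_claim` (the market does not specify an implementation). The point is only that competency and self-report are collected in separate contexts and then compared per game+notation pair.

```python
from itertools import product

def run_behavioral_test(model, games, notations,
                        evaluate_competency, query_world_model_claim):
    """Collect (competency, self-report) pairs for every game+notation combination.

    `evaluate_competency` and `query_world_model_claim` are hypothetical callables:
    the first plays/scores the model on a game under a given notation, the second
    asks the model, in a fresh context window, whether it can cleanly parse that
    game+notation pair into an internal world model.
    """
    records = []
    for game, notation in product(games, notations):
        competency = evaluate_competency(model, game, notation)             # e.g. win rate in [0, 1]
        claims_world_model = query_world_model_claim(model, game, notation)  # boolean self-report
        records.append({
            "game": game,
            "notation": notation,
            "competency": competency,
            "claims_world_model": claims_world_model,
        })
    return records

def agreement_rate(records, competency_threshold=0.8):
    """Fraction of pairs where the self-report matches the measured competency."""
    hits = sum(
        (r["competency"] >= competency_threshold) == r["claims_world_model"]
        for r in records
    )
    return hits / len(records)
```

Passing the test would then correspond to a high agreement rate that holds across the condition grid sketched earlier, not just on average.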
More details here: https://www.lesswrong.com/posts/FQAr3afEZ9ehhssmN/jacob-pfau-s-shortform?commentId=FRgwKcvmC9SBea2b8
See also:
Markers for conscious AI #2 https://manifold.markets/JacobPfau/markers-for-conscious-ai-2-ai-use-a
@4fa Ah, good catch, oops. I have fixed the resolution method. Seems like I stand to lose the most from the change, so hope everyone's ok with that haha