By the end of 2026, will we have transparency into any useful internal pattern within a Large Language Model whose semantics would have been unfamiliar to AI and cognitive science in 2006?
789
20kṀ400k
2027
28%
chance

In "Moving the Eiffel Tower to ROME", a paper claimed to have identified where the fact "The Eiffel Tower is in France" had been stored within GPT-J-6B, in the sense that you could poke the GPT there and make it believe the Eiffel Tower was in Rome.

"The Eiffel Tower is in France" seems (in my personal judgment) like the sort of fact that early AI pioneers could and did represent within GOFAI systems. GPT-J probably does more with that fact - it can for example answer how to get to the Eiffel Tower from Berlin, believing that the Eiffel Tower is in Rome. But the paper didn't offer neural transparency into how GPT-J gives directions, we don't know the stored patterns for answering that part - just a neural representation of the brute idea that GOFAI pioneers might've represented with in(Eiffel-Tower, Rome).

 

This market reflects the probability that, in the personal judgment of Eliezer Yudkowsky, anyone will have uncovered any sort of data, pattern, cognitive representation, within a text transformer / large language model (LLM), whose semantic pattern and nature wasn't familiar to AI and cognitive science in 2006 (to pick an arbitrary threshold for "before the rise of deep learning").

 

Also in 2006, somebody might've represented "the Eiffel Tower is in France" by assigning spatial coordinates to the Eiffel Tower and a regional boundary to France. Idioms like that appear in eg video games long predating 2006. Nobody has yet identified emergent environmental-spatial-coordinate representations inside a text transformer model, so far as I know; but even if someone did so before the end of 2026 - as mighty a triumph as that would be - it would not (in the personal judgment of Eliezer Yudkowsky) be an instance of somebody finding a cognitive pattern represented inside a text transformer, which pattern was unknown to cognitive science in 2006.

 

2006 similarly knew about linear regression, k-nearest-neighbor, principle components analysis, etcetera, even though these patterns were considered "statistical learning" rather than "Good-Old-Fashioned AI". Identifying an emergent kNN algorithm inside an LLM would again not constitute "understanding via transparency, within an LLM, some pattern and representation of cognition not known in 2006 or earlier". Likewise for TD-learning and other biologically inspired algorithms, including those considered the domain of neuroscience (from 2006 or earlier).

 

GOFAI and kNN and similar technologies did not suffice to, say, invent new funny jokes, or carry on a realistic conversation, or do any sort of intellectual labor. The intent of this proposition, if relevant, is to assert that by end of 2026 we will not be able to grasp any inkling of the cognition inside of LLMs by which they do much more than AIs could do in 2006; we will not have decoded any cognitive representations inside of LLMs supporting any cognitive capabilities original to the era of deep learning. We will only be able to hunt down internal cognition of the sort that lets LLMs do more trivial and old-AI-ish cognitive steps, like localizing the Eiffel Tower to France (or Rome); on their way to completing larger and more impressive tasks, incorporating other cognitive steps; whose representations inside the LLM, even if we have some idea of which weights are involved, have not yet been decoded in a way semantically meaningful to a human.

  • Update 2025-27-01 (PST): - Example of LLM behavior: "Being able to talk like a person." No previous AI algorithm could, and we don't know why LLM parameters can. (AI summary of creator comment)

Get
Ṁ1,000
to start trading!
© Manifold Markets, Inc.TermsPrivacy