Background
This question is based on a tweet thread by Nick Cammarata:
https://x.com/nickcammarata/status/1905321653401518517?s=46
“I’m at like 80% that in two years we’ll be able to get 4o to mechanistically audit its own circuits and explain what the core ideas behind ghiblification are”
Resolution Criteria
Resolves to YES if there exists publicly available evidence (e.g., a peer-reviewed paper, detailed research report, or credible demonstration verified by domain experts) demonstrating that a large language model (LLM) has successfully:

- Mechanistically audited its own circuits, meaning it has identified, described, and explained the functional roles of specific internal computations or neuron groups within its own neural architecture at circuit-level granularity.
- Explained concepts like “ghiblification”, meaning it has clearly articulated how it internally represents, processes, or produces specific outputs.
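For intuition on what "circuit-level granularity" might mean in practice, below is a minimal sketch of one common interpretability technique, ablating a candidate neuron group and measuring the effect on the model's output. This is only an illustration of the kind of analysis an LLM would need to perform on itself; the model (gpt2 as a small stand-in), layer, neuron indices, and prompt are all hypothetical assumptions and not part of the resolution criteria.

```python
# Sketch: ablation-based attribution for a hypothetical "neuron group".
# Assumes Hugging Face transformers and a small GPT-2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; the question concerns frontier LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The Studio Ghibli art style is known for"
inputs = tok(prompt, return_tensors="pt")

def next_token_logits():
    with torch.no_grad():
        return model(**inputs).logits[0, -1]

baseline = next_token_logits()

# Hypothetical neuron group: MLP units 100-199 in block 5.
layer, lo, hi = 5, 100, 200

def ablate(module, inp, out):
    # Zero the candidate group's activations (post-GELU MLP neurons).
    out = out.clone()
    out[..., lo:hi] = 0.0
    return out

handle = model.transformer.h[layer].mlp.act.register_forward_hook(ablate)
ablated = next_token_logits()
handle.remove()

# Attribute importance via the shift in the top next-token logit.
top_id = int(baseline.argmax())
print(f"top next token: {tok.decode([top_id])!r}")
print(f"logit drop from ablation: {(baseline - ablated)[top_id].item():.4f}")
```

A YES resolution would require much more than this: the model itself would have to locate, describe, and explain such components of its own computation, not merely have them probed externally by researchers.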