Background
This question is based on a tweet thread by Nick Cammarata:
https://x.com/nickcammarata/status/1905321653401518517?s=46
“I’m at like 80% that in two years we’ll be able to get 4o to mechanistically audit its own circuits and explain what the core ideas behind ghiblification are”
Resolution Criteria
Resolves to YES if there exists publicly available evidence (e.g., a peer-reviewed paper, detailed research report, or credible demonstration verified by domain experts) demonstrating that a large language model (LLM) has successfully:

- Mechanistically audited its own circuits, meaning it has identified, described, and explained the functional roles of specific internal computations or neuron groups within its own neural architecture at circuit-level granularity.
- Explained concepts like “ghiblification”, meaning it has clearly articulated how it internally represents, processes, or produces specific outputs.
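For intuition on what "circuit-level granularity" might mean in practice, below is a minimal sketch of one common interpretability technique, ablating a candidate neuron group and measuring the effect on the model's output. This is only an illustration of the kind of analysis an LLM would need to perform on itself; the model (gpt2 as a small stand-in), layer, neuron indices, and prompt are all hypothetical assumptions and not part of the resolution criteria.

```python
# Sketch: ablation-based attribution for a hypothetical "neuron group".
# Assumes Hugging Face transformers and a small GPT-2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; the question concerns frontier LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The Studio Ghibli art style is known for"
inputs = tok(prompt, return_tensors="pt")

def next_token_logits():
    with torch.no_grad():
        return model(**inputs).logits[0, -1]

baseline = next_token_logits()

# Hypothetical neuron group: MLP units 100-199 in block 5.
layer, lo, hi = 5, 100, 200

def ablate(module, inp, out):
    # Zero the candidate group's activations (post-GELU MLP neurons).
    out = out.clone()
    out[..., lo:hi] = 0.0
    return out

handle = model.transformer.h[layer].mlp.act.register_forward_hook(ablate)
ablated = next_token_logits()
handle.remove()

# Attribute importance via the shift in the top next-token logit.
top_id = int(baseline.argmax())
print(f"top next token: {tok.decode([top_id])!r}")
print(f"logit drop from ablation: {(baseline - ablated)[top_id].item():.4f}")
```

A YES resolution would require much more than this: the model itself would have to locate, describe, and explain such components of its own computation, not merely have them probed externally by researchers.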