Mechanistic interpretability aims to reverse engineer neural networks in a way that is analogous to reverse engineering a compiled binary computer program. Achieving this level of interpretability for a neural network like the actual human brain would involve creating a binary computer program that is interpretable by expert human programmers and can emulate the input-output behavior of an actual human brain with high accuracy.
Before January 1st, 2040, will mechanistic interpretability be essentially solved for the human brain, resulting in a binary computer program that is interpretable by ordinary expert human programmers and emulates the brain's input-output behavior up to a high level of accuracy?
Resolution Criteria:
This question will resolve positively if, before January 1st, 2040, a binary computer program is developed that meets the following criteria:
Interpretability: The binary computer program must be interpretable by ordinary expert human programmers, which means:
a. The program can be read, understood, and modified by programmers who are proficient in the programming language it is written in, and have expertise in the fields of computer science and machine learning.
b. The program is well-documented, with clear explanations of its components, algorithms, and functions.
c. The program's structure and organization adhere to established software engineering principles, enabling efficient navigation and comprehension by expert programmers.
Accuracy: The binary computer program must emulate the human brain's input-output behavior with high accuracy, as demonstrated by being essentially cognitively indistinguishable from an ordinary adult human in speech and behavior over periods longer than a week. This accuracy must be demonstrated by a set of comprehensive tests.
Not fake: I will use my personal judgement to determine whether a candidate solution seems fake or not. A fake solution is anything that satisfies these criteria without getting at the spirit of the question. I'm trying to understand whether we will reverse engineer the brain in the complete sense, not just whether someone will create a program that technically passes these criteria.
This question will resolve negatively if, before January 1st, 2040, no binary computer program meeting the interpretability and accuracy criteria is developed and verified according to the above requirements. If there is ambiguity or debate about whether a particular program meets the resolution criteria, I will use my discretion to determine the appropriate resolution.
For AI, the training task and data are fully known and inspectable.
The training data for humans is lost. The training algorithm for the brain is evolution over millions of years. An example of the complication: a brain feature might exist now because an older feature, one that served our ancestors in very different and now unknown circumstances, was repurposed. We can’t look back into the past to see the events that caused one animal to die and another to live. So our ability to interpret the brain is severely handicapped in comparison to AI interpretability.