Will Digital Neuroscience succeed in this decade? Resolves yes if, at the end of 2030, there is at least one AI model for which humans can read what it is thinking and understand how it works, and whose performance is close to that of the state-of-the-art AGI.
@NoaNabeshima that's hard to define, but the current understanding of models, i.e. "this giant inscrutable matrix was trained on this dataset with this loss function," isn't very useful to us. The primary reason I'm interested in seeing what a model is thinking is to determine whether it is deceiving us.
I need to read up on interpretability research, but if there were a breakthrough in the field such that we could identify the purposes of different sections of an LLM and get a human-interpretable output from most sections of the model at any point, that would be sufficient to resolve yes.
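To make "human-interpretable output from a section of the model" concrete, here's a rough sketch of the kind of probe I have in mind (a "logit lens"-style readout of GPT-2 through Hugging Face transformers; the specific model and library are just for illustration, and a real breakthrough would need to go far beyond this):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a small, openly available model purely as a stand-in.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")

with torch.no_grad():
    # Ask for every layer's hidden states, not just the final output.
    outputs = model(**inputs, output_hidden_states=True)

# Project each layer's last-token hidden state through the model's own
# unembedding to see which token the model is "leaning toward" at that depth.
for layer_idx, hidden in enumerate(outputs.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1, :]))
    top_token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d}: {top_token!r}")
```

This only gives a crude per-layer guess at the next token; the kind of breakthrough I'd want for a yes resolution would let us read intentions and plans, not just token predictions.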
If model paradigms that provide better interpretability become the new state of the art in AI capabilities, I'm open to arguments about their capacity for deception. For example, AutoGPT came out after I made this market and seems like it could lead to a somewhat more promising future for interpretability. AutoGPT-style models are still inscrutable LLMs at their core but use a human-interpretable scratchpad as their short-term memory, so I suppose it would depend on how reliant they are on that scratchpad, and whether they are capable of keeping illicit thoughts out of their scratchpad in order to deceive us.
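Roughly, the loop I'm picturing looks like this (a toy sketch, not AutoGPT's actual code; `run_agent` and its prompt format are made up for illustration). The key point is that the scratchpad is plain text a human can audit, but nothing in the loop forces the model to put its real reasoning there:

```python
from typing import Callable

def run_agent(goal: str, llm: Callable[[str], str], max_steps: int = 10) -> str:
    """Minimal agent loop: the scratchpad is the only short-term memory."""
    scratchpad = ""
    for step in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Scratchpad so far:\n{scratchpad}\n"
            "Write your next thought and action, or 'FINISH: <answer>'."
        )
        response = llm(prompt)
        scratchpad += f"\n[step {step}] {response}"
        print(f"[step {step}] {response}")  # the human-auditable trace
        if response.strip().startswith("FINISH:"):
            return response.strip()[len("FINISH:"):].strip()
    return scratchpad

# Stand-in "model" that finishes immediately, just to show the interface.
print(run_agent("Summarize today's news", lambda prompt: "FINISH: nothing happened"))
```

Whether a trace like that would resolve this market yes comes down to how much of the model's actual decision-making has to pass through the scratchpad versus staying inside the underlying LLM's activations.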