Will we be able to read AIs' minds in 2030?
25% chance

Will Digital Neuroscience succeed in this decade? Resolves YES if, at the end of 2030, there is at least one AI model for which humans can read what it is thinking and understand how the model works, and whose performance is close to that of the state-of-the-art AGI.


How much do we need to understand? Would our current level of understanding of human minds be enough, or does it need to be more?

@MartinRandall No, that would not be sufficient; we cannot currently "read people's minds".

How well do we need to understand how the model works?

@NoaNabeshima That's hard to define, but our current level of understanding, roughly "this giant inscrutable matrix was trained over this data set with this loss function", isn't very useful to us. The primary reason I'm interested in seeing what a model is thinking is to determine whether it is deceiving us.

I need to read up on interpretability research, but if there were a breakthrough in the field such that we could identify the purposes of different sections of an LLM, and get a human-interpretable output from most sections of the model at any point (something along the lines of the sketch below), that would be sufficient to resolve yes.
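To make "reading out sections of the model" concrete, here is a minimal sketch of the mechanics involved, not an actual interpretability method: it only captures per-layer activations with PyTorch forward hooks on a made-up toy encoder standing in for an LLM. The hard, unsolved part this market asks about is mapping those activations to human-readable thoughts.

```python
# Minimal sketch: capturing per-layer activations with PyTorch forward hooks.
# The tiny encoder below is an illustrative stand-in for an LLM; real
# interpretability work would also need to translate these activations
# into human-readable concepts.
import torch
import torch.nn as nn

# A toy stand-in "model": two transformer encoder layers.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2)

captured = {}  # layer name -> activation tensor

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Register a hook on each layer so we can inspect what it outputs.
for i, block in enumerate(model.layers):
    block.register_forward_hook(make_hook(f"layer_{i}"))

x = torch.randn(1, 10, 64)  # (batch, sequence, hidden) dummy input
model(x)

for name, act in captured.items():
    print(name, act.shape)  # e.g. layer_0 torch.Size([1, 10, 64])
```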

If model paradigms that provide better interpretability become the new state of the art in AI capabilities, I'm open to arguments about their capacity for deception. For example, AutoGPT came out after I made this market and seems like it could lead to a somewhat more promising future for interpretability. For AutoGPT-style models, which are still an inscrutable LLM at their core but use a human-interpretable scratchpad for their short-term memory, I suppose it would depend on how reliant they are on that scratchpad, and whether they are capable of keeping illicit thoughts out of the scratchpad in order to deceive us.
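For illustration, here is a rough sketch of what I mean by a scratchpad-style loop, under my own assumptions about how such agents are wired: the model's short-term memory is plain text that an overseer can read at every step, but only the thoughts the model chooses to write down are visible. `query_model` is a hypothetical placeholder, not a real API.

```python
# Sketch of an AutoGPT-style loop: short-term memory lives in a plain-text
# scratchpad that a human can read at every step.
# `query_model` is a hypothetical stand-in for a call to an underlying LLM.

def query_model(prompt: str) -> str:
    """Placeholder for an LLM call; returns a canned thought for illustration."""
    return "THOUGHT: break the task into sub-tasks\nACTION: list_subtasks"

def run_agent(task: str, max_steps: int = 3) -> list[str]:
    scratchpad: list[str] = []  # human-readable short-term memory
    for step in range(max_steps):
        prompt = f"Task: {task}\nScratchpad:\n" + "\n".join(scratchpad)
        response = query_model(prompt)
        scratchpad.append(f"[step {step}] {response}")
        # An overseer can inspect the scratchpad here, but sees only the
        # thoughts the model wrote down, not its internal activations.
    return scratchpad

if __name__ == "__main__":
    for entry in run_agent("summarize a document"):
        print(entry)
```

The deception question then becomes how much of the agent's actual reasoning has to pass through that scratchpad for it to act competently.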

I'm not in the field, but I'm not aware of any models that would meet these criteria today.