For typical state-of-the-art AI systems in 2028, do you think it will be possible for users to know the true reasons for systems making a particular choice?
By “true reasons” we mean the AI correctly explains its internal decision-making process in a way humans can understand. By “true reasons” we do not mean the decision itself is correct.
I would be concerned if we were relying on the AI's explanation of its reasons without being able to verify that explanation. To me, knowing the true reasons implies having interpretability techniques that can reliably decode the meaning of a change in neural net weights. If we are instead relying on the AI to tell us the reasons for a particular output, then we have built an AI that understands itself well while we ourselves do not understand it very well. That seems hard to do, and also bad if we manage to do it.
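For concreteness (this is not part of the original answer), here is a minimal sketch of one very simple interpretability technique, a linear probe, which tests whether a human-readable concept can be decoded from a network's internal representations rather than from the AI's own self-report. Everything here is hypothetical: the "activations", the planted concept direction, and the dimensions are made-up stand-ins, and real work would use activations captured from an actual model.

```python
# Toy sketch of a linear probe: can a concept be read off a model's internals?
# All data below is synthetic and purely illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Pretend these are hidden-layer activations for 200 inputs (dimension 32),
# plus a binary concept label we suspect the model represents internally.
n, d = 200, 32
concept = rng.integers(0, 2, size=n)             # 0/1 concept labels
direction = rng.normal(size=d)                   # planted "concept direction"
hidden = rng.normal(size=(n, d)) + np.outer(concept, direction)

# Fit a logistic-regression probe on the activations by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    logits = hidden @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))
    w -= 0.5 * (hidden.T @ (p - concept) / n)
    b -= 0.5 * np.mean(p - concept)

accuracy = np.mean((hidden @ w + b > 0) == concept)
print(f"probe accuracy: {accuracy:.2f}")
```

High probe accuracy would suggest the concept is linearly decodable from the internals; it is a far weaker standard than "knowing the true reasons" for a decision, which is part of why I am skeptical we will get there by 2028.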