The Shard Theory research program by Team Shard is based on the idea that we should directly study what types of cognition/circuitry are encouraged by reinforcement learning, and reason mechanistically about how reward gives rise to them, rather than following the classic Outer Alignment/Inner Alignment factorization, which aims to create a reward function that matches human values and then somehow give an AI the goal of maximizing that reward. The hope is that understanding how reward maps to behavior is more tractable than the Outer/Inner Alignment approach, and that we might therefore be able to solve alignment without solving Outer Alignment or Inner Alignment.
In 4 years, I will evaluate Shard Theory and decide whether there have been any important results since today. I will probably ask some of the alignment researchers I most respect (such as John Wentworth or Steven Byrnes) for advice on the assessment, unless the answer is dead-obvious.
About me: I have been following AI and alignment research on and off for years, and I have a reasonable enough mathematical background to evaluate it. I tend to have an informal sense of the viability of various alignment proposals, though that sense may well be wrong.
At the time of creating this prediction market, my impression is that the attempt to avoid needing to solve Outer Alignment will utterly fail, essentially because the self-supervised learning algorithms that can be used to indefinitely increase capabilities inherently require something like Outer Alignment to direct them toward the correct goal. However, Shard Theory might still produce insights useful for Inner Alignment, as it could give us a better understanding of how training shapes neural networks.
More on Shard Theory:
https://www.lesswrong.com/posts/xqkGmfikqapbJ2YMj/shard-theory-an-overview