The Natural Abstractions research program by John Wentworth is based on the idea that our world might be well-described by simple concepts which a wide variety of cognitive systems will end up converging upon. The hope is that by studying these natural abstractions and the cognitive algorithms that use them, we might produce improved interpretability tools.
In 4 years, I will evaluate Natural Abstractions and decide whether there have been any important results since today. I will probably ask some of the alignment researchers I most respect (such as John Wentworth or Steven Byrnes) for advice on the assessment, unless the answer is dead-obvious.
About me: I have been following AI and alignment research on and off for years, and have a reasonable mathematical background for evaluating it. I tend to have an informal sense of the viability of various alignment proposals, though that sense may well be wrong.
At the time of creating this market, my impression is that the Natural Abstractions research program has slowed down or gotten stuck, with no substantial news for perhaps half a year. I was excited about the program when it started, but I have come to believe that we would probably need structural changes to networks before abstractions can be reliably extracted from them, which seems to conflict with the "naturality" requirement of natural abstractions.
More on Natural Abstractions: