
Works to be considered include arXiv papers first appearing in this time window, LessWrong posts, and paper-like posts (mainly to include Anthropic papers). The time window includes both 2019 and 2022. 'Significant' here means contributed the most to progress towards AI alignment and AI safety. This is obviously very subjective.
If I were to answer this question for papers from 2016-2019, possible answers would have included, among others, 'AI safety via debate' and 'The off-switch game'.
Update 2025-04-01 (PST) (AI summary of creator comment): The resolution criteria may be adjusted to resolve to induction heads specifically, or to include all of the above works. Community input is being sought to finalize the resolution criteria.
- 45% on interp (of which 50% induction heads, 35% the polysemanticity work, 15% causal scrubbing)
- 35% split equally between ELK & MAD
- 20% other: Tamera's post on CoT supervision https://www.lesswrong.com/posts/FRRb6Gqem8k69ocbi/externalized-reasoning-oversight-a-research-direction-for
@JacobPfau I am somewhat conflicted on whether it's best to resolve to induction heads or to all of the above. If anyone wants to chime in on what they were expecting, I'll take that into account and resolve in 24 hours.
@JacobPfau For the purposes of this question I'll include the associated arXiv paper under the "Mechanistic Anomaly Detection" option.