In 2025, what 2019-2022 work of AI safety will I think was most significant?
18%
15% - Eliciting Latent Knowledge - https://www.lesswrong.com/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge
14% - Discovering Agents - https://www.alignmentforum.org/posts/XxX2CAoFskuQNkBDy/discovering-agents
6% - Risks from Learned Optimization in Advanced Machine Learning Systems - https://arxiv.org/abs/1906.01820
6% - Constitutional AI: Harmlessness from AI Feedback - https://www.anthropic.com/constitutional.pdf
5% - Mechanistic Anomaly Detection - https://www.alignmentforum.org/posts/vwt3wKXWaCvqZyF74/mechanistic-anomaly-detection-and-elk
5% - Infra-Bayesian Physicalism - https://www.lesswrong.com/posts/gHgs2e2J5azvGFatb/infra-bayesian-physicalism-a-formal-theory-of-naturalized
3% - The Sharp Left Turn - https://www.alignmentforum.org/s/v55BhXbpJuaExkpcD/p/GNhMPAWcfBCASy8e6
2% - 2022 MIRI Alignment Discussion - https://www.alignmentforum.org/s/v55BhXbpJuaExkpcD
1.8% - Other (Not Listed Here)
1.3% - Optimal Policies Tend to Seek Power - https://arxiv.org/abs/1912.01683
Works to be considered include arXiv papers that first appeared in this time window, LessWrong posts, and paper-like posts (mainly to include Anthropic papers). The window is inclusive of both 2019 and 2022. 'Significant' here means contributed the most to progress towards AI alignment and AI safety. This is obviously very subjective.
If I were answering this question for papers from 2016-2019, possible answers would have included, among others, 'AI safety via debate' and 'The Off-Switch Game'.
@JacobPfau For the purposes of this question I'll include the associated arXiv paper under the "Mechanistic Anomaly Detection" option.
Related questions
Which 5 AI advancements in 2024 will be the most important? [Free response]
By 2028, will I believe that contemporary AIs are aligned (posing no existential risk)?
35% chance
In 2025, will I believe that aligning automated AI research AI should be the focus of the alignment community?
48% chance
I make a contribution to AI safety that is endorsed by at least one high profile AI alignment researcher by the end of 2026
59% chance
Will there be a critical vulnerability discovered by AI by the end of 2025?
70% chance
In 2050, will the general consensus among experts be that the concern over AI risk in the 2020s was justified?
79% chance
By the end of 2025, which piece of advice will I feel has had the most positive impact on me becoming an effective AI alignment researcher?
Will I still consider improving AI X-Safety my top priority on EOY 2024?
61% chance
Will I work (at some point) at a top AI lab on safety in the next 5 years?
73% chance
By 2030, which field will AI have the most revolutionary impact on?