In 2025, what 2019-2022 work of AI safety will I think was most significant?
resolved Jan 5
35%18%Other
1.3%
Optimal Policies Tend to Seek Power https://arxiv.org/abs/1912.01683
6%
Risks from Learned Optimization in Advanced Machine Learning Systems https://arxiv.org/abs/1906.01820
6%
Constitutional AI: Harmlessness from AI Feedback https://www.anthropic.com/constitutional.pdf
1.8%
Other Not Listed Here

Works to be considered include arXiv papers first appearing in this time window, LessWrong posts, and paper-like posts (mainly to include Anthropic papers). The time window includes both 2019 and 2022. 'Significant' here means having contributed the most to progress towards AI alignment and AI safety. This is obviously very subjective.

If I were to answer this question for papers 2016-2019, possible answers would have included, among others, 'AI safety via debate', 'The off switch game'.

  • Update 2025-04-01 (PST) (AI summary of creator comment): The resolution criteria may be adjusted to resolve to induction heads specifically or to include all of the above works.

    • Community input is being sought to finalize the resolution criteria.



45% on interp (of which 50% induction, 35% the polysemanticity work, 15% causal scrubbing)
35% split equally between ELK&MAD
20% Other: Tamera's post on CoT supervision https://www.lesswrong.com/posts/FRRb6Gqem8k69ocbi/externalized-reasoning-oversight-a-research-direction-for

@JacobPfau I am somewhat conflicted on whether it's best to resolve to induction heads or to resolve to all of the above. If anyone wants to chime in on what they were expecting I'll take that into account and resolve in 24 hours.

Relevant:

None of those.

@Lauro Is this intended to include the heuristic arguments work?

@JacobPfau For the purposes of this question I'll include the associated arXiv paper under the "Mechanistic Anomaly Detection" option.

@JacobPfau ah yeah I agree it makes sense to include heuristic arguments here!
