In 2025, what 2019-2022 work of AI safety will I think was most significant?
resolved Jan 5
- 35%
- 18% Other
- 1.3% Optimal Policies Tend to Seek Power (https://arxiv.org/abs/1912.01683)
- 6% Risks from Learned Optimization in Advanced Machine Learning Systems (https://arxiv.org/abs/1906.01820)
- 6% Constitutional AI: Harmlessness from AI Feedback (https://www.anthropic.com/constitutional.pdf)
- 1.8% Other Not Listed Here

Works to be considered include arXiv papers first appearing in this time window, LessWrong posts, and paper-like posts (mainly to include Anthropic papers). The window is inclusive of both 2019 and 2022. 'Significant' here means contributed the most to progress towards AI alignment and AI safety. This is obviously very subjective.

If I were to answer this question for papers from 2016-2019, possible answers would have included, among others, 'AI safety via debate' and 'The off switch game'.

  • Update 2025-04-01 (PST) (AI summary of creator comment): The resolution criteria may be adjusted to resolve to induction heads specifically, or to include all of the above works.

    • Community input is being sought to finalize the resolution criteria.

