In 2025, what 2019-2022 work of AI safety will I think was most significant?
18%
15% - Eliciting Latent Knowledge - https://www.lesswrong.com/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge
14% - Discovering Agents - https://www.alignmentforum.org/posts/XxX2CAoFskuQNkBDy/discovering-agents
6% - Risks from Learned Optimization in Advanced Machine Learning Systems - https://arxiv.org/abs/1906.01820
6% - Constitutional AI: Harmlessness from AI Feedback - https://www.anthropic.com/constitutional.pdf
5% - Mechanistic Anomaly Detection - https://www.alignmentforum.org/posts/vwt3wKXWaCvqZyF74/mechanistic-anomaly-detection-and-elk
5% - Infra-Bayesian Physicalism - https://www.lesswrong.com/posts/gHgs2e2J5azvGFatb/infra-bayesian-physicalism-a-formal-theory-of-naturalized
3% - The Sharp Left Turn - https://www.alignmentforum.org/s/v55BhXbpJuaExkpcD/p/GNhMPAWcfBCASy8e6
2% - 2022 MIRI Alignment Discussion - https://www.alignmentforum.org/s/v55BhXbpJuaExkpcD
1.8% - Other (Not Listed Here)
1.3% - Optimal Policies Tend to Seek Power - https://arxiv.org/abs/1912.01683
Works to be considered include arXiv papers that first appeared in this time window, LessWrong posts, and paper-like posts (mainly to include Anthropic papers). The window is inclusive of both 2019 and 2022. 'Significant' here means contributed the most to progress towards AI alignment and AI safety. This is obviously very subjective.
If I were answering this question for papers from 2016-2019, possible answers would have included, among others, 'AI safety via debate' and 'The Off-Switch Game'.
@JacobPfau For the purposes of this question I'll include the associated arXiv paper under the "Mechanistic Anomaly Detection" option.
Related questions
Which 5 AI advancements in 2024 will be the most important? [Free response]
By 2028, will I believe that contemporary AIs are aligned (posing no existential risk)?
35% chance
In 2025, will I believe that aligning automated AI research AI should be the focus of the alignment community?
48% chance
I make a contribution to AI safety that is endorsed by at least one high profile AI alignment researcher by the end of 2026
59% chance
Will there be a critical vulnerability discovered by AI by the end of 2025?
70% chance
In 2050, will the general consensus among experts be that the concern over AI risk in the 2020s was justified?
79% chance
By the end of 2025, which piece of advice will I feel has had the most positive impact on me becoming an effective AI alignment researcher?
Will I still consider improving AI X-Safety my top priority on EOY 2024?
61% chance
Will I work (at some point) at a top AI lab on safety in the next 5 years?
73% chance
By 2030, which field will AI have the most revolutionary impact on?