
In 2025, what 2019-2022 work of AI safety will I think was most significant?
Resolved Jan 5
- 35% / 18%: Other
- 18% / 15%: Eliciting Latent Knowledge (https://www.lesswrong.com/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge)
- 17% / 5%: Mechanistic Anomaly Detection (https://www.alignmentforum.org/posts/vwt3wKXWaCvqZyF74/mechanistic-anomaly-detection-and-elk)
- 1.3%: Optimal Policies Tend to Seek Power (https://arxiv.org/abs/1912.01683)
- 6%: Risks from Learned Optimization in Advanced Machine Learning Systems (https://arxiv.org/abs/1906.01820)
- 3%: The Sharp Left Turn (https://www.alignmentforum.org/s/v55BhXbpJuaExkpcD/p/GNhMPAWcfBCASy8e6)
- 2%: 2022 MIRI Alignment Discussion (https://www.alignmentforum.org/s/v55BhXbpJuaExkpcD)
- 6%: Constitutional AI: Harmlessness from AI Feedback (https://www.anthropic.com/constitutional.pdf)
- 0.8%: Softmax Linear Units (https://transformer-circuits.pub/2022/solu/index.html)
- 5%: Infra-Bayesian Physicalism (https://www.lesswrong.com/posts/gHgs2e2J5azvGFatb/infra-bayesian-physicalism-a-formal-theory-of-naturalized)
- 14%: Discovering Agents (https://www.alignmentforum.org/posts/XxX2CAoFskuQNkBDy/discovering-agents)
- 1.8%: Other Not Listed Here
Works to be considered include arXiv papers first appearing in this time window, LessWrong posts, and paper-like posts (mainly to include Anthropic papers). The window is inclusive of both 2019 and 2022. 'Significant' here means contributed the most to progress toward AI alignment and AI safety. This is obviously very subjective.
If I were to answer this question for papers from 2016-2019, possible answers would have included, among others, 'AI safety via debate' and 'The off switch game'.
Update 2025-04-01 (PST) (AI summary of creator comment): the resolution criteria may be adjusted to resolve to induction heads specifically, or to include all of the above works. Community input is being sought to finalize the resolution criteria.
This question is managed and resolved by Manifold.
🏅 Top traders

| # | Name | Total profit |
|---|------|--------------|
| 1 | | Ṁ18 |
| 2 | | Ṁ9 |
| 3 | | Ṁ8 |
| 4 | | Ṁ2 |
| 5 | | Ṁ2 |
Related questions

- Will there be serious AI safety drama at Google or Deepmind before 2026? (55% chance)
- What AI safety incidents will occur in 2025?
- Will Anthropic be the best on AI safety among major AI labs at the end of 2025? (87% chance)
- Will there be serious AI safety drama at Meta AI before 2026? (45% chance)
- Will Destiny discuss AI Safety before 2026? (53% chance)
- In 2025 Jan, the UK AI summit will have been effective at AI safety? [Resolves to manifold poll] (40% chance)
- What will be the top-3 AI tools in 2025?
- Will someone commit terrorism against an AI lab by the end of 2025 for AI-safety related reasons? (14% chance)
- In January 2026, how publicly salient will AI deepfakes/media be, vs AI labor impact, vs AI catastrophic risks?
- Will AI be considered safe in 2030? (resolves to poll) (72% chance)