Resolves positively if I believe the plurality of the AI safety field should be focused on this problem. Examples of other focus areas would be robot safety, AI boxing, corrigibility, etc. If I believe some theoretical issue should be a priority/focus, with automated AI research as the most likely first application, then this also resolves positively.
A non-exhaustive list of things that I consider to be automated AI research: (1) hardware innovation, like chip design; (2) writing ML papers or abstracts; (3) writing neural network code; (4) automating prompt engineering; (5) automated theorem proving applied to ML-motivated problems.
🏅 Top traders
| # | Name | Total profit |
|---|------|--------------|
| 1 | | Ṁ178 |
| 2 | | Ṁ27 |
| 3 | | Ṁ22 |
| 4 | | Ṁ14 |
| 5 | | Ṁ12 |
I do believe automated AI research alignment is very important, but I doubt that this area and the other areas that directly contribute to it can absorb as many researchers as interp. I also count interp as alignment research, so this resolves no. For example, I think it would be net bad if enough alignment researchers pivoted out of their current work into control to outnumber interp researchers; the opportunity cost would be too high.
Separately, there is an argument that interp, if successful, would then be used to ensure that further alignment research is done safely. This is quite plausible to me, but I think the spirit of this question was narrower. The text says "first application", and I suspect there will be other successful use cases for interp along the way.
In the end, this is a very hard question to resolve, but "No" marginally better reflects my current view.
I divested and will not trade on this market.
OpenAI seems to see this as a priority: https://openai.com/blog/introducing-superalignment
Buying shares because presumably this gets to the heart of the problem: no one in AI safety “does anything”; it has the intellectual rigor of Marxist discourse because there is zero output and zero testability.
If this market implies building real things, YES is good: add modules that have real utility, the same way the human brain allows countless layers to override raw cognition.
This is the path, not childish philosophy.
There are only three types of AI alignment:
1. censorship/control masquerading as “safety”, also known as “safety of snowflakes” or “OpenAI ethics”
2. theoretical nonsense, boondoggles, feel-good fundraising, etc. (lots of this, just like the self-driving startups that never drove cars and were just a centralized grift by wordcels with no connection to reality)
3. actual intelligent thought (usually centering on how AI will have impact; hint: it involves technology, more like encryption/nuclear weapons than “inner/outer alignment” pseudo-philosophy, and it thinks about geopolitics and power)
Presumably there is a small group doing adversarial robustness and explainability; actually, this is just machine learning 😏