According to 20 AI safety experts, what is the most promising research direction in AI safety today?
Dec 31
21% Alignment & scalable oversight
15% Mechanistic interpretability
3% Robustness
20% Model evaluation / red-teaming
22% Governance research
20% Other

I have been conducting an informal survey of AI safety experts to elicit their opinions on various topics. I will end up with responses from around 20 people, including researchers at DeepMind, Anthropic, Redwood, FAR AI, and others. The sample was pseudo-randomly selected, optimising for a) diversity of opinion, b) diversity of background, c) seniority, and d) who I could easily track down.

One of my questions was: "What research direction do you think will reduce existential risk the most?" I asked participants to answer from their inside view as much as possible.

Which theme of answer came up most often?

I will resolve this question when the post for this survey is published, which will happen some time between March and June. Thanks to Rubi Hudson for suggesting turning this into a prediction market.


“Alignment & Scalable Oversight” combines two different ideas.

Scalable Oversight isn’t even an approach to alignment really. It’s just about detecting misalignment or bad behavior when you can’t manually review everything a model does.
