According to 20 AI safety experts, what is the most promising research direction in AI safety today?

resolved May 23

100%15%

Mechanistic interpretability

21%

Alignment & scalable oversight

Robustness

20%

Model evaluation / red-teaming

22%

Governance research

20%Other

I have been conducting an informal survey of AI safety experts to elicit their opinions on various topics. I will end up with responses from around 20 people, including researchers at DeepMind, Anthropic, Redwood, FAR AI, and others. The sample was pseudo-randomly selected, optimising for a) diversity of opinion, b) diversity of background, c) seniority, and d) who I could easily track down.

One of my questions was: "What research direction do you think will reduce existential risk the most?" I asked participants to answer from their inside view as much as possible.

Which theme came up most often among the answers?

I will resolve this question when the post for this survey is published, which will happen some time between March and June. Thanks to Rubi Hudson for suggesting turning this into a prediction market.

Technology

AI Safety

AI risk

2 Comments

resolved: https://www.lesswrong.com/s/xCmj2w2ZrcwxdH9z3/p/XfnnkK8XEjTqtuXGM

“Alignment & Scalable Oversight” combines two different ideas.

Scalable Oversight isn’t even an approach to alignment really. It’s just about detecting misalignment or bad behavior when you can’t manually review everything a model does.