@Bayesian The kind of progress that would make me expect to survive the advent of superintelligence. Breakthroughs in Agent Foundations preferably, or some sort of robustly generalizable empirical result in steerability or moral consistency or restraint or something.
By far the most significant progress has been in Interpretability, which is arguably not really AI Safety, and has serious limitations anyway.
@Haiku I'm curious: How much would such breakthroughs move your P(doom)?
My thinking: Let's say we have an ASI which is corrigible and does CEV. I'm not sure whether that would suffice to push my P(doom) below 1%. But it's pretty unlikely we'd be able to steer the world towards that outcome. A somewhat realistic "I never would have expected we'd get so lucky" scenario would be if we nailed the theory for those breakthroughs before someone/something creates an ASI. But those breakthroughs actually ending up in the first ASI? That'll probably be way more difficult than creating AGI without them. Even a best-case scenario concerning steerability, agent foundations, corrigibility, CEV, etc. would still require unprecedented global coordination in order for the theory to actually be implemented in time. I don't see any theoretical breakthroughs bringing my P(doom) down to, say, 10%.
@Primer My thinking exactly. In most cases, the coordination problem has to be solved in order for solving the technical alignment problem to be meaningful.
The "One Neat Trick that doctors don't want you to know about" is that if you solve the coordination problem, you don't have to solve the technical alignment problem. At least not right away, because then you can coordinate to just not build the damn thing.
I think whether or not we "solve alignment," we are going to need a global treaty. I have been spending a significant amount of my time, money, and effort toward that end, largely through PauseAI.