Will Anthropic's RSP security commitments (as of Oct. 28, 2023) cause them to pause scaling for at least one month?

11% chance

See this LessWrong discussion for context: https://www.lesswrong.com/posts/Np5Q3Mhz2AiPtejGN/we-re-not-ready-thoughts-on-pausing-and-responsible-scaling-4?commentId=dz8WsKHGFKN7YxsGX

Anthropic has a number of security commitments in their RSP that they say they need to meet before they develop models they define as "ASL-3 models".

This question will resolve based on the consensus answer between Evan (Hubinger) and me on whether Anthropic did indeed delay scaling or developing more powerful systems for at least one month as a result of those security commitments. It will resolve whenever Anthropic reaches ASL-4, or when neither Evan nor I think the answer is still likely to change (e.g. if Anthropic closes down, or RSPs are no longer really a thing).

If Evan and I disagree, I'll ask the following people (in order) to resolve this question one way or another, moving down the list whenever someone doesn't respond within a day or so or doesn't want to look into it: Buck Shlegeris, Nate Soares, Holden Karnofsky, Paul Christiano, Mark Xu.

One concern I have with this operationalization is that I think it's fairly likely they observe potentially dangerous capabilities, pause for a month or two, make some token effort to improve model safety that doesn't meaningfully reduce x-risk, and then resume scaling.

If the operationalization captured the spirit of "Anthropic pauses for long enough to actually solve the problems to a degree of confidence that's warranted given the gravity of the situation", I'd put the probability at ~5%.

> Anthropic has a number of security commitments in their RSP that they say they need to meet before they develop models they define as "ASL-3 models".

By my reading, they are not committing to doing anything before developing ASL-3 models, only after. (Or rather, they've only committed themselves to evaluating for ASL-3 capabilities, and their current RSP says they don't yet have a fully operationalized definition of ASL-3, so that can mean basically whatever they want it to mean.)

So to be clear, is this (A) a market on whether the current commitments will cause a pause or is it (B) a market on whether Anthropic's future RSPs will cause them to pause, including via new commitments they have not yet made?

(From the comment discussion that led to this, the disagreement seemed to be over (A), though one might be interested in both.)

If they change security commitments and drop the current ones, how does this resolve?

If the new security commitments seem approximately strictly stronger than the present ones (i.e. if they kind of subsume the present ones), then I would resolve it as "Yes". If they change in a bunch of different ways that aren't strictly stronger, I would resolve it as "No" (since then the current commitments did not really matter).