Will Anthropic's RSP security commitments (as of Oct. 28 2023) cause them to pause scaling for at least one month?

2026

11% chance

See this LessWrong discussion for context: https://www.lesswrong.com/posts/Np5Q3Mhz2AiPtejGN/we-re-not-ready-thoughts-on-pausing-and-responsible-scaling-4?commentId=dz8WsKHGFKN7YxsGX

Anthropic has a number of security commitments in their RSP that they say they need to meet before they develop models they define as "ASL-3 models".

This question will resolve according to the consensus between me and Evan (Hubinger) on whether Anthropic did indeed delay scaling or developing more powerful systems for at least one month as a result of those security commitments. It will resolve when Anthropic reaches ASL-4, or when neither Evan nor I think the answer is still likely to change (e.g. if Anthropic closes down, or RSPs aren't really a thing anymore, etc.).

If Evan and I disagree, I'll ask the following people (in order) to resolve this question one way or the other, moving down the list whenever someone doesn't respond within a day or so or doesn't want to look into it: Buck Shlegeris, Nate Soares, Holden Karnofsky, Paul Christiano, Mark Xu.

AI Safety

5 Comments

One concern I have with this operationalization is that I think it's fairly likely they will observe potentially dangerous capabilities, pause for a month or two, make some token effort to improve model safety that doesn't meaningfully reduce x-risk, and then resume scaling.

If the operationalization captured the spirit of "Anthropic pauses for long enough to actually solve the problems to a degree of confidence that's warranted given the gravity of the situation", I'd put the probability at ~5%.

> Anthropic has a number of security commitments in their RSP that they say they need to meet before they develop models they define as "ASL-3 models".

By my reading, they are not committing to doing anything before developing ASL-3 models, only after. (Or rather, they've only committed themselves to evaluating for ASL-3 capabilities, and their current RSP says they don't yet have a fully operationalized definition of ASL-3 so that can mean basically whatever they want it to mean.)

So to be clear, is this (A) a market on whether the current commitments will cause a pause or is it (B) a market on whether Anthropic's future RSPs will cause them to pause, including via new commitments they have not yet made?

(From the comment discussion that led to this, it seemed like a disagreement over (A), though one might be interested in both.)

If they change security commitments and drop the current ones, how does this resolve?

If the new security commitments seem approximately strictly stronger than the present ones (i.e. if they kind of subsume the present ones), then I would resolve it as "Yes". If they change in a bunch of different ways that aren't strictly stronger, I would resolve it as "No" (since then the current commitments did indeed not really matter).
