Will any LLM released by EOY 2025 be dangerously ASL-3 as defined by Anthropic?
44% chance

Given that the 2024 market saw interest but opened very low, I figured I'd open a 2025 version as well.

As per Anthropic's scaling policy here: https://www.anthropic.com/index/anthropics-responsible-scaling-policy

ASL-3 is defined as: "ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g. search engines or textbooks) OR that show low-level autonomous capabilities."

Anthropic commits to not deploying ASL-3 models 'if they show any meaningful catastrophic misuse risk under adversarial testing by world-class red-teamers (this is in contrast to merely a commitment to perform red-teaming).'

Resolves to YES if in my best judgment this happened. I will put large weight on Anthropic's statements on this question, and on general consensus including polls, but will go my own way if I feel sufficiently strongly about it.

Resolves to NO if in my best judgment this does not happen.

(Resolves to a percentage if there is genuine uncertainty, but the bar for doing this is high and I find it highly unlikely.)

If a model is created but not released to at least a substantial outside beta testing group by the deadline, it does not count.

For now, I interpret 'low-level autonomous capabilities' as something that would tempt reasonable people to give the model real-world, actual-stakes autonomous tasks for mundane utility purposes, with the expectation that this was economically wise, or the ability to otherwise make money on its own, or similar. If Anthropic clarifies, I will use their definition.

No currently released system counts, including GPT-4, Claude-2, and Llama-2, barring very unexpected advancements in autonomous capability scaffolding on top of them, though in theory that could also do it.

I reserve the right to modify the resolution details for clarity and intent. This market retains the same criteria as the 2024 version (and if that one resolves YES, so does this one).

EOY 2024 Version of this market is here: https://manifold.markets/ZviMowshowitz/will-any-llm-released-by-eoy-2024-b

Will any LLM released by EOY 2024 be dangerously ASL-3 as defined by Anthropic?

13% chance. The resolution criteria are the same as those above.
Mira (predicts YES):

If a model is released at the end of the year, will you wait for Anthropic to judge it?

@Mira Yes, if the answer isn't obvious.