As per Anthropic's scaling policy here: https://www.anthropic.com/index/anthropics-responsible-scaling-policy
ASL-3 is defined as: "ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g. search engines or textbooks) OR that show low-level autonomous capabilities."
Anthropic commits to not deploying ASL-3 models 'if they show any meaningful catastrophic misuse risk under adversarial testing by world-class red-teamers (this is in contrast to merely a commitment to perform red-teaming).'
Resolves to YES if in my best judgment this happened. I will put large weight on Anthropic's statements on this question, and on general consensus including polls, but will go my own way if I feel sufficiently strongly about it.
Resolves to NO if in my best judgment this does not happen.
(Resolves to a percentage if there is genuine uncertainty but the bar for doing this is high and I find this highly unlikely.)
If a model is created but not released to at least a substantial outside beta testing group by the deadline, it does not count.
I interpret for now 'low-level autonomous capabilities' as something that would tempt reasonable people to give the model real-world actual-stakes autonomous tasks for mundane utility purposes, with the expectation this was economically wise, or the ability to otherwise make money on its own, or similar. If Anthropic clarifies I will use their definition.
No currently released system currently counts, including GPT-4, Claude-2 and Llama-2, barring very unexpected advancements in autonomous capability scaffolding on top of them, but in theory that could also do it.
I reserve the right to modify the resolution details for clarity and intent.
Jason Clinton (CISO from Anthropic) on Hacker News:
ASL-3 seems somewhat likely within the next 6-9 months. Maybe 50% odds, by my guess.
"It's really good, like materially better," said one CEO who recently saw a version of GPT-5. OpenAI demonstrated the new model with use cases and data unique to his company, the CEO said. He said the company also alluded to other as-yet-unreleased capabilities of the model, including the ability to call AI agents being developed by OpenAI to perform tasks autonomously.
Supposing this is true, I think it is likely to count as ASL-3: