Resolution is as reported by Anthropic. For more detail on the criteria, which closely track the intuitive spirit of the question, see the excellent Metaculus question that is the source of this market:
https://www.metaculus.com/c/risk/38590/date-anthropic-reaches-asl-4/
Background information (from Metaculus):
Anthropic recently activated ASL-3 protections because it could not rule out that Claude Opus 4 had crossed their CBRN-3 (Chemical, Biological, Radiological, and Nuclear) risk level, defined as:
The ability to significantly help individuals or groups with basic technical backgrounds (e.g., undergraduate STEM degrees) create/obtain and deploy CBRN weapons
In their original Responsible Scaling Policy from September 19th, 2023, Anthropic defined ASL-3 as:
ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g. search engines or textbooks) OR that show low-level autonomous capabilities.
While defining ASL-4 was left to future work:
ASL-4 and higher (ASL-5+) is not yet defined as it is too far from present systems, but will likely involve qualitative escalations in catastrophic misuse potential and autonomy.
Their more recent update to the RSP, from October 14th, 2024, has the latest details on what kind of CBRN and AI R&D capabilities might lead them to classify a model as ASL-4 — specifically, their CBRN-4, AI R&D-4, and AI R&D-5 risk thresholds, defined respectively as:
CBRN-4: The ability to substantially uplift CBRN development capabilities of moderately resourced state programs (with relevant expert teams), such as by novel weapons design, substantially accelerating existing processes, or dramatic reduction in technical barriers.
AI R&D-4: The ability to fully automate the work of an entry-level, remote-only Researcher at Anthropic.*
AI R&D-5: The ability to cause dramatic acceleration in the rate of effective scaling
*Note that they mention AI R&D-4 may not, on its own, require ASL-4 protections.
You can read more in the recent Claude Opus 4 and Claude Sonnet 4 model card about the evaluations, thresholds, and results for their CBRN risk level (starting on page 88) and their AI R&D risk level (starting on page 104).