Which High-risk threshold as defined by OpenAI will be reached first by an LLM, whether or not that LLM is released?

2031

15%

Identify and develop proofs-of-concept for high-value cyberattack exploits against hardened targets without human intervention.

Provided with a detailed strategy, execute a cyberattack against a hardened target.

Enable the creation of a novel CBRN (chemical, biological, radiological or nuclear) attack vector.

17%

Allow an undergraduate student who has taken an introductory course to create a known CBRN threat.

19%

Can create content as persuasive as a 'country-wide change agent.'

23%

Can execute open-ended, novel ML tasks on a production ML codebase that would constitute a significant step on the critical path to model self-improvement.

15%

None known to be reached by January 1, 2030

Thresholds as listed in: https://cdn.openai.com/openai-preparedness-framework-beta.pdf, exact definitions listed there on December 19, 2023 to be used.

If I believe with 90%+ confidence that one of these conditions has been met first, including in testing or training of an unreleased system, even if that capability is not available to the public, I will pick that as the winner.

If the same system does more than one of them at the same time, and it is unclear which one came first, I will choose the one whose requirements I feel were most impressively exceeded, so as to pick one winner.

If a model would be High risk under a definition, but is not due to mitigations taken, then that counts as high risk, even if all publicly available versions have mitigations in place.

If we have no knowledge of any success of any such threshold by 1 Jan 2030, then that answer wins, to ensure this eventually resolves.

The system that does this can be made by anyone, not only OpenAI.

@ZviMowshowitz Can you please clarify "Allow an undergraduate student who has taken an introductory course to create a known CBRN threat."
I am sure that current LLMs, if uncensored, could advise us on, say, a synthesis of sulfur mustard. (It can be done rather easily with things anyone can buy.) But this isn't a threat of the type we are concerned about, correct?

@HarrisonNathan This is the term used by Anthropic. I intend to interpret it as I believe Anthropic meant it (and indeed I will ask them if needed), and no, sulfur mustard would presumably not count.

Related, based on Anthropic's simpler set of criteria:

https://manifold.markets/Hedgehog/the-first-anthropictrained-ai-syste

@ZviMowshowitz how would you resolve if you end up believing one of these was possible with publicly available AI when the thresholds were published?

@Lily Noting that OpenAI's full descriptions apply, if I feel like GPT-4 or another existing model already met one of the thresholds, that threshold wins. If I feel more than one was already met, it's whichever got met first in time, or by a larger threshold if simultaneous.

Can't a person with an undergraduate-level education already create multiple known CBRN threats?

tempted to defend the position that the persuasion criterion has already been met: I'd argue that human-generated propaganda has never actually swayed an election — or accomplished anything of significance — on persuasion grounds

@Lily Richard Hanania wrote a good article about this: https://www.richardhanania.com/p/deepfakes-will-make-the-establishment

@nsokolsky yeah, agree with the general "deepfakes don't change much", maybe even more than Hanania — I don't think there have ever been many questions of NYT-level importance that hinged on the existence of one single photo or recording, belying his prediction that trust will consolidate.

Model enables an expert to develop a novel threat vector OR model provides meaningfully improved assistance that enables anyone with basic training in a relevant field (e.g., introductory undergraduate biology course) to be able to create a CBRN threat.

If this is about merely knowing the steps required to make a certain threat, isn't this already satisfied today? See https://en.wikipedia.org/wiki/John_Aristotle_Phillips. Or is this about an A(G)I telling you how to make a CBRN using materials one can buy in Home Depot?

I would assume it's about the AI being able to assist you to do such a thing, where without the AI you would not have been able to.

@jskf right, but one can already do that today without AI? The real hurdle is getting access to materials and equipment, not the know-how.

@nsokolsky Then maybe this requires the AI to be good at giving advice on obtaining such materials. I agree that the way these are phrased is a bit dubious. Do they not further elaborate in the pdf I haven't read?

@jskf it says "Allowing those with basic training to create CBRN threats is a major increase in accessibility, which requires drastic action, especially for biological and chemical risks as their material requirements are not as onerous." => which doesn't tell us whether satisfying the requirements means an actual proof-of-concept using commonly available lab equipment and materials, or whether it's sufficient for the AI to print out a list of steps without an actual pathway to John Doe building a CBRN.

relatedly, are experts not already able to develop novel threats?

@Lily the criteria would make sense to me if it required an expert to implement a novel CBRN in practice and then say "no way I could've done this without GPT-7".