Which risk categories and concepts will be explicitly tracked by OpenAI's preparedness framework by end of 2024?
95% Cybersecurity
89% Chemical, biological, nuclear and radiological (CBRN) threats
84% Persuasion
82% Model autonomy
80% Self-exfiltration
71% Model self-improvement
50% Election interference
37% Impersonation
34% Situational awareness
34% Goal-directedness / agency
34% Steganography
34% Scientific reasoning (excluding CBRN and AI)

Feel free to suggest additional answers in the comments, and I might add them!

(I'll only add them if I expect to be able to resolve the market)

---

On Dec 18, 2023, OpenAI released a "living document" describing a beta version of their preparedness framework, specifying conditions under which they will (and will not) train and deploy powerful models, as well as some surrounding governance structure.

They outline four named Tracked Risk Categories:

  • cybersecurity

  • chemical, biological, nuclear and radiological (CBRN) threats

  • persuasion

  • model autonomy

The categories describe specific capabilities: for example, the model autonomy category also tracks model self-improvement: "Model can execute open-ended, novel ML tasks on a production ML codebase that would constitute a significant step on the critical path to model self improvement".

The document also states that this list is "almost certainly not exhaustive", and "as a part of our Governance process [...] we will continually assess whether there is a need for including a new category of risk in the list".

By Jan 1, 2025, which risk categories will be explicitly tracked in the latest version of the framework publicly accessible on OpenAI's website?

I will resolve answers to yes if a keyword is:

  • mentioned as a top-level risk category, or

  • mentioned in the definition or rationale of a risk category, or

  • in my judgement a close synonym of a term mentioned (for example, mentions of "hacking" and "cybersecurity" would suffice to resolve each other, and "deception" would count as included in "Persuasion" given the note on p. 12 of the document)

For the current document as of Dec 18, 2023 (archive link), I would resolve as containing Cybersecurity, CBRN, Persuasion, Model Autonomy, Model self-improvement and Self-exfiltration, but not Goal-directedness / agency, Situational awareness, Steganography or Impersonation.
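
For anyone who wants to sanity-check an answer against the criteria above, here is a rough sketch of the rule in Python. It is only an illustration: the function name, the set of mentioned terms and the synonym table are assumptions taken from the examples in this description, and actual resolution remains a judgement call rather than a lookup. It expects a short keyword (e.g. "CBRN"), not the full answer text.

```python
# Hypothetical sketch of the resolution rule; names and data are illustrative only.

# Terms the description says appear in the Dec 18, 2023 document, either as a
# top-level risk category or in a category's definition or rationale.
MENTIONED = {
    "cybersecurity",
    "cbrn",
    "persuasion",
    "model autonomy",
    "model self-improvement",
    "self-exfiltration",
}

# Close synonyms given as examples in the criteria (assumed, not exhaustive).
SYNONYMS = {
    "hacking": "cybersecurity",
    "deception": "persuasion",
}


def resolves_yes(keyword: str) -> bool:
    """Return True if the keyword, or a listed close synonym, is mentioned."""
    key = keyword.strip().lower()
    key = SYNONYMS.get(key, key)  # map a synonym onto the term it stands for
    return key in MENTIONED


if __name__ == "__main__":
    for kw in ["Cybersecurity", "Hacking", "Deception", "Steganography"]:
        print(kw, resolves_yes(kw))
```

With these assumed inputs, "Hacking" and "Deception" come out as yes via the synonym table, while "Steganography" comes out as no, matching the Dec 18, 2023 reading given above.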

The market resolves as N/A if the document is no longer publicly accessible via the OpenAI website and there is ambiguity about whether it remains a live document.

Current document: https://cdn.openai.com/openai-preparedness-framework-beta.pdf
