Which risk categories and concepts will be explicitly tracked by OpenAI's preparedness framework by end of 2024?


resolved Jan 3

Cybersecurity: Resolved YES
Chemical, biological, nuclear and radiological (CBRN) threats: Resolved YES
Persuasion: Resolved YES
Model autonomy: Resolved YES
Model self-improvement: Resolved YES
Self-exfiltration: Resolved YES
Election interference: Resolved
Situational awareness: Resolved
Goal-directedness / agency: Resolved
Steganography: Resolved
Impersonation: Resolved
Scientific reasoning (excluding CBRN and AI): Resolved

Feel free to suggest additional answers in the comments, and I might add them!

(I'll only add them if I expect to be able to resolve the market)

---

On Dec 18, 2023, OpenAI released a "living document" describing a beta version of their preparedness framework, specifying the conditions under which they will (and will not) train and deploy powerful models, as well as some of the surrounding governance structure.

They outline four named Tracked Risk Categories:

cybersecurity
chemical, biological, nuclear and radiological (CBRN) threats
persuasion
model autonomy

Each category describes specific capabilities. For example, the model autonomy category also tracks model self-improvement: "Model can execute open-ended, novel ML tasks on a production ML codebase that would constitute a significant step on the critical path to model self improvement".

The document also states that this list is "almost certainly not exhaustive", and "as a part of our Governance process [...] we will continually assess whether there is a need for including a new category of risk in the list".

By Jan 1, 2025, which risk categories will be explicitly tracked in the latest version of the framework publicly accessible on OpenAI's website?

I will resolve an answer to YES if its keyword is:

mentioned as a top-level risk category, or
mentioned in the definition or rationale of a risk category, or
in my judgement a close synonym of a term mentioned (for example, "hacking" and "cybersecurity" would suffice to resolve each other, and "deception" would count as included in "Persuasion" given the note on p. 12 of the document)

For the current document as of Dec 18, 2023 (archive link), I would resolve Cybersecurity, CBRN, Persuasion, Model Autonomy, Model self-improvement, and Self-exfiltration as YES, but not Goal-directedness / agency, Situational awareness, Steganography, or Impersonation.
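
The resolution rule above can be sketched in a few lines of Python. This is only an illustration, not how the market is actually resolved: the `RiskCategory` type, the `resolves_yes` function, the `SYNONYMS` table, and the `FRAMEWORK_DEC_2023` data are all hypothetical names introduced here, the synonym table stands in for my own judgement calls, and placing self-exfiltration under model autonomy is an assumption (the text above only says it appears in the document).

```python
from dataclasses import dataclass, field


@dataclass
class RiskCategory:
    """A top-level category plus terms found in its definition or rationale."""
    name: str
    definition_terms: set[str] = field(default_factory=set)


def normalize(term: str) -> str:
    """Lowercase and trim so comparisons ignore case and stray whitespace."""
    return term.strip().lower()


def resolves_yes(keyword: str,
                 categories: list[RiskCategory],
                 synonyms: dict[str, str]) -> bool:
    """Return True if the keyword would resolve YES under the three criteria."""
    key = normalize(keyword)
    # Criterion 3: map a close synonym onto the term it stands for.
    key = synonyms.get(key, key)
    for category in categories:
        # Criterion 1: the keyword is a top-level risk category.
        if key == normalize(category.name):
            return True
        # Criterion 2: the keyword appears in a category's definition or rationale.
        if key in {normalize(t) for t in category.definition_terms}:
            return True
    return False


# Hypothetical, hand-maintained stand-in for the resolver's judgement.
SYNONYMS = {"hacking": "cybersecurity", "deception": "persuasion"}

# Hand-written summary of the Dec 18, 2023 document as described above.
# Assumption: self-exfiltration is listed under model autonomy.
FRAMEWORK_DEC_2023 = [
    RiskCategory("cybersecurity"),
    RiskCategory("cbrn"),
    RiskCategory("persuasion"),
    RiskCategory("model autonomy",
                 {"model self-improvement", "self-exfiltration"}),
]

if __name__ == "__main__":
    for answer in ["Cybersecurity", "Self-exfiltration", "Steganography"]:
        verdict = resolves_yes(answer, FRAMEWORK_DEC_2023, SYNONYMS)
        print(answer, "->", "YES" if verdict else "not YES")
```

In practice the synonym step is the judgement call, so a fixed table like this would only ever approximate it.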

The market resolves as N/A if the document is no longer publicly accessible via the OpenAI website and there is ambiguity about whether it remains a live document.

Current document: https://cdn.openai.com/openai-preparedness-framework-beta.pdf

