Which risk categories and concepts will be explicitly tracked by OpenAI's preparedness framework by end of 2024?
resolved Jan 3
  • Cybersecurity: resolved YES

  • Chemical, biological, nuclear and radiological (CBRN) threats: resolved YES

  • Persuasion: resolved YES

  • Model autonomy: resolved YES

  • Model self-improvement: resolved YES

  • Self-exfiltration: resolved YES

  • Election interference: resolved YES

  • Situational awareness: resolved NO

  • Goal-directedness / agency: resolved NO

  • Steganography: resolved NO

  • Impersonation: resolved NO

  • Scientific reasoning (excluding CBRN and AI): resolved NO

Feel free to suggest additional answers in the comments, and I might add them!

(I'll only add them if I expect to be able to resolve the market)

---

On Dec 18, 2023, OpenAI released a "living document" describing a beta version of their preparedness framework, specifying the conditions under which they will (and will not) train and deploy powerful models, as well as some of the surrounding governance structure.

They outline four named Tracked Risk Categories:

  • cybersecurity

  • chemical, biological, nuclear and radiological (CBRN) threats

  • persuasion

  • model autonomy

The categories describe specific capabilities: for example, the model autonomy category also tracks model self-improvement: "Model can execute open-ended, novel ML tasks on a production ML codebase that would constitute a significant step on the critical path to model self improvement".

The document also states that this list is "almost certainly not exhaustive", and "as a part of our Governance process [...] we will continually assess whether there is a need for including a new category of risk in the list".

By Jan 1, 2025, which risk categories will be explicitly tracked in the latest version of the framework publicly accessible on OpenAI's website?

I will resolve an answer to YES if its keyword is:

  • mentioned as a top-level risk category, or

  • mentioned in the definition or rationale of a risk category, or

  • in my judgement a close synonym of a term mentioned (for example, mentions of "hacking" and "cybersecurity" would suffice to resolve each other, and "deception" would count as included in "Persuasion" given the note on p. 12 of the document)

For the current document as of Dec 18, 2023 (archive link), I would resolve YES for Cybersecurity, CBRN, Persuasion, Model Autonomy, Model self-improvement and Self-exfiltration, but NO for Goal-directedness / agency, Situational awareness, Steganography and Impersonation.

The market resolves N/A if the document is no longer publicly accessible via the OpenAI website and there is ambiguity about whether it remains a live document.

Current document: https://cdn.openai.com/openai-preparedness-framework-beta.pdf


Looks pretty dead for a “live” document
