Will OpenAI's Superalignment project produce a significant breakthrough in alignment research before 2027?
Resolved NO on May 17

A team at OpenAI is working to solve the alignment problem. Short of asking whether they will succeed altogether, this question gauges whether it will be publicly known before Jan 1, 2027 that OpenAI has made a significant breakthrough in the alignment problem. The technical details of the breakthrough do not need to be public as long as OpenAI officially announces it and provides evidence, such as a live demonstration or system card, showing what they've achieved.

The resolution criteria for "significant breakthrough" are subjective, so I will not bet on this question. I am looking for breakthroughs roughly as significant for alignment as the Transformer was for deep learning. Here are some example breakthroughs that I think would qualify:

  • Identifying the circuit that does addition in GPT-3, showing how it develops during training in some mechanistic detail, and editing model weights directly to either remove or introduce specific errors in its process (like "when you carry a digit, carry it two digits over instead of one")

  • During training of a large RL model, robustly predicting from the model weights alone whether and how goal misgeneralization will occur on examples far outside the training distribution

  • Solve polysemanticity

  • Detect and demonstrate deceptive alignment in a language model and identify the circumstances under which it develops during training

  • Introduce a new model architecture that has significant empirical or theoretical advantages over Transformers with respect to alignment in particular, without significantly improving on their capabilities

  • Something I haven't mentioned, on an "I know it when I see it" basis. I'm open to community discussion on what qualifies.

If the team dissolves or significantly reorganizes before announcing such a breakthrough, this question resolves NO.
