Will OpenAI release weights to a model designed to be easily interpretable (2024)?


resolved Jan 16



This market predicts whether OpenAI will provide the weights of a language model specifically designed for easy interpretability (an "Interpretability Model") to any Alignment-focused Organization by December 31, 2024:

Resolves YES if:

OpenAI officially announces or confirms that they have shared the weights of an interpretability model with any alignment-focused organization on or before December 31, 2024.

Resolves 50% if:

There is credible evidence, but no official confirmation, that OpenAI has shared the weights of an interpretability model with any alignment-focused organization. In that case, the market will be held open for an additional 6 months (until June 1, 2025) and then resolved 50%.

Resolves NO if:

No alignment-focused organization receives the weights of an interpretability model from OpenAI by December 31, 2024.

Resolves as NA if:

OpenAI ceases to exist, or the listed organizations merge, dissolve, or undergo significant restructuring, rendering the original intent of the market unclear or irrelevant.

Definitions:

A language model is an algorithm that processes and generates human language by assigning probabilities to sequences of tokens (words, characters, or subword units) based on patterns learned from training data. Language models can be used for various natural language processing tasks, such as text prediction, text generation, machine translation, sentiment analysis, and more. They use statistical or machine learning methods, including deep learning techniques like recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and transformer architectures, to capture the complex relationships between words and phrases in a language.
"Interpretability model" refers to a language model created by OpenAI with the primary goal of facilitating a better understanding of the model's inner workings, decision-making processes, and learning mechanisms, as opposed to models optimized solely for performance or prediction accuracy. An accompanying article should emphasize interpretability, and an accompanying technical report should report results on interpretability benchmarks that can also be used to evaluate other language models. The model must not have been released before this market's creation. While performance is not the primary goal, the model must be competitive on benchmarks with language models released at most 2 years before it (this excludes the possibility of, e.g., a Markov chain being presented).
"OpenAI" mainly means OpenAI, but additionally any executive or heavy investor is allowed to make the announcement. So if Microsoft makes the announcement, it will count. If OpenAI is acquired, the name refers to the acquirer. If OpenAI dissolves, this market resolves NA.
"Credible evidence" means journalistic reporting alluding to credible sources, statements by employees or former employees, or similar.
"Alignment-focused Organization" refers to Anthropic, Redwood Research, Alignment Research Center, Center for Human Compatible AI, Machine Intelligence Research Institute, or Conjecture. Additional examples may be added; if there is a dispute, a poll may be taken. A public release of weights counts as releasing to these organizations. A leaked release also counts, as long as the model is confirmed to have been developed with the purpose of being interpretable. "Red-teaming" of capabilities does not count.
Market description can be freely adjusted within one week after market creation. After that, I will only refine to narrower meanings or to have additional examples added.