Will OpenAI release weights to a model designed to be easily interpretable (2024)?
11% chance

This market predicts whether OpenAI will provide the weights of a language model specifically designed for easy interpretability (an "Interpretability Model") to any Alignment-focused Organization by December 31, 2024:

Resolves YES if:

  • OpenAI officially announces or confirms that they have shared the weights of an interpretability model with any alignment-focused organization on or before December 31, 2024.

Resolves 50% if:

  • There is credible evidence that OpenAI has shared the weights of an interpretability model with any alignment-focused organization. In that case, the market will be held open for an additional 6 months (June 1, 2025) and then resolved 50%.

Resolves NO if:

  • No alignment-focused organization receives the weights of an interpretability model from OpenAI by December 31, 2024.

Resolves as NA if:

  • OpenAI ceases to exist, or the listed organizations merge, dissolve, or undergo significant restructuring, rendering the original intent of the market unclear or irrelevant.

Definitions:

  • A language model is an algorithm that processes and generates human language by assigning probabilities to sequences of tokens (words, characters, or subword units) based on patterns learned from training data. Such models can then be used for various natural language processing tasks, such as text prediction, text generation, machine translation, sentiment analysis, and more. Language models use statistical or machine learning methods, including deep learning techniques like recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and transformer architectures, to capture the complex relationships between words and phrases in a language (a toy sketch of token-sequence probabilities follows these definitions).

  • "Interpretability model" refers to a language model created by OpenAI with the primary goal of facilitating better understanding of the inner workings, decision-making processes, and learning mechanisms of the model, as opposed to models optimized solely for performance or prediction accuracy. An accompanying article should emphasize interpretability, and an accompanying technical report should report interpretability benchmarks that can judge other language models. Model must not have been released before this market's creation. While performance is not the primary goal, it must be competitive on benchmarks with language models at most 2 years behind it (this excludes the possibility of e.g. a Markov Chain being presented).

  • "OpenAI" mainly means OpenAI, but additionally any executive or heavy investor is allowed to make the announcement. So if Microsoft makes the announcement, it will count. If OpenAI is acquired, the name refers to the acquirer. If OpenAI dissolves, this market resolves NA.

  • "Credible evidence" means journalistic reporting alluding to credible sources, statements by employees or former employees, or similar.

  • "Alignment-focused Organization" refers to Anthropic, Redwood Research, Alignment Research Center, Center for Human Compatible AI, Machine Intelligence Research Institute, or Conjecture. Additional examples may be added - if there is a dispute, a poll may be taken. A public release of weights counts as releasing to these organizations. A leaked release also counts, as long as the model is confirmed to have been developed with the purpose fo being interpretable. "Red-teaming" of capabilities does not count.

  • The market description can be freely adjusted within one week after market creation. After that, I will only refine definitions to narrower meanings or add further examples.
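
For readers unfamiliar with the term, here is a minimal sketch of what "assigning probabilities to sequences of tokens" means in the language-model definition above. It is a toy bigram model in Python, nothing resembling an OpenAI system; the tiny corpus, the start/end markers, and the absence of smoothing are all simplifying assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus; real language models are trained on vastly larger data.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

# Count bigram and unigram frequencies, with <s> and </s> as sentence markers.
bigram_counts = defaultdict(Counter)
unigram_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence + ["</s>"]
    for prev, curr in zip(tokens, tokens[1:]):
        bigram_counts[prev][curr] += 1
        unigram_counts[prev] += 1

def sequence_probability(tokens):
    """Probability of a token sequence under the toy bigram model (no smoothing)."""
    tokens = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        if unigram_counts[prev] == 0:
            return 0.0
        prob *= bigram_counts[prev][curr] / unigram_counts[prev]
    return prob

print(sequence_probability("the cat sat on the rug".split()))  # 0.0625
```

An "interpretability model" in the sense of this market would be a far larger system than this, but designed so that the analogue of these internal probabilities and mechanisms can be inspected and understood.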

Comments:

https://github.com/openai/transformer-debugger
Doesn't look to be a model, but a tool to debug models.

Just to confirm, if it happens in 2023, it would count, yes?

[looking for feedback]

For this to resolve YES:
1. There is some way to create a model optimized for interpretability
1.1. That is not low-hanging fruit (since otherwise someone working on interpretability would have found it already)
1.2. OpenAI would invest a lot to find this model
2. OpenAI is comfortable releasing more models
2.1. They don't think this would advance others' capabilities in a way that would risk OpenAI's business or AI safety
3. OpenAI keeps focusing on language models

This seems unlikely (but I'm new here, happy for feedback)
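
Purely as an illustration of how a chain of conditions like this compounds: assign rough probabilities to each step and, treating them as roughly independent, multiply them to get an implied YES probability. The numbers below are arbitrary placeholders, not estimates from anyone in this thread.

```python
# Made-up probabilities for the three conditions above; purely illustrative,
# not endorsed by the market creator or the commenter.
p_model_feasible_and_built = 0.3  # 1. an interpretability-optimized model is feasible and OpenAI builds it
p_weights_released         = 0.3  # 2. OpenAI is comfortable releasing the weights
p_still_language_models    = 0.8  # 3. OpenAI keeps focusing on language models

# Treating the conditions as roughly independent, the joint probability is the product.
p_yes = p_model_feasible_and_built * p_weights_released * p_still_language_models
print(f"Implied YES probability: {p_yes:.0%}")  # ~7% with these placeholder numbers
```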
