Will mechanistic interpretability be essentially solved for GPT-3 before 2030?
closes 2030

Mechanistic interpretability aims to reverse engineer neural networks in a way that is analogous to reverse engineering a compiled binary computer program. Achieving this level of interpretability for a neural network like GPT-3 would involve creating a binary computer program that is interpretable by expert human programmers and can emulate the input-output behavior of GPT-3 with high accuracy.

Before January 1st, 2030, will mechanistic interpretability be essentially solved for GPT-3, resulting in a binary computer program that is interpretable by ordinary expert human programmers and emulates GPT-3's input-output behavior up to a high level of accuracy?

Resolution Criteria:

This question will resolve positively if, before January 1st, 2030, a binary computer program is developed that meets the following criteria:

  1. Interpretability: The binary computer program must be interpretable by ordinary expert human programmers, which means:
    a. The program can be read, understood, and modified by programmers who are proficient in the programming language it is written in, and have expertise in the fields of computer science and machine learning.
    b. The program is well-documented, with clear explanations of its components, algorithms, and functions.
    c. The program's structure and organization adhere to established software engineering principles, enabling efficient navigation and comprehension by expert programmers.

  2. Accuracy: The binary computer program must emulate GPT-3's input-output behavior with high accuracy, achieving an average word error rate of at most 1.0% compared to the original GPT-3 model when both are given identical inputs with the temperature parameter set to 0. The accuracy must be demonstrated by sampling a large number of inputs from some diverse, human-understandable distribution of text inputs.

  3. Not fake: I will use my personal judgement to determine whether a candidate solution seems fake or not. A fake solution is anything that satisfies these criteria without getting at the spirit of the question. I'm trying to understand whether we will reverse engineer GPT-3 in the complete sense, not just whether someone will create a program that technically passes these criteria.

This question will resolve negatively if no binary computer program meeting the interpretability and accuracy criteria has been developed and verified according to the above requirements before January 1st, 2030. If there is ambiguity or debate about whether a particular program meets the resolution criteria, I will use my discretion to determine the appropriate resolution.
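To make the accuracy criterion concrete, here is a minimal sketch of how the 1.0% average word error rate check could be computed. The functions `reference_model` and `candidate_program` are hypothetical stand-ins for GPT-3 and the reverse-engineered program, both assumed to generate deterministically (temperature 0); the WER here is standard word-level Levenshtein distance normalized by reference length.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def average_wer(prompts, reference_model, candidate_program) -> float:
    """Mean WER over prompts sampled from some diverse input distribution.

    `reference_model` and `candidate_program` are hypothetical callables
    mapping a prompt string to a deterministic completion string.
    """
    total = sum(word_error_rate(reference_model(p), candidate_program(p))
                for p in prompts)
    return total / len(prompts)
```

The question would then require `average_wer(...) <= 0.01` over a sufficiently large prompt sample; the exact sampling distribution is left to the resolver's judgment.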


JSD bought Ṁ50 of NO

Before I bet this was only one percentage point lower than the analogous question for GPT-2, which seemed wild to me.

1 reply
JSD predicts NO

@JeanStanislasDenain Hmmm maybe not completely wild. In any case I think 25% was way too high.

Ryan Greenblatt bought Ṁ70 of NO

I'm at >95% that this is literally impossible for human programmers.
It seems like it would be totally crazy for this to be possible.