Will we reverse-engineer a language model into an interpretable (python) program by 2027?
8% chance

One of the most ambitious goals of mechanistic interpretability would be achieved if we could train a neural network and then distill it into an interpretable algorithm that closely resembles the intermediate computations done by the model.

Some argue this is unlikely to happen (e.g. https://www.lesswrong.com/posts/d52aS7jNcmi6miGbw/take-1-we-re-not-going-to-reverse-engineer-the-ai), while others are trying to make it happen.

For the market to resolve to YES, a model at least as capable as llama2-7B needs to be distilled into Python code that can be understood and edited by humans, and this distilled version must perform at least 95% as well as the original model on every benchmark, except adversarially constructed ones that specifically highlight the differences between the distilled and the original model.
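To make the 95% criterion concrete, here is a minimal sketch of how it could be checked. It assumes each benchmark yields a single higher-is-better score and that "95% as good" means the distilled score is at least 0.95 times the original score; the market text does not specify either point. The function name, the benchmark names, and the scores in the example are hypothetical.

```python
def meets_resolution_threshold(original_scores, distilled_scores,
                               ratio=0.95, excluded=()):
    """Check the 95% rule on every benchmark that is not excluded.

    Assumptions (not stated in the market text): scores are
    higher-is-better numbers, and a benchmark missing from
    `distilled_scores` counts as a failure.
    """
    for name, original in original_scores.items():
        if name in excluded:
            # e.g. adversarial benchmarks built to expose the differences
            continue
        distilled = distilled_scores.get(name)
        if distilled is None or distilled < ratio * original:
            return False
    return True


# Hypothetical scores, for illustration only.
original = {"benchmark_a": 0.60, "benchmark_b": 0.50}
distilled = {"benchmark_a": 0.58, "benchmark_b": 0.47}

# benchmark_b falls short: 0.47 is below 0.95 * 0.50.
result = meets_resolution_threshold(original, distilled)
```

Because the rule applies to every benchmark, a single shortfall is enough to fail the check, which is part of what makes the criterion demanding.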

predictedNO

I think it's plausible to probable that we'll reverse-engineer largish (>=100M parameters) models into understandable, editable Python programs, but I don't think we'll get near-original model performance with these programs.

1y

@NoaNabeshima I think if we are able to reverse engineer 100M models, this process would become automatable using AI at some point and then we would probably also be able to reverse engineer larger models.

What performance threshold would give a 50% chance that reverse engineering would succeed?

predictedNO 1y

@NielsW I have no idea, I'll stew on it
