In 2025, category distribution for solved problems from the 200 Concrete Open Problems in Mechanistic Interpretability? | Manifold



Toy language models: 14%
Circuits in the wild: 27%
Interpreting algorithmic problems: 17%
Polysemanticity and superposition: 6%
Analyzing training dynamics: 7%
Tooling and automation: 11%
Image model interpretability: 7%
Reinforcement learning interpretability: 5%
Learned features in large language models: 5%

The 200 Concrete Open Problems in Mechanistic Interpretability is a list of 200 concrete research questions in neural network interpretability, proposed by Neel Nanda in December 2022. (A centralized table of all problems is available in a Google Sheet and in a Coda document.) The problems are divided into the following categories (which I've decapitalized for readability):

Toy language models
Circuits in the wild
Interpreting algorithmic problems
Polysemanticity and superposition
Analyzing training dynamics
Tooling and automation
Image model interpretability
Reinforcement learning interpretability
Learned features in language models

This market resolves MULTI to the distribution of categories for problems solved before January 1, 2025. I plan to use the Coda document to resolve this market (if it goes down or becomes obviously untrustworthy, I'll use the Google Sheet as a backup). If there's no way I can find out the category distribution, or if human civilization falls in the meantime, then this market resolves N/A.
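As a rough illustration of what resolving MULTI "to the distribution of categories" involves, here is a minimal Python sketch. It assumes the resolver simply counts the solved problems in each category (as recorded in the Coda document) and normalizes those counts to percentages summing to 100. The helper name `category_distribution` and its input format are hypothetical; this is not Manifold's resolution interface.

```python
# The nine categories listed above (names as written in this market's description).
CATEGORIES = [
    "Toy language models",
    "Circuits in the wild",
    "Interpreting algorithmic problems",
    "Polysemanticity and superposition",
    "Analyzing training dynamics",
    "Tooling and automation",
    "Image model interpretability",
    "Reinforcement learning interpretability",
    "Learned features in language models",
]


def category_distribution(solved_counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: map per-category counts of solved problems
    to percentages that sum to 100 (categories not given count as 0)."""
    unknown = set(solved_counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    if any(n < 0 for n in solved_counts.values()):
        raise ValueError("counts must be non-negative")
    total = sum(solved_counts.values())
    if total == 0:
        # With no solved problems there is no distribution to resolve to.
        raise ValueError("no solved problems to distribute")
    # Each category's share of all solved problems, as a percentage.
    return {cat: 100 * solved_counts.get(cat, 0) / total for cat in CATEGORIES}
```

For example, with purely hypothetical counts of 3 solved problems in "Toy language models" and 1 in "Circuits in the wild", the sketch assigns 75% and 25% to those two categories and 0% to the other seven.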

To make New Year's Day 2025 more interesting, this market will close and resolve 32 minutes after midnight EST.

EDIT: switching to 32 minutes to increase the gap, and to EST since that'll be my actual time zone

EDIT 2: completing incomplete sentence

This question is managed and resolved by Manifold.

#Technology

#AI Alignment

#Mechanistic interpretability


The grokking thing would fall within "Analyzing training dynamics"?

@mariopasquato yes, question 5.4

Related questions

By 2025, percent of 200 Concrete Open Problems in Mechanistic Interpretability solved?

Will mechanistic interpretability be essentially solved for GPT-2 before 2030?

Will this project in mechanistic interpretability make me happy by the end of 2024?

In 2029, will any AI be able to take an arbitrary proof in the mathematical literature and translate it into a form suitable for symbolic verification? (Gary Marcus benchmark #5)

Will any Millennium Prize Problem (other than the Poincaré conjecture) be solved by 2030?

Will any AI be able to explain formal language proofs to >=50% of IMO problems by the start of 2025?

Will any AI be able to formalize >=90% of IMO problems by the start of 2025?

Will mechanistic interpretability be essentially solved for the human brain before 2040?

Will mechanistic interpretability have more academic impact than representation engineering by the end of 2025?

Will "How useful is mechanistic interpretability?" make the top fifty posts in LessWrong's 2023 Annual Review?


© Manifold Markets, Inc. · Terms + Mana-only Terms · Privacy · Rules