Will we be able to estimate the feature importance curve or feature sparsity curve of real models? (2024 end)
62% chance · closes Dec 31

GPT-2 onwards, with comparable performance.


My friend says that "features" could mean one of several different things ("parts of the model that correlate with human concepts", "linear things that the model uses, as in TMOS [Toy Models of Superposition]", or "some other decomposition of the model"), and that there may not actually be a real "feature importance curve".
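For concreteness, here is one possible operationalization of the two curves, a minimal sketch rather than anything specified by the question. It assumes you already have per-sample activations of candidate features (e.g. sparse-autoencoder latents) for a model like GPT-2; the array shapes, the synthetic data, and the mean-squared-activation proxy for "importance" are all my own assumptions (TMOS defines importance via loss weighting, so ablation effects on the loss would be a closer analogue for a real model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for real data: activations of candidate features
# (e.g. SAE latents) over a dataset. Shape: (n_samples, n_features);
# most entries are zeroed to mimic feature sparsity.
acts = rng.random((10_000, 512)) * (rng.random((10_000, 512)) < 0.05)

# Crude "importance" proxy: mean squared activation per feature.
importance = (acts ** 2).mean(axis=0)
importance_curve = np.sort(importance)[::-1]  # feature importance curve

# Feature sparsity curve: fraction of samples on which each feature fires,
# again sorted in descending order.
firing_rate = (acts > 0).mean(axis=0)
sparsity_curve = np.sort(firing_rate)[::-1]

print(importance_curve[:5], sparsity_curve[:5])
```

Sorting in descending order is what turns the per-feature statistics into a "curve" in the TMOS sense; the open question the comment raises is whether any such per-feature decomposition is canonical enough for the curve to be well defined.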


This is great.
