Will we be able to estimate the feature importance curve or feature sparsity curve of real models? (by end of 2024)
62% chance (closes Dec 31)
GPT-2 onwards. Comparable performance.
My friend says that "features" could mean one of several different things ("parts of the model that correlate with human concepts", "linear things that the model uses as in TMOS [Toy Models of Superposition]", "some other decomposition of the model") and that there may not actually be a real "feature importance curve".
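For concreteness, here is a minimal sketch of what "estimating the feature importance curve" could mean under the TMOS reading (features as linear directions, each with a scalar importance). Everything here is an assumption for illustration: the per-feature importance scores are taken as given (e.g., the loss increase from ablating each SAE feature in a real model), and `feature_importance_curve`, `fit_power_law`, and the Pareto stand-in data are hypothetical, not measurements from any actual model.

```python
import numpy as np

def feature_importance_curve(importances):
    """Sort per-feature importance scores in descending order.

    The 'feature importance curve' (in the Toy Models of Superposition
    sense) is importance plotted against feature rank; a power law
    I_i ~ i**(-alpha) is one common parametric form for it.
    """
    return np.sort(np.asarray(importances))[::-1]

def fit_power_law(curve):
    """Fit log(importance) = log(c) - alpha * log(rank) by least squares."""
    ranks = np.arange(1, len(curve) + 1)
    mask = curve > 0  # log requires strictly positive importances
    slope, intercept = np.polyfit(np.log(ranks[mask]), np.log(curve[mask]), 1)
    return -slope, np.exp(intercept)  # (alpha, c)

# Hypothetical usage: in practice, importances might come from ablating
# each SAE feature and measuring the resulting loss increase. Here we
# substitute synthetic heavy-tailed data, not real model measurements.
rng = np.random.default_rng(0)
fake_importances = rng.pareto(1.5, size=1000)
curve = feature_importance_curve(fake_importances)
alpha, c = fit_power_law(curve)
print(f"fitted power-law exponent alpha = {alpha:.2f}")
```

Whether a fit like this is meaningful is exactly what the comment above questions: if there is no privileged decomposition into features, the "curve" depends on which decomposition you pick.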