This resolves yes if training SAEs with Anthropic's new training techniques [1] gives equally interpretable features at the majority of sites in Pythia-2.8B / Gemma-7B / Mistral-7B (whichever actually gets benchmarked).
The complete methodology for evaluating this question is in https://arxiv.org/abs/2404.16014. Resolves yes if anyone ever runs it and gets results significant at p=0.1 (we found p=0.05 quite hard to reach without a lot of samples).
I haven't implemented [1] yet, so I have no insider information; I also will not trade in this market beyond an initial bet.
[1]: https://transformer-circuits.pub/2024/april-update/index.html#training-saes
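For intuition on the p=0.1 threshold above: one simple way to test whether two SAE variants differ in interpretability is a two-sided permutation test on per-feature interpretability ratings. This is only an illustrative sketch with made-up data, not the evaluation pipeline from the linked paper; the function name and sample ratings are hypothetical.

```python
import random

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of mean ratings.

    a, b: lists of per-feature interpretability ratings (hypothetical data).
    Returns an estimated p-value with the standard +1 correction.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        # Count permutations at least as extreme as the observed difference.
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical ratings for baseline vs. new-technique SAE features:
baseline = [3, 4, 4, 5, 3, 4, 5, 4, 3, 4]
new_saes = [4, 4, 5, 4, 3, 5, 4, 4, 4, 5]
p = permutation_test(baseline, new_saes)
```

With small samples like these, even a visible gap in mean ratings often fails to clear p=0.05, which matches the note above that significance is hard to get without many samples.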
RESOLUTION:
No direct calculation, but https://transformer-circuits.pub/2024/june-update/index.html suggests yes: the April updates are applied to some SAEs (e.g. Gated SAEs), and those are among the most interpretable SAEs. https://arxiv.org/abs/2407.14435 finds similar results.