
Will replacing LayerNorm with something that doesn't use current vector statistics remove outlier channels?
12
270 · Ṁ333 · Dec 2
27% chance
Will replacing the variance and mean (expectation) that LayerNorm computes with other numbers that don't depend on the current hidden state remove outlier features from the model's hidden states? The replacement must not significantly degrade the final loss.
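To make the proposed change concrete, here is a minimal NumPy sketch contrasting standard LayerNorm with a variant whose statistics are fixed in advance. The question does not say where the replacement numbers come from, so estimating them once from a calibration batch is purely an assumption here, as are the names `static_norm`, `fixed_mean`, and `fixed_var`. The learned gain and bias of a real LayerNorm are left out for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Standard LayerNorm (without learned gain/bias): the statistics are
    # computed from the current vector, along the last (channel) axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def static_norm(x, fixed_mean, fixed_var, eps=1e-5):
    # Hypothetical replacement: same formula, but the mean and variance are
    # constants that do not depend on the current hidden state.
    return (x - fixed_mean) / np.sqrt(fixed_var + eps)

rng = np.random.default_rng(0)

# Assumption: the fixed statistics are estimated once from calibration data.
calib = rng.normal(size=(1024, 64))
fixed_mean = calib.mean()
fixed_var = calib.var()

x = rng.normal(size=(4, 64))  # 4 hidden-state vectors, 64 channels each
y_ln = layer_norm(x)
y_static = static_norm(x, fixed_mean, fixed_var)
```

One visible difference: `layer_norm` gives almost the same output for `x` and `2 * x`, because the per-vector statistics absorb the scale, while `static_norm` is just an affine map and passes the scale through.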
If outlier channels still exist but their mean absolute value is at least halved, resolves to 80%.
Edit: if I come to think that outlier channels are the tail of some smooth distribution, this can still resolve Yes/No, depending on whether the tail gets squashed to a much lower magnitude.
If I end up not thinking there are outlier channels, resolves N/A.
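The "at least halved in mean of absolute value" criterion could be checked roughly as follows. This is only a sketch: the question does not define how outlier channels are identified, so the rule used here (a channel whose mean absolute activation exceeds `ratio` times the median across channels, with `ratio=6.0`) is an arbitrary assumption, and all function names are hypothetical.

```python
import numpy as np

def channel_mean_abs(hidden):
    # hidden has shape (tokens, channels); returns one value per channel.
    return np.abs(hidden).mean(axis=0)

def outlier_channels(hidden, ratio=6.0):
    # Assumed detection rule: mean |activation| above ratio x the median channel.
    mags = channel_mean_abs(hidden)
    return np.flatnonzero(mags > ratio * np.median(mags))

def outliers_at_least_halved(before, after, ratio=6.0):
    # Compare the channels that were outliers in the baseline model.
    idx = outlier_channels(before, ratio)
    if idx.size == 0:
        return None  # no outlier channels found under this rule
    b = channel_mean_abs(before)[idx].mean()
    a = channel_mean_abs(after)[idx].mean()
    return bool(a <= 0.5 * b)

# Synthetic example: channel 5 is made 20x larger than the rest, then
# shrunk to a quarter of that magnitude in the "after" activations.
rng = np.random.default_rng(0)
before = rng.normal(size=(256, 32))
before[:, 5] *= 20.0
after = before.copy()
after[:, 5] *= 0.25

found = outlier_channels(before)
halved = outliers_at_least_halved(before, after)
```

In this synthetic setup only channel 5 is flagged, and the check reports that its magnitude was at least halved. On real activations, the result would depend heavily on the chosen threshold.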
This question is managed and resolved by Manifold.