Will replacing LayerNorm with something that doesn't use current vector statistics remove outlier channels?
Dec 2
27% chance

Will replacing LayerNorm's mean and variance with other values that don't depend on the current hidden state remove outlier features from the model's hidden states? The replacement must not significantly degrade the final loss.
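To make the question concrete, here is a minimal sketch contrasting standard LayerNorm with one possible replacement. The function `fixed_stat_norm` and its parameters `fixed_mean` and `fixed_var` are hypothetical names used only for illustration, and the learned scale and shift are omitted. The market does not specify where the replacement numbers come from, so treating them as constants chosen ahead of time is an assumption.

```python
import math

def layer_norm(x, eps=1e-5):
    """Standard LayerNorm without learned scale/shift.

    The mean and variance are computed from the current vector x.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def fixed_stat_norm(x, fixed_mean, fixed_var, eps=1e-5):
    """Hypothetical replacement: same formula, but the statistics are
    constants supplied by the caller, not computed from x (assumption)."""
    return [(v - fixed_mean) / math.sqrt(fixed_var + eps) for v in x]

# Made-up hidden state with one large channel, for illustration only.
hidden = [0.5, -1.0, 2.0, 40.0]
print(layer_norm(hidden))
print(fixed_stat_norm(hidden, fixed_mean=0.0, fixed_var=1.0))
```

With fixed statistics the operation becomes an affine map of the input, so rescaling the hidden state rescales the output, whereas standard LayerNorm is unaffected by such a rescaling.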

If outlier channels still exist but their mean absolute value is at least halved, resolves to 80%.
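One way the halving criterion could be checked is sketched below. Everything here is an assumption made for illustration: the market does not define how outlier channels are identified, so the rule in `outlier_channels` (mean absolute activation above `ratio` times the median across channels) and the helper names are hypothetical, and "halved" is read as applying to the average over the outlier channels.

```python
from statistics import median

def channel_mean_abs(activations):
    """Mean |activation| per channel; activations is a list of token vectors."""
    n_tokens = len(activations)
    n_channels = len(activations[0])
    return [sum(abs(tok[c]) for tok in activations) / n_tokens
            for c in range(n_channels)]

def outlier_channels(mean_abs, ratio=6.0):
    """Hypothetical rule: a channel is an outlier if its mean |activation|
    exceeds `ratio` times the median over all channels."""
    threshold = ratio * median(mean_abs)
    return [c for c, m in enumerate(mean_abs) if m > threshold]

def at_least_halved(before, after, channels):
    """True if the average mean |activation| over `channels` dropped by >= 50%."""
    avg_before = sum(before[c] for c in channels) / len(channels)
    avg_after = sum(after[c] for c in channels) / len(channels)
    return avg_after <= 0.5 * avg_before

# Made-up activations (2 tokens x 3 channels), not real measurements.
baseline = [[0.1, -0.2, 30.0], [0.3, 0.1, -28.0]]
modified = [[0.1, -0.2, 12.0], [0.3, 0.1, -10.0]]

before = channel_mean_abs(baseline)
after = channel_mean_abs(modified)
outliers = outlier_channels(before)
print(outliers, at_least_halved(before, after, outliers))
```

Outlier channels are selected from the baseline model only, so the comparison tracks the same channels before and after the replacement.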

Edit: if I conclude that outlier channels are the tail of some smooth distribution, this can still resolve Yes/No depending on whether the tail gets squashed to a much lower magnitude.

If I end up concluding that there are no outlier channels, resolves N/A.
