See this blog post: https://www.evanmiller.org/attention-is-off-by-one.html, and in particular this paragraph:
> Even though softmax1 is facially quite boring, I’m 99.44% sure that it will resolve the outlier feedback loop that’s making quantization the subject of cascades of research. If you want to run some experiments and prove me right, DM me on Twitter and we’ll get a paper going.
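For context, the blog post's softmax1 adds 1 to the softmax denominator (equivalently, an implicit extra logit fixed at 0), so the attention weights can sum to less than 1 and a head can "abstain." A minimal numerical sketch (not from the post; function names are my own):

```python
import numpy as np

def softmax(x):
    # standard softmax: weights always sum to exactly 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax1(x):
    # softmax1 from the blog post: exp(x_i) / (1 + sum_j exp(x_j)),
    # computed with max-subtraction for numerical stability
    m = np.max(x)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

x = np.array([-4.0, -4.0, -4.0])  # head doesn't want to attend to anything
print(softmax(x).sum())   # exactly 1.0: forced to attend somewhere
print(softmax1(x).sum())  # ~0.05: the head can mostly abstain
```

With all-negative logits, standard softmax still distributes a full unit of attention, while softmax1's total mass collapses toward zero.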
🏅 Top traders

# | Name | Total profit
---|---|---
1 | | Ṁ25
2 | | Ṁ13
3 | | Ṁ5
4 | | Ṁ4
5 | | Ṁ3
@NoaNabeshima This strongly suggests that the outlier channels are at least partially because of the softmax, especially when softmax wants to attend to a token with 0 probability (!)

You might imagine that softmax_1 doesn't solve this, because it only helps the head not attend much to anything overall, but doesn't let the head easily attend a truly zero amount to any particular token.
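A toy illustration of the "truly zero" point above (my own sketch, not from the thread): because exp() never reaches zero, standard softmax can only drive a token's weight toward zero by opening a large logit gap, which is exactly the kind of extreme pre-softmax value implicated in outlier channels.

```python
import numpy as np

def softmax(x):
    # standard softmax with max-subtraction for stability
    e = np.exp(x - x.max())
    return e / e.sum()

# To nearly ignore the second token, the first token's logit must
# dominate by a large margin; the weight shrinks but never hits 0.
for gap in [5.0, 10.0, 20.0]:
    w = softmax(np.array([gap, 0.0]))
    print(gap, w[1])
```

Each tenfold reduction in the "ignored" token's weight costs roughly ln(10) ≈ 2.3 extra units of logit gap, so suppressing a token to near-zero attention requires ever-larger activations.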
@marketwise Wait. I accidentally subsidized (Ṁ2500) and boosted (Ṁ2500) this "control group" question instead of the other one. 🤣 Enjoy the liquidity!
It resolves as YES if someone runs experiments showing that this modified softmax solves the outlier-features issue with quantization.
I haven't thought quantitatively about what would count as "solving the outlier features problem" yet. What I have in mind is, results on par with these fixes. I'm open to suggestions of more concrete criteria.
Currently I would resolve as NO: all I could find during a quick search was this post, which I don't think presents good enough evidence.