MANIFOLD
Will someone explain to me why modern LLMs are not trained with dropout?
Resolved YES (Jun 17)




I think dropout became a lot less necessary after techniques like BatchNorm and LayerNorm started to get big. They're just all-around better regularization techniques.
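
To make concrete where the dropout knob actually sits, here is a minimal PyTorch sketch of a pre-LayerNorm feed-forward block; the class name, dimensions, and `p_drop` parameter are illustrative assumptions, not taken from any of the papers discussed in this thread. GPT-1-era configs would set the dropout probability around 0.1, while most recent LLM configs leave it at 0.0, which makes the dropout layer an identity op.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPBlock(nn.Module):
    """Pre-LayerNorm feed-forward block, GPT-style (illustrative sketch)."""

    def __init__(self, d_model: int = 768, p_drop: float = 0.0):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)        # normalization is kept either way
        self.fc_in = nn.Linear(d_model, 4 * d_model)
        self.fc_out = nn.Linear(4 * d_model, d_model)
        self.drop = nn.Dropout(p_drop)           # no-op when p_drop == 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual branch: normalize, expand, GELU, project, (optionally) drop.
        return x + self.drop(self.fc_out(F.gelu(self.fc_in(self.norm(x)))))

# Sanity check: x = torch.randn(2, 16, 768); MLPBlock(p_drop=0.0)(x).shape == x.shape
```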

@jonsimon GPT-1 uses dropout. GPT-2 and GPT-3 don't mention it. BLOOM explicitly states that it doesn't use dropout, Chinchilla doesn't mention dropout or weight decay, PaLM mentions weight decay but not dropout, and LLaMA mentions weight decay but not dropout.

Don't need regularization when you only see each example once?

@NoaNabeshima Empirically degrades performance when you only see each example once?
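
For what it's worth, the single-epoch setup being debated looks roughly like the sketch below (the optimizer, learning rate, and weight decay values are placeholders I picked, not figures from any of the papers above). Each example is seen exactly once, so there is no repeated data to memorize; any dropout in the model just injects extra noise into each update, which is one way it could empirically hurt rather than help.

```python
import torch
import torch.nn.functional as F

def train_one_epoch(model, loader, lr=3e-4, device="cuda"):
    """Single pass over the data; dropout (if any) is inert when its p == 0.0."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.1)
    model.train()  # dropout layers, if present, are active in this mode
    for tokens, targets in loader:
        tokens, targets = tokens.to(device), targets.to(device)
        logits = model(tokens)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        opt.zero_grad(set_to_none=True)
        loss.backward()
        opt.step()
```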
