Will someone explain to me why modern LLMs are not trained with dropout?
Resolved YES (Jun 17)
🏅 Top traders

# | Name | Total profit
---|---|---
1 | | Ṁ11
2 | | Ṁ8
3 | | Ṁ7
4 | | Ṁ4
5 | | Ṁ4
@jonsimon Oh wait, I take it back, I think Transformers do actually use dropout, it's just buried in the implementation details. https://stats.stackexchange.com/a/545413
@jonsimon GPT-1 uses dropout. GPT-2, 3 don't mention it. BLOOM says that it doesn't use dropout explicitly, Chinchilla doesn't mention dropout or weight decay, PALM mentions weight decay but not dropout. Llama mentions weight decay but not dropout.
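For context on what the comments are discussing: the dropout "buried in the implementation details" of Transformer stacks is ordinary inverted dropout applied to activations (e.g. after attention and feed-forward sublayers). A minimal stdlib-only sketch (the function name and list-of-floats interface are illustrative, not from any of the cited papers); note that setting p = 0.0, as the later GPT-family and Llama-style runs effectively do, makes it the identity:

```python
import random

def dropout(x, p, training=True):
    """Inverted dropout: zero each activation with probability p and
    scale survivors by 1/(1-p) so the expected value is unchanged.
    With p == 0.0 or training=False, this is the identity."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if random.random() < p else v / (1.0 - p) for v in x]
```

The 1/(1-p) scaling is the standard "inverted" convention, which lets inference skip any rescaling; frameworks like PyTorch's `torch.nn.Dropout` behave the same way.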
Related questions
Will AI be able to solve confusing but elementary geometric reasoning problems in 2024?
49% chance
In 2028, will Gary Marcus still be able to get LLMs to make egregious errors?
43% chance
Will second-order optimizers displace first-order optimizers for training LLMs by 2030?
45% chance
By 2027, will it be generally agreed upon that LLM produced text > human text for training LLMs?
63% chance
At the beginning of 2028, will LLMs still make egregious common-sensical errors?
68% chance
Will the leading LLM at the beginning of 2026 still be subject to the reversal curse?
49% chance
By 2025 end, will it be generally agreed upon that LLM produced text/code > human text/code for training LLMs?
25% chance
Will LLM training costs fall 1,000x by 2028?
49% chance
How will the data shortage for LLMs get solved?
Will AGI be interpretable due to CoT and reflection and similar methods?
21% chance