
Will someone explain to me why modern LLMs are not trained with dropout?
Resolved YES on Jun 17
This question is managed and resolved by Manifold.
🏅 Top traders

# | Name | Total profit
---|---|---
1 |  | Ṁ11
2 |  | Ṁ8
3 |  | Ṁ7
4 |  | Ṁ4
5 |  | Ṁ4
@jonsimon: Oh wait, I take it back. I think Transformers do actually use dropout; it's just buried in the implementation details. https://stats.stackexchange.com/a/545413

@jonsimon: GPT-1 uses dropout. GPT-2 and GPT-3 don't mention it. BLOOM explicitly says that it doesn't use dropout, Chinchilla mentions neither dropout nor weight decay, and both PaLM and Llama mention weight decay but not dropout.
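For anyone wondering what "buried in the implementation details" means in practice, here is a minimal PyTorch sketch of the usual dropout sites in a GPT-style block: on the attention weights and before each residual add. This is illustrative only, not taken from any of the papers cited above; the dimensions and the `p_drop=0.1` rate are placeholder values.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-LN transformer block showing the two usual dropout sites
    inside a block: attention-weight dropout and residual dropout.
    (A full model typically also applies dropout after the embeddings.)"""

    def __init__(self, d_model: int = 768, n_heads: int = 12, p_drop: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        # nn.MultiheadAttention applies its `dropout` arg to the attention weights.
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=p_drop, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Residual dropout, applied to each sublayer output before the add.
        self.resid_drop = nn.Dropout(p_drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + self.resid_drop(attn_out)
        x = x + self.resid_drop(self.mlp(self.ln2(x)))
        return x

# Setting p_drop=0.0 recovers the "no dropout" configuration that the
# later GPT-style papers appear to use (they simply don't mention it).
block = TransformerBlock(p_drop=0.1)
x = torch.randn(2, 16, 768)
print(block(x).shape)  # torch.Size([2, 16, 768])
```

Because every dropout layer defaults to a rate set in one config field, a paper that never mentions dropout may simply have left that field at zero, which is consistent with the survey of papers above.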
Related questions
Will LLMs mostly overcome the Reversal Curse by the end of 2025?
72% chance
Will one of the major LLMs be capable of continual lifelong learning (learning from inference runs) by EOY 2025?
26% chance
Will there be major breakthrough in LLM Continual Learning before 2026?
25% chance
Will an LLM that someone is trying to shut down stop or avoid that in some way before 2026?
14% chance
How will the data shortage for LLMs get solved?
By 2025 end, will it be generally agreed upon that LLM produced text/code > human text/code for training LLMs?
11% chance
Will an LLM improve its own ability along some important metric well beyond the best trained LLMs before 2026?
50% chance
Will LLMs be the best reasoning models on these dates?
Will relaxed adversarial training be used in practice for LLM alignment or auditing before 2028?
79% chance
Will there be a state-of-the-art LLM that is NOT based on next raw token prediction before 2029?
50% chance