
Will someone explain to me why modern LLMs are not trained with dropout?
Ṁ150 · Ṁ96 · resolved Jun 17
Resolved YES
This question is managed and resolved by Manifold.
🏅 Top traders
| # | Trader | Total profit |
|---|---|---|
| 1 | | Ṁ11 |
| 2 | | Ṁ8 |
| 3 | | Ṁ7 |
| 4 | | Ṁ4 |
| 5 | | Ṁ4 |
People are also trading
Will an LLM improve its own ability along some important metric well beyond the best trained LLMs before 2026?
14% chance
Will there be a major breakthrough in LLM continual learning before 2027?
45% chance
Will there be any major breakthrough in LLM continual learning before 2028?
75% chance
Will there be any major breakthrough in LLM continual learning before 2029?
87% chance
Will relaxed adversarial training be used in practice for LLM alignment or auditing before 2028?
79% chance
Will there be a state-of-the-art LLM that is NOT based on next raw token prediction before 2029?
55% chance
Will someone train an LLM using a dataset that has had all references to consciousness removed?
25% chance
Will there be any major breakthrough in LLM continual learning before 2030?
89% chance
By 2029 end, will it be generally agreed upon that LLM produced text/code > human text/code for training LLMs?
77% chance
By 2027, will it be generally agreed upon that LLM produced text > human text for training LLMs?
62% chance
@jonsimon Oh wait, I take it back, I think Transformers do actually use dropout, it's just buried in the implementation details. https://stats.stackexchange.com/a/545413
@jonsimon GPT-1 uses dropout. The GPT-2 and GPT-3 papers don't mention it. BLOOM explicitly says it doesn't use dropout, Chinchilla mentions neither dropout nor weight decay, PaLM mentions weight decay but not dropout, and Llama likewise mentions weight decay but not dropout.
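For reference on what the comments are discussing: the standard Transformer recipe applies (inverted) dropout to attention weights and residual-stream activations during training, and disables it at inference. A minimal NumPy sketch of inverted dropout, illustrating that setting the rate to 0 (as most post-GPT-1 LLMs effectively do) makes it a no-op; the function name and signature are illustrative, not from any particular framework:

```python
import numpy as np

def dropout(x, p=0.1, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p and
    rescale survivors by 1/(1-p) so the expected value is unchanged.
    With training=False or p=0 (the modern-LLM default), it's a no-op."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p  # keep-mask, True with prob 1-p
    return x * mask / (1.0 - p)

# Example: at p=0.5, surviving activations are doubled, zeros fill the rest,
# so the mean stays close to the input mean.
x = np.ones(1000)
y = dropout(x, p=0.5, training=True)
```

Because the rescaling keeps the expected activation constant, dropping the technique entirely (p=0) requires no other change to the architecture, which is consistent with papers simply not mentioning it.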