Will someone train a 1T parameter dense (non-sparsely activated mixture of experts) language model this year?
Resolved NO (Apr 2)

Resolves YES if there's a news report, paper, or blog post that I consider trustworthy reporting that a 1T parameter dense language model was trained before January 1st, 2023.

Close date updated to 2022-12-31 11:59 pm.

Jun 16, 10:15pm: Since there's been some confusion in the comments, I want to clarify that "dense" here is in contrast to sparse models like mixture-of-experts. I've clarified further in a comment below, from which I've copied the relevant snippet:

> It's not "transformer" vs. not either though, it's specifically focused on the distinction between mixture of experts model (ex: https://arxiv.org/abs/1701.06538) which are sparsely activated vs. dense models which use all their nodes for each forward pass.
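For readers unfamiliar with the distinction, here's a rough, purely illustrative PyTorch-style sketch (not tied to any particular model, and the layer sizes are made up): a dense feed-forward block touches all of its parameters for every token, while a mixture-of-experts block routes each token to one of several expert sub-networks, so the total parameter count can be huge even though only a fraction is active per forward pass.

```python
import torch
import torch.nn as nn

class DenseFFN(nn.Module):
    """Dense block: every parameter participates in every forward pass."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)

class MoEFFN(nn.Module):
    """Sparsely activated block: a router picks one of n_experts per token,
    so only ~1/n_experts of the parameters are used for any given token."""
    def __init__(self, d_model, d_ff, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            DenseFFN(d_model, d_ff) for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        choice = self.router(x).argmax(dim=-1)  # expert index per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```

This is why the "dense" qualifier matters for the question: an MoE model can advertise 1T total parameters while each token only ever passes through a small slice of them.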

EDIT: In the comments, Lauro Langosco di Langosco pointed out that we may not know by the end of the year whether someone trained a 1T parameter model in 2022 but simply hadn't announced it yet. He also noted that the current phrasing of the question hinges on trained vs. announced. Given that, my compromise is to wait 3 months after the start of 2023 to resolve rather than resolving right away. If no one has announced a 1T parameter trained model by then, I'll resolve NO.

EDIT 2: Lauro pushed back on three months as being too short, so I'll wait another year to resolve.

EDIT 3: After further deliberation, I've decided to stick with 3 months after 01/01/2023 for resolution. Another clarification: I won't necessarily resolve positively unless the 1T model seems at least competitive with existing SoTA models. As mentioned in the comments, if someone trains a 1T parameter dense model that's only about as good as, say, davinci-002, I'll have to decide how to handle it. I don't think that case is likely enough to plan for in advance, though.

Dec 12, 10:40am: Will someone train a 1T parameter dense language model this year? → Will someone train a 1T parameter dense (non-sparsely activated mixture of experts) language model this year?


🏅 Top traders

| # | Name | Total profit |
|---|------|--------------|
| 1 |      | Ṁ877 |
| 2 |      | Ṁ671 |
| 3 |      | Ṁ421 |
| 4 |      | Ṁ240 |
| 5 |      | Ṁ200 |