Will anyone train a 50B parameter+ RetNet by the end of 2023?
Resolved NO (Jan 3)

[RetNet paper](https://arxiv.org/abs/2307.08621). They claim some pretty cool results in the small model range (up to 6.7B parameters). Will anyone attempt to generalize that to a large model?

🏅 Top traders

| # | Total profit |
|---|---|
| 1 | Ṁ100 |
| 2 | Ṁ43 |
| 3 | Ṁ37 |
| 4 | Ṁ14 |
| 5 | Ṁ9 |

From @kipply here:

An attempt at a new architecture, but it immediately opens with a plot showing they couldn’t scale their baseline transformer properly, an inspirational quote, and an impossible triangle used as a diagram?

This makes me think that it’s not that promising.

There’s also a theory someone floated on Twitter that it’s suspicious they stopped scaling at 7B, which happens to be when outlier features appear; I’m confused about why that’s relevant.

I would love me some PyTorch code that runs that, ha ha. GCP can offer you compute to investigate for free, if you apply.
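Since the thread asks for running code: below is a minimal, single-head PyTorch sketch of the retention mechanism from the paper, showing both the parallel form, Retention(X) = (QKᵀ ⊙ D)V with a causal decay matrix D, and the equivalent recurrent form behind the O(1)-per-token inference claim. It omits the paper's xPos-style rotation of Q and K, per-head decays, group norm, and gating; the function names and the scalar decay default are my own choices, so treat it as an illustration, not a reference implementation.

```python
import torch

def parallel_retention(q, k, v, gamma=0.96875):
    """Parallel form: (Q K^T ⊙ D) V, with D[n, m] = gamma^(n - m) for n >= m, else 0."""
    seq_len = q.shape[1]
    idx = torch.arange(seq_len)
    diff = (idx[:, None] - idx[None, :]).float()
    # Causal decay matrix: lower triangle decays geometrically, upper triangle is masked out.
    d = torch.where(diff >= 0, gamma ** diff.clamp(min=0), torch.zeros_like(diff))
    scores = (q @ k.transpose(-1, -2)) * d  # no softmax: the decay mask replaces it
    return scores @ v

def recurrent_retention(q, k, v, gamma=0.96875):
    """Recurrent form: S_t = gamma * S_{t-1} + k_t^T v_t, then o_t = q_t S_t."""
    batch, seq_len, dim = q.shape
    s = torch.zeros(batch, dim, dim)  # constant-size state, regardless of sequence length
    outs = []
    for t in range(seq_len):
        s = gamma * s + k[:, t, :, None] * v[:, t, None, :]  # rank-1 (outer product) update
        outs.append(torch.einsum("bd,bde->be", q[:, t], s))
    return torch.stack(outs, dim=1)

# The two forms should agree up to floating-point error.
torch.manual_seed(0)
q, k, v = (torch.randn(2, 8, 16) for _ in range(3))
assert torch.allclose(parallel_retention(q, k, v), recurrent_retention(q, k, v), atol=1e-4)
```

The parallel form is what you'd train with; the recurrent form is why inference memory doesn't grow with context length, which is the paper's main selling point over a vanilla transformer.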

But it is a waste of money and CO2 emissions to train it unless you have really good data and "LLMOps" (I shouldn't say that, lol, as I am betting NO on that market).

One issue is that if OpenAI decides RetNet is the future, it isn't like they'll tell anyone :|.

So, I don't know if you're planning on resolving this at the end of 2023 or later, but there's an argument for resolving later.

@1a3orn Sadly true :/

@1a3orn To follow up on this: I'm happy to wait a month or two for additional info to come out, but won't wait longer than that.
