Will Meta's new AI model Llama 3.1 405B rank #1 on LMSys leaderboards on August 12th?
Resolved NO on Aug 13

AI people are very excited for Meta's new Llama 405B model. It is rumored to be coming out on Tuesday, July 23rd, and a copy may have already leaked.
https://deepnewz.com/ai/meta-s-llama-3-405b-set-release-on-july-23-2024-new-feature-3-1-3-days-left


The rumors are that the model will be the first open-source model that is "better than GPT-4."

https://deepnewz.com/ai/meta-s-llama-3-1-405b-model-leaked-on-4chan-ahead-july-23-launch-size-820-gb

Once the model is released and has had 2+ weeks to run on LMSys, where will it rank?

To be specific, we will check the first LMSys overall leaderboard table that is dated on or after Monday, August 12th. We may delay this if the model is released later than expected; in that unlikely case, the rules here will be updated.

As usual with LMSys, we count statistical ties as reported on the leaderboards.

So this resolves YES for a tie for first place or for sole first place, presumably against the GPT-4o model that leads LMSys as of July 22nd.
https://chat.lmsys.org/?leaderboard
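
For anyone who wants the rule spelled out, here is a minimal sketch of the resolution logic in Python. It assumes the leaderboard table has been read into rows that carry the rank as reported by LMSys, with statistically tied models sharing the same rank. The `Row` type, the `resolves_yes` function, and the model names are hypothetical and only for illustration; they are not an LMSys API or real leaderboard data.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    rank: int  # rank as reported on the leaderboard; tied models share a rank


def resolves_yes(rows, model):
    """Return True if `model` holds rank 1, alone or tied.

    A model that does not appear in the table is treated as unranked,
    which counts as NO.
    """
    for row in rows:
        if row.model == model:
            return row.rank == 1
    return False  # unranked, so NO


# Hypothetical table: two models tied for first, one ranked third.
table = [Row("model-a", 1), Row("model-b", 1), Row("model-c", 3)]
print(resolves_yes(table, "model-b"))
print(resolves_yes(table, "model-c"))
```

The sketch trusts the reported rank instead of recomputing ties from confidence intervals, since the market counts ties "as reported on the leaderboards."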

Claude 3.5 Sonnet currently ranks second, tied with Gemini...


FYI the update is still "08/06" -- not sure why LMSys is so slow on this.

But it's in fourth place, so far behind that there's no chance the model will catch up. We should have made a market on Gemini instead.

resolves no

Llama 3.1 405B -- debuts tied for #3 with Gemini.

https://x.com/lmsysorg/status/1818321701052276990


Based on only 6,000 votes -- but I would say it's pretty unlikely it will climb 20+ Elo points to get to GPT-4o from here.
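
For a rough sense of what a 20-point gap means, here is a small sketch using the standard Elo expected-score formula (base 10, scale 400). It assumes the leaderboard ratings can be read on that scale; the function name and the example ratings are made up for illustration and are not actual leaderboard scores.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings 20 points apart; only the difference matters.
trailing = expected_score(1000, 1020)
print(f"expected win rate for the trailing model: {trailing:.3f}")
```

Under that formula a model 20 points behind is expected to win about 47% of head-to-head votes, so the gap is small per vote but hard to close once thousands of votes are in.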

What if the model is still unranked on August 12th? Will the market resolve NO?

Yes.

I tried the model days ago. It will be ranked unless it gets pulled for some reason.

Has anyone here tried the 405B model?

The model is doing well on ScaleAI's secret metrics.
https://x.com/alexandr_wang/status/1815775286195331411

As good as Claude 3.5 Sonnet on some, and even better on others.

The model is out, and well received.

All those fancy leaked eval numbers... LMSys raters don't care.

Let's make this a Plus market!

One question only -- sorry, multi-option markets are fun, but the graphs don't look as cool.

Will Llama 405B reach number one? You decide.
