Where will Grok 2.0 rank on LMSys leaderboards on August 21st?
resolved Aug 22
Unranked: 97% (resolved 100%)
First place (including ties): 0.3%
Second / third place (including ties): 0.7%
4th - 7th (including ties): 0.9%
8th - 12th (including ties): 0.4%
13th or worse: 0.2%

This morning the xAI team announced that Grok 2.0 will shortly be released.
https://x.ai/blog/grok-2

It has also already been posted on the LMSys leaderboards under the name "sus-column-r".

So far, with a small sample and large error bars, it has had an Elo of about 1280, which currently places it fourth (or tied for 3rd), behind only the two latest GPT-4o models and Gemini-1.5.

Here are the current official LMSys leaderboards -- which do not yet include Grok.
https://chat.lmsys.org/?leaderboard

^^ As you can see, there are many statistical ties... for fourth place, as well as for seventh. As usual, we adjudicate based on the number next to the model -- which includes statistical ties. A model would rank first... if its Elo is within the error margin of the current top model.
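The tie-aware rank described above can be sketched roughly as follows. This is only a sketch: it assumes the rank is one plus the number of models whose lower confidence bound lies above this model's upper bound, and the model names, scores, and intervals are invented for illustration (they are not real leaderboard data).

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    score: float  # Elo-style arena score
    lower: float  # lower bound of the confidence interval
    upper: float  # upper bound of the confidence interval


def tie_aware_rank(target: Entry, board: list[Entry]) -> int:
    """Rank = 1 + number of models that are clearly better than `target`.

    Assumption: a model is "clearly better" when its lower bound is above
    the target's upper bound, so overlapping intervals count as a tie.
    """
    better = sum(1 for e in board if e.lower > target.upper)
    return 1 + better


# Hypothetical numbers, chosen only to show a statistical tie.
board = [
    Entry("model-a", 1300, 1294, 1306),
    Entry("model-b", 1290, 1282, 1298),
    Entry("model-c", 1270, 1262, 1278),
]
ranks = {e.name: tie_aware_rank(e, board) for e in board}
print(ranks)
```

Under these made-up numbers, model-a and model-b overlap and both get rank 1, while model-c is clearly behind both and gets rank 3. No model gets rank 2, which is the same kind of gap as a tie for 7th being followed by 10th.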

Where will Grok / "sus-column-r" rank in the first official update timestamped on or after August 21st, a week from today?

Will it climb? Will it fall? Will another model drop in time to rank ahead of other leaders as well...

Or will this submission somehow be removed or disqualified? Let's make some wagers.


🏅 Top traders

#  Total profit
1  Ṁ2,587
2  Ṁ987
3  Ṁ650
4  Ṁ244
5  Ṁ77

Grok is "tied for 2nd place" in today's update...

No idea why they excluded it from the 08/21 update

Now, just a day after the 8/22 update, Grok 2.0 has appeared on the leaderboard at "Rank" #2, which had been trading in this market at 25%. LMSys didn't post about the update on Twitter yesterday, but they did for today's. I don't see any discussion of this on their Discord, so I'm not sure what happened!

yeah seems like a manual edit

my guess is they purposefully excluded it yesterday. We happened to choose the 08/21 update since I thought that would be enough time -- but the 08/22 update includes it

not cool

I lost mana too betting on... well not on "unlisted"

will make longer cutoffs for next time now that we see LMSys does things this way

@Moscow25 perhaps you could just have separate markets for things like "When will X appear on the leaderboard?" versus "When will X be ranked when it first appears?" The latter could be NA if it never appears after a reasonable wait or have an explicit option for that possibility—still waiting a while to be sure it's not going to be ranked or that its eventual ranking would be so disconnected (e.g., a new era of models) as to not be conceptually similar enough for a single probability.

clearly that would be better

I listed it at 08/21 since that should be enough time for it to pop up, without waiting extra weeks to resolve the market... unless the model was pulled or cancelled

instead they had it, but put out an update on 08/21 specifically without it, and on 08/22 specifically with it

next time I'll leave a longer time period AND allow for a grace period (one week?) for them to post something before declaring it "N/A"

bought Ṁ4,000 YES

Looks like the Grok model was pulled, or somehow didn't get enough votes.

My guess is it was pulled.

Weird. Even though I voted on it and it was in the pool... it's not in the rankings 🤷

bought Ṁ500 YES

The LMSys leaderboard has been updated and doesn't have any Grok models listed, as far as I can tell.

@Moscow25 looks like this can resolve

Yes it's absurd that LMSys has not updated their leaderboards since "08/12/2024" according to the website. Longest I've seen it.

We will go with the first timestamp that has a date listed as on / after 08/21/2024

As an LMSys user, I can confirm that Grok-2 shows up in the side-by-sides.

To be clear -- this will be the first LMSys update tagged as / after 08/21/2024 -- I don't know why they update as rarely as they do... often delayed by days or even one week.
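The "first update tagged as on / after 08/21/2024" rule can be sketched like this. It is a minimal illustration, not anything LMSys or Manifold provides: the function name `resolving_update` is hypothetical, and the update dates are illustrative ones based on dates mentioned in this discussion.

```python
from datetime import date
from typing import Optional


def resolving_update(update_dates: list[date], cutoff: date) -> Optional[date]:
    """Return the earliest update dated on or after `cutoff`, or None if none exists yet."""
    eligible = [d for d in update_dates if d >= cutoff]
    return min(eligible) if eligible else None


cutoff = date(2024, 8, 21)

# Illustrative "updated on" dates; the leaderboard is not updated every day.
updates = [date(2024, 8, 12), date(2024, 8, 21), date(2024, 8, 22)]

chosen = resolving_update(updates, cutoff)
print(chosen)
```

While the only posted update is dated before the cutoff, the function returns `None`, meaning the market simply waits; a later update that includes the model does not change which update is used once an eligible one exists.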

This is clear in the description but folks new to LMSys get confused

sold Ṁ37 YES

OK I had totally missed this part in the description, and instead traded on it not showing up.

Especially considering you also included "Unranked" as an option. Consider changing the title to "first update from August 21st", otherwise I think it's very misleading in combination.

that's why I have it in the instructions

I've made many LMSys markets and always explain it -- including with examples

people always complain, so maybe I should stop making these markets

I suggested a rewording of the title that might be helpful.

Acknowledging it was my own fault, since like you say, it was in the description.

How does this resolve if the leaderboard changes on August 21st? Presumably it is the rank (not Elo) at time of market close?

it's the earliest posting they make with a timestamp of at least August 21st

they post "updated on X date" on the website and don't update every day

bought Ṁ200 NO

In your screenshot above, would Meta resolve as 5th (the rank), or 7th (the order)? Or, more pointedly perhaps, GPT-4 Turbo as 7th or 10th?

it would resolve to the number next to the ranking

this is why I provide the example

the LMSys ranking is super confusing... I imagine it's because of versioning, or some over-optimization on their part for ties and overlaps of confidence intervals

I agree that the model is "ordered" 7th but rank says 5th... so we go with 5th

if you see the live chart, after tie for 7th... the next tier is 10th

I can sort of explain their methodology... but at the end of the day we go with the number next to the model -- which includes statistical ties
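As a rough sketch of how that number maps onto this market's answer options (the function name `market_option` is made up for illustration, and using `None` for a model missing from the resolving update is an assumption of the sketch):

```python
from typing import Optional


def market_option(rank: Optional[int]) -> str:
    """Map the leaderboard "Rank" value to one of this market's answer options."""
    if rank is None:  # model absent from the resolving update
        return "Unranked"
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    if rank == 1:
        return "First place (including ties)"
    if rank <= 3:
        return "Second / third place (including ties)"
    if rank <= 7:
        return "4th - 7th (including ties)"
    if rank <= 12:
        return "8th - 12th (including ties)"
    return "13th or worse"


# The case asked about above: a model listed with rank 7 but ordered 10th.
print(market_option(7))   # what the market uses (the "Rank" value)
print(market_option(10))  # what the row order alone would have implied
```

The two calls land in different options, which is why it matters that the "Rank" value, not the row order, is what counts.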

they overthought this a bit, in my opinion -- but this has been the standard

I do think it's useful to have a "tied for first" option and tiers... but it does end up a bit confusing 🤷

@Moscow25 I think I understand your meaning (i.e., "Rank" not "Arena Score"), but FYI I think one reason this is confusing is both of those are "the number next to the model." Most LMSys markets just explicitly refer to one metric or the other.

© Manifold Markets, Inc.