
This morning the xAI team announced that Grok 2.0 will shortly be released.
https://x.ai/blog/grok-2
It has also already been posted on the LMSys leaderboards under the name "sus-column-r".
So far, with a small sample and large error bars, it has an Elo of about 1280, which currently places it fourth (or tied for 3rd), behind only the two latest GPT-4o models and Gemini-1.5.

Here are the current official LMSys leaderboards -- which do not yet include Grok.
https://chat.lmsys.org/?leaderboard

^^ As you can see, there are many statistical ties... for fourth place, as well as for seventh. As usual, we adjudicate based on the number next to the model -- which includes statistical ties. A model would rank first... if its Elo is within the error margin of the current top model.
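To make the tie rule concrete, here is a minimal sketch of one way such a "Rank" number could be computed. It assumes a model's rank is 1 plus the number of models whose lower confidence bound sits above that model's upper bound; that rule, the `Entry` type, the `rank_with_ties` function, and all the scores below are hypothetical illustrations, not LMSys's actual code or data.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    score: float  # Elo-style arena score
    ci: float     # half-width of the confidence interval (assumed symmetric)


def rank_with_ties(entries):
    """Assumed rule: rank = 1 + number of models that are clearly better,
    i.e. whose lower bound exceeds this model's upper bound."""
    ranks = {}
    for e in entries:
        upper = e.score + e.ci
        clearly_better = sum(1 for o in entries if o.score - o.ci > upper)
        ranks[e.name] = 1 + clearly_better
    return ranks


# Hypothetical leaderboard, made up for illustration only.
board = [
    Entry("model-a", 1300, 5),
    Entry("model-b", 1290, 6),
    Entry("model-c", 1280, 12),
    Entry("model-d", 1260, 4),
]

for name, rank in rank_with_ties(board).items():
    print(name, rank)
```

Under these made-up numbers, model-a and model-b tie for first, model-c is listed third by score but gets rank 2 because of its wide error bars, and model-d gets rank 4. That is how a model's position in the list and the rank number next to it can disagree, and why some rank numbers are skipped.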
Where will Grok / "sus-column-r" rank in the first official timestamp on or after August 21st, a week from today?
Will it climb? Will it fall? Will another model drop in time to rank ahead of other leaders as well...
Or will this submission somehow be removed or disqualified? Let's make some wagers.
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ2,587 |
2 | | Ṁ987 |
3 | | Ṁ650 |
4 | | Ṁ244 |
5 | | Ṁ77 |
Now, just a day after the 8/22 update, Grok 2.0 has appeared on the leaderboard at "Rank" #2, which had been trading in this market at 25%. LMSys didn't post about the update on Twitter yesterday, but they did for today's. I don't see any discussion of this on their Discord, so I'm not sure what happened!
yeah seems like a manual edit
my guess is they purposefully excluded it yesterday. We happened to choose the 08/21 update since I thought that would be enough time -- but the 08/22 update includes it
not cool
I lost mana too betting on... well not on "unlisted"
will make longer cutoffs for next time now that we see LMSys does things this way
@Moscow25 perhaps you could just have separate markets for things like "When will X appear on the leaderboard?" versus "When will X be ranked when it first appears?" The latter could be NA if it never appears after a reasonable wait or have an explicit option for that possibility—still waiting a while to be sure it's not going to be ranked or that its eventual ranking would be so disconnected (e.g., a new era of models) as to not be conceptually similar enough for a single probability.
clearly that would be better
I listed it at 08/21 since that should be enough time for it to pop up, without waiting extra weeks to resolve the market... unless the model was pulled or cancelled
instead they had it, but put out an update on 08/21 specifically without it, and on 08/22 specifically with it
next time I'll leave a longer time period AND allow for a grace period (one week?) for them to post something before declaring it "N/A"
it would resolve to the number next to the ranking
this is why I provide the example
the LMSys ranking is super confusing... I imagine it's because of versioning, or some over-optimization on their part for ties and overlaps of confidence intervals
I agree that the model is "ordered" 7th but rank says 5th... so we go with 5th
if you see the live chart, after tie for 7th... the next tier is 10th
I can sort of explain their methodology... but at the end of the day we go with the number next to the model -- which includes statistical ties
they overthought this a bit, in my opinion -- but this has been the standard
I do think it's useful to have a "tied for first" option and tiers... but does end up a bit confusing 🤷
@Moscow25 I think I understand your meaning (i.e., "Rank" not "Arena Score"), but FYI I think one reason this is confusing is both of those are "the number next to the model." Most LMSys markets just explicitly refer to one metric or the other.