LMSys pulled a fast one on us last time.
OpenAI's O1-preview entered the rankings at 1355 ELO, then got voted all the way down to 1335, where it sits now.
https://lmarena.ai/?leaderboard
This seems fishy perhaps, but those are the breaks.
Therefore, this time we will look for a model that enters and maintains a ranking of 1360 with 10,000 votes. Instead of looking at the first public checkpoint like we did before, we will resolve this once a model is at 1360+ ELO and at 10,000+ votes.
One caveat: the date that counts is when the model enters the arena, i.e. its first public posting. But we resolve only if that model reaches the ELO and vote requirements.
So, if a model enters the arena (first shows up on leaderboards) on October 20th but doesn't get 10,000 votes until November, that will still count as October.
Sorry it's confusing, but this is more intuitive. We don't want to bet on how long 10,000 votes take, but on whether a good model entered the arena and will eventually meet the requirements.
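For anyone who finds the rule easier to read as code, here is a rough sketch of one reading of it. The `Snapshot` type, the function name, and the example dates and years are all made up for illustration. It also assumes that the first leaderboard update showing 10,000+ votes is the one that decides.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ELO_THRESHOLD = 1360     # rating requirement from the rules
VOTE_THRESHOLD = 10_000  # vote requirement from the rules


@dataclass
class Snapshot:
    """One leaderboard update for a single model (hypothetical structure)."""
    published: date
    elo: int
    votes: int


def resolution_month(first_listed: date,
                     snapshots: list[Snapshot]) -> Optional[tuple[int, int]]:
    """Return the (year, month) a model counts for, or None if it does not qualify (yet).

    Assumption: the first update at or above 10,000 votes decides the outcome.
    """
    for snap in sorted(snapshots, key=lambda s: s.published):
        if snap.votes >= VOTE_THRESHOLD:
            if snap.elo >= ELO_THRESHOLD:
                # Credit goes to the month the model first appeared,
                # not the month the vote count was reached.
                return (first_listed.year, first_listed.month)
            return None
    return None  # not enough votes yet, keep waiting


# Enters on October 20th, reaches 10,000 votes only in November.
updates = [
    Snapshot(date(2024, 10, 25), elo=1365, votes=3_000),
    Snapshot(date(2024, 11, 12), elo=1361, votes=10_500),
]
print(resolution_month(date(2024, 10, 20), updates))  # (2024, 10)
```

The point of keying on `first_listed` is exactly the one above: the bet is on when a good model shows up, not on how fast the votes come in.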
In other words, we are betting on when we will get a release that's noticeably better than today's ~1340 ELO models, according to the LMSys voters.
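To put "noticeably better" in numbers, here is a quick sketch using the textbook Elo expected-score formula (base 10, scale 400). Whether the leaderboard's ratings map onto head-to-head win rates exactly this way is an assumption, so treat the result as a ballpark.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score for A against B (ties count as half a win)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# A 1360 model against today's ~1340 models.
print(round(expected_win_rate(1360, 1340), 3))  # 0.529
```

So a 20-point gap works out to winning roughly 53% of head-to-head votes under this formula.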
Sorry, we need more votes now, as the confidence intervals at 3,000 votes appear not to be reliable. Either that, or people tried to downvote O1-preview. Who knows.
@Moscow25 The latest ChatGPT model has over 10,000 votes now and is still at 1361. So I think this can resolve.
It's gonna be super close.
https://x.com/lmarena_ai/status/1859318401165930648
Latest GPT-4o is at 1361 but not over 10,000 votes yet.
To be clear, if it's at 1360 in the first update showing 10,000+ votes, this will still count even if that update comes after Nov 30th -- which it won't anyway.
@ChrisPrichard Yes -- will be close
To be clear -- it's in the rules -- this model will count when it gets to 10,000 votes, even if they publish the 10,000-vote update after December 1st.
The model simply needs to appear in leaderboards by November. Which it has. This had to be done because of previous nonsense with O1.
But yes... one ELO point. Good lord.
Google's Gemini takes over top spot...
https://x.com/lmarena_ai/status/1857110672565494098
With a rather pedestrian 1344 ELO.
Sad.