What Elo will gpt2-chatbot have on the LMSYS leaderboard?
resolved May 23
1271-1350 (new SOTA)
Below 1240 (not SOTA)
1241-1270 (match GPT4/Claude 3)
1351+ (generation defining - as big a gap as GPT4 v GPT3.5)

Question resolves to its Elo one week after it first appears on the leaderboard.

If [OpenAI / an employee there / whatever org created it] acknowledges this model and gives it a new name (e.g., GPT4.5), will resolve to that model's Elo.

If no gpt2-chatbot model is listed on the leaderboard by June 30, will resolve N/A.


1287 Elo resolves to (the low end of) New SOTA. Thanks for predicting!

GPT-4o is on the leaderboard and, per the resolution criteria, I will wait one week for the Elo to stabilize and then resolve on Thursday, May 23.

Based on this tweet, I'm expecting to resolve this to the GPT-4o Elo.


That tweet greatly exaggerated the final Elo.

@Uaaar33 Seriously. Though the actual Elo is way closer to my subjective experience (small improvement)

Looks like 2 new potential successors were released: "im-a-good-gpt2-chatbot" and "im-also-a-good-gpt2-chatbot". If these both appear on the leaderboard, I'm inclined to use the higher of the two Elo ratings to resolve this question.

Even if unacknowledged by OpenAI, I'll count these. (resolves N/A if no successor is released)

Let me know if you have any concerns with that.

gpt2-chatbot was just removed from the list of Direct Models a few minutes ago. I was having a conversation with it and it gave me an error saying there was too much traffic, then it was no longer an option to select when I refreshed.

@WilliamKiely Official announcement now:

@WilliamKiely I'm guessing this is why: "The public leaderboard will only include models that are accessible to other third parties."

I'm glad this question already exists. It could use more participants!

