
The Chatbot Arena Leaderboard (https://chat.lmsys.org/?arena) lists GPT-4 in the number 1 spot with an Elo of 1225.
In the number 2 spot is Claude, with an Elo of 1195.
Will any chatbot replace GPT-4 in the number one spot before July 1, 2024?
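The code below is a minimal Python sketch of the standard Elo expected-score formula. It assumes the leaderboard's ratings use the usual Elo logistic scale (base 10, 400 points), and the function name `elo_expected_score` is only illustrative. Under that assumption, the 30-point gap between GPT-4 (1225) and Claude (1195) gives GPT-4 an expected score of about 0.54 against Claude. In other words, GPT-4 would be expected to win roughly 54% of head-to-head votes, with ties counted as half.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumes the usual logistic curve with base 10 and a 400-point scale;
    the leaderboard's exact fitting procedure may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# GPT-4 (1225) vs. Claude (1195), using the ratings quoted above.
p = elo_expected_score(1225, 1195)
print(f"{p:.3f}")  # prints 0.543
```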
Fine print:
If https://chat.lmsys.org/?arena ceases to function, the question may resolve on the basis of a similar site that assigns Elo ratings to chatbots based on blind, side-by-side judgments by real humans.
--update--
Important update: there are now multiple "GPT-4" models on the leaderboard. For this question to resolve positive, the top-scoring model must have a different name (e.g. Claude) or version number (e.g. 4.5). In particular, GPT-4-turbo scoring higher than GPT-4-1106-preview will not cause this question to resolve positive.
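Here is one possible reading of the naming rule in the update, written as code. It is a hypothetical interpretation, not the question author's own procedure. The regex, the lowercase matching, and the names `counts_as_gpt4` and `top_spot_resolves_yes` are all assumptions. Under this reading, names of the form "gpt-4" plus an optional hyphenated suffix (such as "-turbo" or "-1106-preview") count as GPT-4. Anything else is treated as a different model, whether it has a new version number like "gpt-4.5" or a different name like "claude". Borderline names that the update does not cover would still need a ruling from the question author.

```python
import re

# Hypothetical reading of the update: "gpt-4" optionally followed by a
# hyphenated suffix (e.g. "-turbo", "-1106-preview") still counts as GPT-4.
# A different version number ("gpt-4.5") or name ("claude-...") does not.
_GPT4_VARIANT = re.compile(r"gpt-4(-[a-z0-9-]+)?")


def counts_as_gpt4(model_name: str) -> bool:
    """Return True if the leaderboard name is treated as a GPT-4 variant."""
    return _GPT4_VARIANT.fullmatch(model_name.strip().lower()) is not None


def top_spot_resolves_yes(top_model: str) -> bool:
    """The question resolves YES only if the #1 model is not a GPT-4 variant."""
    return not counts_as_gpt4(top_model)
```

For example, `top_spot_resolves_yes("GPT-4-Turbo")` is `False`, while `top_spot_resolves_yes("gpt-4.5")` is `True`.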
@Shump I'm torn on this, because my original assumption when writing this question was that GPT-4 was "one thing" (its Elo would not change). But there are now multiple GPT-4s with different Elo ratings.
I'm going to go ahead and say it "must have a different number", for example 4.5, so gpt-4-turbo still counts as "GPT-4" for the purpose of this question.
If enough people object strongly that they weren't counting on it being ruled this way, I will resolve ambiguously.
Does this resolve positive if any chatbot scores higher at any time until July 1, 2024? Or does it just resolve according to the ranking of the leaderboard at this time?
@TobiasHaeberli I'm assuming the score for GPT-4 is a fixed value. So if, for example, GPT-4.5 is released with a higher Elo, then it resolves positive.
@ShadowyZephyr That indeed appears to be the case. Therefore, if at any point a model other than GPT-4 takes the #1 spot, this question will resolve "yes" (this could happen if GPT-4's rating drops).
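The "at any point" reading can be sketched as a scan over dated leaderboard snapshots. This is a hypothetical illustration. The `(date, top_model)` snapshot format, the function name `resolves_yes`, and treating the July 1, 2024 deadline as exclusive are all assumptions. The caller supplies whatever rule decides which model names count as GPT-4.

```python
from datetime import date
from typing import Callable, Iterable, Tuple

# Assumption: the deadline is exclusive ("before July 1, 2024").
DEADLINE = date(2024, 7, 1)


def resolves_yes(
    snapshots: Iterable[Tuple[date, str]],
    is_gpt4_variant: Callable[[str], bool],
) -> bool:
    """Return True if any snapshot taken before DEADLINE shows a model
    that is not a GPT-4 variant in the #1 spot.

    snapshots: hypothetical (snapshot_date, top_model_name) pairs.
    is_gpt4_variant: caller-supplied rule for which names count as GPT-4.
    """
    return any(
        snap_date < DEADLINE and not is_gpt4_variant(top_model)
        for snap_date, top_model in snapshots
    )
```

For example, with `is_gpt4 = lambda n: n.lower() == "gpt-4" or n.lower().startswith("gpt-4-")`, calling `resolves_yes([(date(2024, 3, 1), "gpt-4-turbo"), (date(2024, 4, 1), "other-model")], is_gpt4)` returns `True`. The result holds even if a GPT-4 variant regains the top spot afterward, because a single pre-deadline snapshot with a different leader is enough.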