Will any Chatbot beat GPT-4 by July 1, 2024?
resolved Mar 26
Resolved
YES

The Chatbot Arena Leaderboard (https://chat.lmsys.org/?arena) lists GPT-4 in the number 1 spot with an ELO of 1225.

In the number 2 spot is Claude, with an ELO of 1195.

Will any chatbot replace GPT-4 in the number one spot before July 1, 2024?

Fine print:

If https://chat.lmsys.org/?arena ceases to function, the question may resolve on the basis of a similar site that gives ELOs for chatbots based on real human blind side-by-side judgements.

--update--

Important update: there are now multiple "GPT-4" models on the leaderboard. In order for this question to resolve positive, the top-scoring model must have a different name (e.g. Claude) or number (e.g. 4.5). Significantly, GPT-4-turbo scoring higher than GPT-4-1106-preview will not cause this question to resolve positive.
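
As a rough illustration of the naming rule in this update, here is a minimal Python sketch. It is not an official resolution script: the regular expression, the function names `counts_as_gpt4` and `resolves_yes`, and the sample scores are all assumptions made for the example, and it presumes leaderboard names look like `gpt-4-turbo` or `claude-3-opus`.

```python
import re

# Assumption: any name starting with "gpt-4" (or "gpt4") that is NOT followed
# by another digit or a dot belongs to the GPT-4 family, so "gpt-4-turbo"
# counts as GPT-4 but "gpt-4.5" does not.
_GPT4_FAMILY = re.compile(r"^gpt-?4(?![\d.])", re.IGNORECASE)


def counts_as_gpt4(model_name: str) -> bool:
    """Return True if the model is treated as "GPT-4" for this question."""
    return bool(_GPT4_FAMILY.match(model_name.strip()))


def resolves_yes(leaderboard: dict[str, float]) -> bool:
    """Return True if the top-scoring model is not a GPT-4 variant.

    `leaderboard` maps model name to score (hypothetical input format).
    On an exact tie, max() picks whichever tied model appears first.
    """
    top_model = max(leaderboard, key=leaderboard.get)
    return not counts_as_gpt4(top_model)


# Illustrative scores only, not real leaderboard data.
only_gpt4_variants_on_top = {
    "gpt-4-turbo": 1250.0,
    "gpt-4-1106-preview": 1240.0,
    "claude": 1195.0,
}
different_name_on_top = {
    "claude-3-opus": 1253.0,
    "gpt-4-1106-preview": 1251.0,
}
```

Under these assumptions, `resolves_yes(only_gpt4_variants_on_top)` is false, because one GPT-4 variant overtaking another does not count, while `resolves_yes(different_name_on_top)` is true.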


@LoganZoellner this can resolve YES:

@DanMan314 Resolved

@LoganZoellner Does GPT4turbo taking the spot count?

@Shump I'm torn on this, because my original assumption when writing this question was that GPT-4 was "one thing" (its ELO would not change). But there are now multiple GPT-4s with different ELOs.

I'm going to go ahead and say "must have a different number", for example 4.5, so gpt-4-turbo still counts as "GPT-4" for the purpose of this question.

If enough people object strongly that they weren't counting on it being ruled this way, I will resolve ambiguously.

Does this resolve positive if any chatbot scores higher at any time until July 1, 2024? Or does it just resolve according to the ranking of the leaderboard at this time?

@TobiasHaeberli I am assuming the score for GPT-4 is a fixed value. So if, for example, GPT4.5 is released with a higher ELO, then it resolves positive.


@LoganZoellner Pretty sure it is not a fixed value.

@ShadowyZephyr That indeed appears to be the case. Therefore, if at any moment in time a model other than GPT-4 takes the #1 spot, this question will resolve "yes". (This could happen because GPT-4 is getting worse.)
