Will anyone other than OpenAI rank #1 on Chatbot Arena in 2024? (for >1 week)
resolved Apr 5
Resolved
YES
I added the 1-week criterion to the title to ensure everyone sees it; it was previously mentioned only in the description and the comments.

I was browsing Twitter and saw a post by Karpathy speaking positively about Chatbot Arena, a platform that ranks LLMs based on human ratings. As expected, OpenAI holds positions 1, 2, and 3. I wonder if anyone will be able to take that #1 position for a full week. I will resolve YES if it happens.

bought Ṁ2,000 YES

@Soli resolves YES, it has been 8 days since singer's comment below. I was fooled by the "Last updated March 29th" on the leaderboard, but that must have been some other change.

@HenriThunberg done - thank you for the comment

Really amazing to see someone beating OAI

bought Ṁ10 YES

I sure hope so! Given Opus's success, I wouldn't be surprised if this resolved YES within the year.

@traders Based on the comments below, I think it makes sense to resolve this question based on the Elo rating in case of a tie in "rank." The resolution criteria clearly stated, "I wonder if anyone will be able to take that #1 position for a full week." Right now, OpenAI still has position #1.

Based on the recent changes to the board's ranking system, I created a series of questions with slightly different resolution criteria to ensure we cover more definitions.


@JacobPfau @Gen @alexlitz, what do you think? OpenAI still has a higher Elo score than Anthropic. How should this case be handled?

@traders what do you all think? Does a tie count even if Anthropic has a lower Elo rating than OpenAI?

@Soli as written, they are rank 1, but it is a mega cringe fake rank 1. I don't really mind which way it goes

@Soli I have a position here so might be biased, but I personally don’t think this counts unless it’s an actual Elo tie. I don’t think anyone was trading on this possibility because they didn’t have any “ties” until now

@JacobPfau For what it's worth, I bet under the impression that "another rank #1" implied ChatGPT on 2nd or below. I have a NO bias.

@Soli I am pretty literalist with it so I would say it should. It is also effectively equivalent to a tie which I would have imagined would have resolved positively (if only due to tiebreaker seeming to have been alphabetical order :))

@Soli the question said '#1 position' not '#1 rank'. So currently I think it's a NO. It should resolve as YES if Claude actually gets to the first row in the table (even if ChatGPT is still listed as #1 by rank).

(I bet on YES myself)

@nsokolsky I agree. Going off the resolution criteria literally, the interpretation implies a negative here.

"I wonder if anyone will be able to take that #1 position for a full week."

@singer My thoughts exactly.

The description implies that another model must replace ChatGPT as number 1, not just tie with it (plus GPT still has the highest Elo here).

Anthropic wasn't able to take the 1st spot, so I doubt they will for the rest of the year. OpenAI has the initiative now, since they still haven't released their next model.

However, if Google have a Gemini Ultra 1.5 in the works, this could potentially displace OA before they release their next model.

@raul damn, I didn't realize it would get this close actually. 1251 to 1247 is pretty close...
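To put the 1251 vs. 1247 gap in perspective, here is a minimal sketch of the textbook Elo expected-score formula. It assumes the leaderboard ratings sit on the usual base-10, 400-point Elo scale; the function name `elo_expected_score` is illustrative and not taken from the Chatbot Arena codebase.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (win = 1, tie = 0.5) of A against B under standard Elo.

    Assumption: ratings use the conventional base-10 logistic with a
    400-point scale; the leaderboard's exact fitting method may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# The 4-point gap discussed in this thread.
close_gap = elo_expected_score(1251, 1247)

# A 100-point gap, for comparison (the ratings themselves are hypothetical).
wide_gap = elo_expected_score(1100, 1000)

print(f"4-point gap:   {close_gap:.3f}")   # 0.506
print(f"100-point gap: {wide_gap:.3f}")    # 0.640
```

Under these assumptions, a 4-point lead corresponds to an expected score of roughly 50.6%, which is close to a coin flip, while a 100-point lead corresponds to roughly 64%.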

bought Ṁ100 YES

How will this resolve if OpenAI is off the #1 position for more than one week but the top position is not held by one company/model for more than a week?

bought Ṁ30 NO

@alexlitz if no one other than OpenAI manages to stay in the #1 position for 1 week then the question resolves No

@Soli Thanks for the clarification!
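The clarification above can be sketched as a streak check. This is only an illustration under stated assumptions: the leaderboard is sampled once per day, each sample names a single #1 holder (how ties are broken is left to whoever builds the list), and "1 week" means 7 consecutive daily samples. The function name and the sample data are hypothetical.

```python
from itertools import groupby


def resolves_yes(daily_leaders, incumbent="OpenAI", min_days=7):
    """Return True if any single non-incumbent holds #1 for min_days in a row.

    daily_leaders: one organization name per daily leaderboard snapshot
    (hypothetical input format; tie-breaking is decided before this call).
    """
    for org, run in groupby(daily_leaders):
        # groupby yields maximal runs of consecutive identical leaders.
        if org != incumbent and len(list(run)) >= min_days:
            return True
    return False


# Hypothetical logs, not real leaderboard data.
steady_challenger = ["OpenAI"] * 3 + ["Anthropic"] * 8
rotating_challengers = ["Anthropic"] * 4 + ["Google"] * 4 + ["Anthropic"] * 4

print(resolves_yes(steady_challenger))     # True
print(resolves_yes(rotating_challengers))  # False
```

The second case mirrors the scenario asked about: OpenAI is off the top for more than a week, but no single challenger holds it for 7 days, so the check returns False.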

I think this will happen because Chatbot Arena is allowing internet-enabled APIs (Bard) to compete with non-internet-enabled APIs (GPT-4). This is a bit surprising! But unless MS/OAI move fast to get an internet-enabled version out, Bard Ultra should beat GPT-4, even though I think Gemini Ultra will turn out to be noticeably worse than the latest GPT-4.


https://twitter.com/lmsysorg/status/1749487649520541813

bought Ṁ10 YES from 68% to 69%
predicted NO

@WillSorenson I don’t think it makes sense for OpenAI to release a model with internet search enabled by default, but I am also not sure Google will release Ultra via API with internet access anytime soon. Did they announce anywhere that they are planning to do this?

predicted NO

Has anyone used Bard enough to know if it's genuinely GPT-4 quality? Very surprised to see Gemini Pro (notably, not Ultra) so high on the leaderboard here, so wondering if there's something strange going on

predicted NO

@dominic Especially because some other version of Gemini Pro is 100 Elo lower, which seems pretty significant

Wow - yeah, that's a pretty big movement. One seems to be from the API vs. the other is "Bard". Maybe Google has tuned one to do well in chat contexts and not the other?

@ChrisPrichard Bard gets internet access, and can Google before answering :/
