Gemini Ultra will achieve a higher rating than an OpenAI GPT-4 model on the Chatbot Arena Leaderboard before May 1st 2024
Resolved N/A (May 10)
A version of Gemini Pro, `Bard (Gemini Pro)`, surpassed all models except GPT-4 Turbo. Context: https://twitter.com/lmsysorg/status/1750921228012122526
Announcement of the "Gemini 1.5" model with up to a 1M-token context: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/

Gemini Ultra has not been publicly released as of December 16th 2023, but Google's technical report, linked from their blog post, claims it beats GPT-4 on various benchmarks.


Evaluating LLM-based chat assistants directly is a challenge, and multiple methods exist. One approach uses human preferences collected in a "Chatbot Arena", as presented in a paper and blog post by the Large Model Systems Organization (LMSYS) team. The team maintains a "Chatbot Arena Leaderboard" on HuggingFace that converts these human preferences into an Elo rating to rank the different LLM-based chatbots.
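As a rough illustration of how pairwise human preferences can be turned into Elo ratings, here is a minimal sketch. The K-factor and scale are illustrative assumptions, not the exact LMSYS implementation (which has since moved toward a Bradley-Terry MLE fit):

```python
# Sketch of online Elo updates from pairwise A-vs-B battle outcomes,
# in the spirit of the Chatbot Arena leaderboard. Parameters k=32 and
# scale=400 are conventional chess-style defaults, chosen for illustration.

def update_elo(ratings, model_a, model_b, winner, k=32, scale=400):
    """Update two models' ratings in place after one A-vs-B battle."""
    ra, rb = ratings[model_a], ratings[model_b]
    # Expected score of A under the Elo logistic model.
    expected_a = 1 / (1 + 10 ** ((rb - ra) / scale))
    score_a = 1.0 if winner == model_a else 0.0  # ties omitted for brevity
    # Elo is zero-sum: B loses exactly what A gains.
    ratings[model_a] = ra + k * (score_a - expected_a)
    ratings[model_b] = rb + k * ((1 - score_a) - (1 - expected_a))
```

Seeding every model at the same initial rating (e.g. 1000) and replaying the battle log in order yields a final ranking.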


As of December 16th, GPT-4 models occupy the top three spots, with Gemini Pro sitting below GPT-3.5-Turbo-0613 but slightly above GPT-3.5-Turbo-0314.

[Figure: Fraction of Model A Wins for All Non-tied A vs. B Battles (2023-12-16); models ranked highest (top) to lowest (bottom)]

[Figure: Bootstrap of MLE Elo Estimates (1000 Rounds of Random Sampling) (2023-12-16)]
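The "Bootstrap of MLE Elo Estimates" figure reflects a standard technique: resample the battle log with replacement many times, refit the ratings on each resample, and read confidence intervals off the resulting distribution. A hedged sketch, using a simple online Elo fit rather than the LMSYS MLE and a hypothetical battle-log format:

```python
import random

def compute_elo(battles, k=32, scale=400, init=1000.0):
    """Fit Elo ratings by replaying (model_a, model_b, winner) tuples.
    This is a simplified online fit, not the LMSYS MLE implementation."""
    ratings = {}
    for a, b, winner in battles:
        ra = ratings.setdefault(a, init)
        rb = ratings.setdefault(b, init)
        expected_a = 1 / (1 + 10 ** ((rb - ra) / scale))
        score_a = 1.0 if winner == a else 0.0
        ratings[a] = ra + k * (score_a - expected_a)
        ratings[b] = rb + k * ((1 - score_a) - (1 - expected_a))
    return ratings

def bootstrap_elo(battles, model, rounds=1000, seed=0):
    """Approximate a 95% interval for one model's rating by resampling
    the battle log with replacement `rounds` times."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        resampled = [rng.choice(battles) for _ in battles]
        samples.append(compute_elo(resampled).get(model, 1000.0))
    samples.sort()
    return samples[int(0.025 * rounds)], samples[int(0.975 * rounds)]
```

Wide intervals indicate a model's rank is not yet statistically settled, which matters when judging whether one model has truly "surpassed" another.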


This resolves "YES" if some version of Google's Gemini Ultra (if there are multiple versions) has a higher Elo rating than any OpenAI GPT-4 model on the public leaderboard at any point by April 30th 2024 (23:59 PDT). Note that the comparison is against any GPT-4 model, not necessarily the highest-ranked or most recent one.


This resolves "NO" if Google's Gemini Ultra appears on the leaderboard but does not ever score a higher Elo rating than an OpenAI GPT-4 model by April 30th 2024 (23:59 PDT).


This will resolve "N/A" if any of the following occurs:

  • The "Chatbot Arena Leaderboard" on HuggingFace (via this link: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) is no longer publicly accessible before the April 30th 2024 deadline.

  • Google's Gemini Ultra never appears on the leaderboard before the April 30th 2024 deadline.

  • There are no longer any OpenAI GPT-4 models on the leaderboard before the April 30th 2024 deadline, and Gemini Ultra has not received a rating while the GPT-4 models were still on the board.


There may be other edge cases I have not thought of, but hopefully the above covers those likely to occur. I may edit or clarify the description if there are new developments or suggestions from others.


The spirit of this market is to have a public evaluation metric comparing Google's Gemini Ultra against one of the most capable models available today, OpenAI's GPT-4.


Related Markets

The market below poses the same question as the one above but resolves after April 30th 2024 instead of at the end of 2024, and only includes Gemini Ultra (not future Gemini models).

Note this was also created by me.


Similar to this market, but asks about any model (from OpenAI, Google, etc.) beating the current (2023-12-20) version of GPT-4-Turbo specifically:

Note this was also created by me.
