Will any Gemini LLM achieve a higher rating than all of OpenAI's GPT-4 models on the Chatbot Arena Leaderboard by Jan 1st 2025?

Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the chatbot of the same name.

Google's technical report, linked from their blog post, claims that Gemini beats GPT-4 on various benchmarks.

It's a challenge to evaluate LLM-based chat assistants directly, and there are multiple methods for doing so. One approach uses human preferences collected in a "Chatbot Arena", as presented in this paper & blog post by the Large Model Systems Organization (LMSYS) team. Building on this idea, the team maintains a "Chatbot Arena Leaderboard" on HuggingFace that turns those human preferences into an Elo rating used to rank the different LLM-based chatbots.
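To illustrate how pairwise human votes can become a ranking, here is a minimal sketch of a sequential Elo update. It is a simplified illustration and not the leaderboard's actual code: the figure captions in this description mention MLE Elo estimates with bootstrapping, which is a different fitting procedure. The model names, the K-factor of 4, and the starting rating of 1000 are all assumptions chosen for the example.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def update_elo(ratings, battles, k=4.0, initial=1000.0):
    """Apply sequential Elo updates for a list of battles.

    battles: iterable of (model_a, model_b, winner), winner in {"a", "b", "tie"}.
    k and initial are assumed values for this sketch.
    """
    ratings = dict(ratings)  # work on a copy so the caller's dict is untouched
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in battles:
        r_a = ratings.setdefault(model_a, initial)
        r_b = ratings.setdefault(model_b, initial)
        e_a = expected_score(r_a, r_b)
        s_a = outcome[winner]
        # A gains what B loses, so the total rating is conserved.
        ratings[model_a] = r_a + k * (s_a - e_a)
        ratings[model_b] = r_b - k * (s_a - e_a)
    return ratings


# Hypothetical votes between two placeholder models.
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "tie"),
    ("model-y", "model-x", "b"),
]
final_ratings = update_elo({}, votes)
ranking = sorted(final_ratings, key=final_ratings.get, reverse=True)
```

Sorting the resulting ratings from highest to lowest gives the kind of ordering the leaderboard displays. Sequential Elo depends on the order of the votes, which is one reason a leaderboard might prefer a fitted estimate instead.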

As of March 15th 2024, GPT-4 models hold the top 2 spots, with Gemini 1.5 Pro in 3rd.

Figure: Fraction of Model A Wins for All Non-tied A vs. B Battles (2024-05-15). Models are ranked from highest to lowest, top to bottom.

Figure: Bootstrap of MLE Elo Estimates (1000 Rounds of Random Sampling) (2024-05-15)

This resolves "YES" if some version of Google's Gemini LLM (if there are multiple versions) has a higher Elo rating than ALL OpenAI GPT-4 models on the public leaderboard at any point by January 1st 2025 (Market Close). Note: this compares against ALL GPT-4 model(s) only, and not necessarily the highest-ranking or most recent model(s).

This resolves "NO" if some version of Google's Gemini LLM (if there are multiple versions) appears on the leaderboard but never scores a higher Elo rating than ALL OpenAI GPT-4 models by January 1st 2025 (Market Close).

This will resolve "50%" if any of the following occurs:

The "Chatbot Arena Leaderboard" on HuggingFace (via this link: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) is no longer publicly accessible before the January 1st 2025 (Market Close) deadline.
Google's Gemini models never appear on the leaderboard before the January 1st 2025 (Market Close) deadline.
There are no longer any OpenAI GPT-4 models on the leaderboard before the January 1st 2025 (Market Close) deadline.
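
The resolution rules above can be sketched as a small function. This is only an illustration under stated assumptions, not the official resolution procedure: identifying models by the name prefixes "gemini" and "gpt-4" is a hypothetical stand-in for the market creator's judgment, the snapshot format is invented for the example, and the order in which the "50%" conditions are checked relative to "YES" is an assumption the description does not spell out.

```python
def is_gemini(name):
    # Assumption: Gemini models are identified by a name prefix.
    return name.lower().startswith("gemini")


def is_gpt4(name):
    # Assumption: GPT-4 models are identified by a name prefix.
    # Which models actually count is the market creator's call.
    return name.lower().startswith("gpt-4")


def resolve(snapshots, accessible_until_close=True):
    """Return "YES", "NO", or "50%" for a series of leaderboard snapshots.

    snapshots: list of dicts mapping model name -> Elo rating, one per
    observation before market close (hypothetical format).
    """
    if not accessible_until_close:
        return "50%"  # leaderboard stopped being publicly accessible
    gemini_seen = False
    for ratings in snapshots:
        gemini = [r for name, r in ratings.items() if is_gemini(name)]
        gpt4 = [r for name, r in ratings.items() if is_gpt4(name)]
        if not gpt4:
            return "50%"  # no GPT-4 models left on the leaderboard
        if gemini:
            gemini_seen = True
            # Beating ALL GPT-4 models means beating the best of them.
            if max(gemini) > max(gpt4):
                return "YES"
    # Gemini appeared but never led: "NO". Never appeared: "50%".
    return "NO" if gemini_seen else "50%"
```

For example, a snapshot where the best Gemini rating is below any one GPT-4 model's rating does not trigger "YES", since the comparison is against the highest-rated GPT-4 model on that date.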

There might be other edge cases that I might not have thought of but hopefully that covers any that are likely to occur. I may edit/clarify the description if there are new developments or suggestions by others.

Note: this is a continuation, with wording adjustments, of a similar market that was N/A'd.

DISCLAIMER
I DO NOT PARTICIPATE IN MARKETS I CREATE
DO NOT TRADE OFF OF UNCONFIRMED MARKET NEWS OR NEWS YOU MAY NOT UNDERSTAND. I AM NOT RESPONSIBLE FOR MISUNDERSTANDING IF YOU DO NOT ASK FOR CLARIFICATION FIRST.
If Any Clarification Is Needed, I May Temporarily Close The Market To Make Clarifying Statements & Then Re-Open; Feel Free To Ask For Clarification Through Messages Rather Than Making A Comment. Comments are not a clarification unless posted into the description.
