As per the LMSYS Chatbot Arena leaderboard, the latest iteration of GPT-4 currently has a 77% (0.77) win rate against Mistral Medium, which approximately represents the advantage of GPT-4 over GPT-3.5. As of 2025-01-01, will there be a model that has a 75% or higher win rate against the latest iteration of GPT-4?
Clarifications:
I will look at the ranking of the models in the "Fraction of Model A Wins for All Non-tied A vs. B Battles" section of Chatbot Arena, or an equivalent section, as of Jan 1st, 2025. If a new GPT-4 model is released on (say) Dec 31st, 2024 and is not yet ranked on Chatbot Arena, it will not count for the purposes of this question.
Any model that's named gpt-4-* will count. So gpt-4-turbo-2025-01-01 or gpt-4-hyper-advanced will count as "GPT-4". Something like gpt-4.1-turbo or gpt-5-turbo will not count as "GPT-4".
If Chatbot Arena no longer provides a win % for any GPT-4 models or ceases to exist entirely, this question will resolve as N/A.
If the Chatbot Arena website happens to be down for maintenance or any technical issue on Jan 1st, 2025, I will keep retrying for 7 days. If after 7 days the ranking is still unavailable, I will resolve this to N/A.
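For anyone who wants to check the rules above mechanically, here is a minimal sketch in Python. It is only an illustration: the function names, the shape of the `win_rates` table, and the sample numbers are all hypothetical, and the "latest iteration of GPT-4" has to be picked by hand because the rules above do not define how to pick it. A bare "gpt-4" with no suffix is not covered by the gpt-4-* rule as written, so the sketch treats it as not matching.

```python
from fnmatch import fnmatchcase

THRESHOLD = 0.75  # "75% or higher win rate" from the question


def counts_as_gpt4(model_name: str) -> bool:
    """True if the name follows the gpt-4-* scheme described in the clarifications."""
    return fnmatchcase(model_name.lower(), "gpt-4-*")


def resolve(win_rates: dict, latest_gpt4: str) -> str:
    """Return "YES", "NO" or "N/A" for a table of pairwise win fractions.

    win_rates maps (model_a, model_b) to the fraction of non-tied battles
    that model_a won against model_b (assumed layout, not the site's format).
    """
    if not counts_as_gpt4(latest_gpt4):
        raise ValueError(f"{latest_gpt4!r} does not count as GPT-4")
    rates = [
        rate
        for (model_a, model_b), rate in win_rates.items()
        if model_b == latest_gpt4 and model_a != latest_gpt4
    ]
    if not rates:
        # No win % available against any GPT-4 model: resolves N/A.
        return "N/A"
    return "YES" if max(rates) >= THRESHOLD else "NO"


# Hypothetical numbers, for illustration only.
example = {
    ("model-x", "gpt-4-turbo-2025-01-01"): 0.76,
    ("model-y", "gpt-4-turbo-2025-01-01"): 0.58,
}
print(resolve(example, "gpt-4-turbo-2025-01-01"))
```

The glob pattern requires the hyphen after "gpt-4", which is why names such as gpt-4.1-turbo, gpt-5-turbo and gpt-4o fall outside it.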
@SergeyDavidoff possible but 'o1' (non-preview) is supposed to come out by EOY, plus I assume o1-preview is actively being tweaked right now based off user feedback.
Chatbot Arena is a flawed benchmark anyway, and based on the numbers it seems almost impossible this would happen. GPT-4o, the number 1 model on the leaderboard, only has a 71% win rate in its most lopsided matchup. That matchup is against GLM-4, a model which is terrible in all my evaluations.
Even if the next gen of models is a huge step up over the current SOTA, I don't expect any of them to achieve that high of a win rate.
Very small sample (<30 head-to-head matches) but o1-preview has now done it :-) The market will resolve on January 1st and things might change with a bigger sample size, but I'm becoming pretty bullish on this resolving to Yes.
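To put a rough number on the small-sample caveat, here is a sketch that computes a 95% Wilson score interval for a win rate. The counts used (22 wins out of 28 non-tied battles) are made up for illustration and are not the actual leaderboard figures; `wilson_interval` is just a helper name chosen here.

```python
import math


def wilson_interval(wins: int, battles: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a win fraction (z = 1.96 gives roughly 95%)."""
    if battles <= 0:
        raise ValueError("need at least one non-tied battle")
    p = wins / battles
    denom = 1 + z * z / battles
    center = (p + z * z / (2 * battles)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / battles + z * z / (4 * battles * battles)
    ) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 22 wins in 28 non-tied battles (about 79%).
low, high = wilson_interval(22, 28)
print(f"{low:.2f} to {high:.2f}")
```

With counts like these the interval runs from roughly 0.60 to 0.90, so an observed win rate just above 75% on fewer than 30 battles could easily land on either side of the threshold once more battles come in.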
GPT-4o did not follow the gpt-4-x scheme, so it doesn't count as GPT-4 for the purposes of this question (it would've if it was called gpt-4*-*o). So if gpt-4o beats gpt-4-latest with a 75+% win rate in the leaderboard, this will resolve to Yes.
Why is this market priced lower than the "GPT-5 comes out in 2024" market? The "GPT-5 will be underwhelming" money?
https://manifold.markets/VictorLJZ/will-gpt5-be-released-before-2025