Which company has best AI model end of August? (Chatbot Arena Leaderboard)
Aug 31
Google: 95%
OpenAI: 2%
xAI: 1.9%

@ng Google leads with style control removed

Why is nobody betting Google to 99%?

why chat at 2% when ahead of gemini

@realDonaldTrump it's explained by the pinned message. if you go on the top right of the text leaderboard and select "without style control", you'll get the correct ordering for the purpose of this market

bought Ṁ50 YES

This might change this market

bought Ṁ200 YES

@JoaoPedroSantos gpt5-high is now shown and it's still below Gemini 2.5 Pro.

@BayesianOracle they just renamed gpt-5 to gpt-5-high for transparency

@Bayesian gotcha, ty

One interesting thing is that head-to-head (with style control), GPT-5 loses to Gemini 2.5 about 66% of the time, which is significant (p<0.05). GPT-5 beats some other models at a slightly higher rate than Gemini 2.5 does, but not by much. For example, if we compare the rate at which GPT-5 beats Claude-Sonnet-4-thinking (0.74 with 47 samples) with the rate at which Gemini 2.5 beats Claude-Sonnet-4-thinking (0.68 with 330 samples), the GPT-5 rate is not significantly greater than the Gemini 2.5 rate (Fisher exact test, p ≈ 0.24).
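For anyone who wants to redo this kind of comparison, here is a rough sketch of a one-sided Fisher exact test using only the Python standard library. The win/loss counts are assumptions reconstructed from the rounded rates quoted above (35 of 47 and 224 of 330), and `fisher_exact_greater` is a made-up helper name, so the result may not match the quoted p-value exactly.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) with all row and column totals held fixed,
    i.e. the upper tail of the hypergeometric distribution.
    """
    row1 = a + b          # battles played by the first model
    col1 = a + c          # total wins across both models
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, row1)


# ASSUMED counts, reconstructed from the rounded rates in the comment:
# GPT-5 vs Claude-Sonnet-4-thinking: ~0.74 of 47  -> 35 wins, 12 losses
# Gemini 2.5 vs the same opponent:   ~0.68 of 330 -> 224 wins, 106 losses
p_value = fisher_exact_greater(35, 12, 224, 106)
print(f"one-sided p = {p_value:.3f}")
```

With only 47 GPT-5 battles in this sketch, a gap of a few percentage points is well within noise, which is the point being made above.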

The 21-point Elo lead with style control seems tenuous, and they are tied in Elo without style control. With more data, Gemini could take the lead there.
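To put the 21-point gap in perspective, here is a small sketch that converts a rating difference into an expected win rate. It assumes the standard Elo logistic curve (base 10, scale 400); whether the leaderboard's arena scores follow exactly this scale is an assumption here, and `expected_win_rate` is just an illustrative name.

```python
def expected_win_rate(rating_diff, scale=400.0):
    """Expected win probability for the higher-rated side under the
    standard Elo logistic model (assumed, not confirmed for this leaderboard)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


lead = 21  # Elo lead with style control, from the comment above
p_win = expected_win_rate(lead)
print(f"{lead}-point lead -> expected win rate of about {p_win:.3f}")
```

Under that assumption a 21-point lead corresponds to roughly a 53% expected win rate, so it is a thin margin.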


(though I also just noticed this data is 4 days out of date. They may have made some changes right after release which change the dynamics)

updates happen every week or so, and gemini 2.5 pro is leading without style control but yeah this is a curious stat (that gemini crushes head-to-head)

About the bit about resolving proportionally in case of a tie, is that for a tie in rankings? e.g., like how right now Google and OpenAI are both at rank 1 without style control (unless I'm misreading).

@sblaplace No, you can have the same ranking but different arena score, and ties refer to arena score ties only

sold Ṁ27 NO

@Bayesian got it, so that's only in case of an exact ELO tie, makes sense ^~^ thanks

filled a Ṁ250 YES at 24% order

@AffineTyped wanna bet more...

opened a Ṁ500 YES at 20% order

@AffineTyped oh I didn't see you're turning off style control. Lame

rip, mb, it was previously something like "default settings (with style control off)" bc it was a port from previous months when that was the default

@Bayesian yah it's my reading failure, and for some reason I thought we had all migrated to just whatever the leaderboard says at the end of the month

@AffineTyped regardless of their defaults

Yeah i’m kind of hoping polymarket does this for next year and im planning to do it for next year but yeah arbness is a nice property

Hello

bought Ṁ223 YES

Why is this market so down on OpenAI considering they're in the lead?

@ng Google leads with style control removed
