Which company has best AI model end of August? (Chatbot Arena Leaderboard)
Aug 31
Google: 85%
OpenAI: 7%
xAI: 3%
Meta: 3%
DeepSeek: 1%
Anthropic: 1%


One interesting thing is that head-to-head (with style control), GPT-5 loses to Gemini 2.5 about 66% of the time, which is significant (p<0.05). GPT-5 beats some other models at a slightly higher rate than Gemini 2.5 does, but not by much. For example, if we compare the rate at which GPT-5 beats Claude-Sonnet-4-thinking (0.74 with 47 samples) with the rate at which Gemini 2.5 beats Claude-Sonnet-4-thinking (0.68 with 330 samples), GPT-5's rate is not significantly greater than Gemini 2.5's (Fisher's exact test, p ≈ 0.24).
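For anyone who wants to check that kind of comparison, here is a rough sketch of a one-sided Fisher's exact test using only the standard library. The win counts are assumptions: they are back-calculated by rounding 0.74 × 47 and 0.68 × 330, so the exact p-value may differ a little from the one quoted above. The function name `fisher_exact_greater` is made up for this example.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` wins in row 1 if both rows share one win rate.
    """
    row1 = a + b          # games played by the first model
    col1 = a + c          # total wins across both models
    n = a + b + c + d     # total games
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)

# Assumed counts, reconstructed from the quoted rates (not taken from the raw data):
gpt5_wins, gpt5_losses = 35, 12        # roughly 0.74 of 47
gemini_wins, gemini_losses = 224, 106  # roughly 0.68 of 330

p_value = fisher_exact_greater(gpt5_wins, gpt5_losses, gemini_wins, gemini_losses)
print(f"one-sided p-value: {p_value:.3f}")
```

If the p-value comes out well above 0.05, the data can't distinguish GPT-5's win rate against Claude-Sonnet-4-thinking from Gemini 2.5's.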

The 21-point Elo lead with style control seems tenuous, and they are tied in Elo without style control. With more data, Gemini could take the lead there.
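To put a 21-point gap in perspective, here is a quick sketch that converts a rating difference into an expected head-to-head win rate. It assumes the standard Elo logistic curve (base 10, scale 400); the leaderboard's actual fitting procedure may differ, and `elo_expected_score` is just a name picked for this example.

```python
def elo_expected_score(rating_gap, scale=400.0):
    """Expected win probability for the higher-rated side.

    Uses the standard Elo formula, assumed here rather than taken
    from the leaderboard's own methodology.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))

p_leader = elo_expected_score(21)
print(f"expected win rate at +21 Elo: {p_leader:.3f}")
```

Under that assumption a 21-point lead is only about a 53% expected win rate, so it is a small edge.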


(though I also just noticed this data is 4 days out of date. They may have made some changes right after release that change the dynamics)

updates happen every week or so, and gemini 2.5 pro is leading without style control but yeah this is a curious stat (that gemini crushes head-to-head)

About the bit about resolving proportionally in case of a tie, is that for a tie in rankings? e.g., like how right now Google and OpenAI are both at rank 1 without style control (unless I'm misreading).

@sblaplace No, you can have the same ranking but different arena score, and ties refer to arena score ties only

sold Ṁ27 NO

@Bayesian got it, so that's only in case of an exact ELO tie, makes sense ^~^ thanks

filled a Ṁ250 YES at 24% order

@AffineTyped wanna bet more...

opened a Ṁ500 YES at 20% order

@AffineTyped oh I didn't see you're turning off style control. Lame

rip, mb, it was previously something like "default settings (with style control off)" bc it was a port from previous months when that was the default

@Bayesian yah it's my reading failure, and for some reason I thought we had all migrated to just whatever the leaderboard says at the end of the month

@AffineTyped regardless of their defaults

Yeah i’m kind of hoping polymarket does this for next year and im planning to do it for next year but yeah arbness is a nice property

Hello

bought Ṁ223 YES

Why is this market so down on OpenAI considering they're in the lead?

@ng Google leads with style control removed

bought Ṁ250 NO

@Bayesian "without filters" and "without style control" are kind of contradictory, given the default is with style control?

@bens dang i'll make sure to update all the markets to make this clearer. it's meant to track the polymarket market and be without style control, but hm

it may make sense to N/A this, that is a pretty unfortunate development

@Bayesian I mean, I kind of assumed the "without style control" superseded, and I think you can leave it open? but idk

@bens ok, updated and reopened

@Bayesian Is this N/A?

@Trazyn no, will resolve EOM based on lmarena.ai without style control

opened a Ṁ10,000 YES at 54% order

jim order up for OpenAI topping the Chatbot Arena Leaderboard this month

@jacksonpolack, @Bayesian, @SemioticRivalry, @Velaris, @khang2009, @Gen, @skibidist, @evan, @brod, @100Anonymous, @Ziddletwix, @Trazyn, @bagelfan, @geuber, @nikki, @ProjectVictory, @sahaj, @bohaska, @Odoacre

jim orders are large limit orders, generally at better than market prices.

Opt in / opt out thread: https://manifold.markets/post/jim-order-notification-optin-thread
