Is Gemini 3.0 basically state of the art at everything?
Yes
No


It's failing on my personal benchmark compared to GPT 5.1

Equal to or better (sometimes significantly better) at most things, aka SOTA, but clearly worse on hallucinations (not a minor/"pedantry" issue imo), and possibly marginally worse than Opus 4.5 and Codex 5.1 at coding

Google looks potentially in the lead, but for most everyday use that's not necessarily a good reason to switch from whatever you normally use, such as ChatGPT.

I'd love to hear from anyone with a concrete example where Gemini 3 is significantly worse than another general-purpose system (e.g. not AlphaFold or Suno or whatever)

@MaxHarms What are your thoughts on its apparent poor performance when it comes to hallucinations?

https://manifold.markets/Bayesian/will-gemini-30-be-basically-sota-at#vimhv8artv

@Nat Seems like a good example of a concrete way it's not SotA!

I haven't personally noticed the hallucinations (and I've been in contexts where I could check), but I buy that they're an issue for many use cases.

Does SotA mean about as good as the other top models, or clearly better than everything else?

@MaxHarms At the poll voter's chosen level of pedantry / according to their chosen reading of the poll

@MaxHarms I took this to mean that no other model is significantly better at anything.
