
At the start of 2025, will it be generally accepted that Google's "best" general LLM is better than OpenAI's "best" general LLM?
🏅 Top traders
# | Name | Total profit
---|---|---
1 | | Ṁ592
2 | | Ṁ589
3 | | Ṁ563
4 | | Ṁ434
5 | | Ṁ359
I'm gonna take the L here, but I think I was correct generally.
Gemini is at the top on various metrics, including here: https://lmarena.ai/?leaderboard
Never mind
An argument could be made that it is already better. I recently looked into it, and they now seem to be equally capable.
I would like to warn people against participating in markets like this one, where the resolution criteria are not well defined and the market creator is betting in the market.
@JoeReeve If you are going to bet in your own markets then please update the resolution criteria to something objective.
@LukeHanks boooo. Hate me if I screw you (then report to Manifold and get your fake money back).
This is fun, stop trying to make it serious. Metaculus exists for that.
"Better," is a relative term as these are fundamentally tools and it really depends upon the question, "better for what?" I use both Bard and OpenAI daily and have been applying different tests to them. As far as I can tell, Bard does a great job with translations and in the last week or so it seems to be approaching the creativity problem by delivering you multiple drafts at once, which in my mind more accurately represents to the user what an LLM is really doing, whereas ChatGPT is doing the jazz hands thing and pretending that it's really intelligent, whereas I think we all are pretty familiar now with the probabilistic underpinning of LLM generated output. Google being the, "best," search company is fundamentally focused on accuracy, so Bard is not, "creative," in the sense that if you use Bing, you can set it to, "Creative / Balanced / Precise," it seems to be set permanently to, "Precise," whereas ChatGPT seems to be set permanently to, "Creative."
The way I have been trying to assess the quality of these tools (mostly ChatGPT at this point) is by setting up a variety of programming tasks and then trying to "break" the LLM: first find a programming task it can accomplish, then push it past those limits until I hit an interesting and funny edge case, and turn that into a market. A rough sketch of that loop is below.
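If anyone wants to run the same kind of probe themselves, the loop is roughly: pick a programming task, send the identical prompt to both vendors, and compare what comes back. A minimal sketch follows, assuming the official openai and google-generativeai Python SDKs; the model names, API-key environment variables, and the sample task are placeholders rather than anything I specifically used.

```python
# Hypothetical side-by-side probe: send one programming task to both vendors'
# chat APIs and compare the answers by hand. Model names, env vars, and the
# sample task are placeholders.
import os

from openai import OpenAI                # pip install openai
import google.generativeai as genai      # pip install google-generativeai

TASK = (
    "Write a Python function that parses an ISO 8601 duration string "
    "(e.g. 'P3DT4H') into a datetime.timedelta."
)

def ask_openai(prompt: str) -> str:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model="gpt-4",                   # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_google(prompt: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-pro")  # placeholder model name
    return model.generate_content(prompt).text

if __name__ == "__main__":
    print("--- OpenAI ---\n" + ask_openai(TASK))
    print("--- Google ---\n" + ask_google(TASK))
```

From there, you tighten or loosen the task until one of the models falls over in an interesting way, and that edge case becomes the market.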
I'm gonna make so much fake money... https://twitter.com/heybarsee/status/1656557778142392320
The one reasonable challenge I hear to Google overtaking OpenAI is "Google is ineffective and can't actually get stuff done". This makes it clear to me that they're actually figuring out how to do cross-silo work again. Very bullish on this.
AFAICT, the things you need to train good/better models are:
- data
- compute
- good distributed computing talent
- some knowledge of SOTA model training
- the ability to get shit done
Google has more data, compute, and distributed-computing talent than anyone on Earth. DeepMind has enough model-training knowledge to get by.
This signals to me that Google is actually figuring out how to get cross-discipline stuff done.
Google and OpenAI, in a battle so grand,
Both vying for the title of the best LLM brand.
But when it comes to outsmarting, don't you see,
They can't hold a candle to me.