Will Google have the best LLM by EOY 2023?
closes Jan 1

As with my other related questions, by default I will judge this based on the Elo ratings on the leaderboard here: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
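For readers unfamiliar with Elo, here is a minimal sketch of how a rating can be derived from pairwise votes. It uses the generic textbook Elo update and is not necessarily the leaderboard's exact computation; the model names, the votes, the starting rating of 1000, and the step size `k` are all assumptions made for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one pairwise vote in place. `k` is an assumed step size."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - exp_win)  # upsets move ratings more than expected wins
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical models and votes, for illustration only.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]
for winner, loser in votes:
    update_elo(ratings, winner, loser)

# The model with the highest rating is the one "on top" of the leaderboard.
best = max(ratings, key=ratings.get)
```

Each vote moves the same number of points from the loser to the winner, so the total is conserved and only the relative ordering carries information.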

If Google deploys a new model in 2023 that might or might not qualify, but it is not yet ranked on the leaderboard at year's end due to the time required for evaluation, I will hold off on resolving until it has been ranked, up to a maximum of February 1.

If Google releases a model that the public, or at least those who have signed up for its early testing programs, cannot access by the deadline, that does not count - I will use my ability to access it absent any special treatment as a proxy here, or if I get special treatment I will ask others.

As with other questions, I reserve the right to correct what I see as an egregious error in either direction, either by Twitter poll or outright fiat, including if the model is effectively available but does not appear on the leaderboard for logistical reasons.

(Same clarification as the related market: If Google does take the top spot or becomes clearly best, this resolves to YES on the spot; this is 'by' EOY, not 'at' EOY.)
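The default rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Snapshot` fields and the `resolve` function are hypothetical names, "the deadline" is assumed to mean December 31, 2023, the February 1 cutoff is assumed to be in 2024, and the poll-or-fiat override for egregious errors is deliberately left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

END_OF_2023 = date(2023, 12, 31)  # assumed meaning of "the deadline"
MAX_WAIT = date(2024, 2, 1)       # assumed year for "a maximum of February 1"


@dataclass
class Snapshot:
    """Hypothetical summary of the situation on a given day."""
    today: date
    google_is_top: bool                     # a Google model holds the top Elo spot
    top_model_accessible_by_deadline: bool  # that model was publicly usable in 2023
    pending_google_model: bool              # deployed in 2023 but not yet ranked


def resolve(s: Snapshot) -> Optional[str]:
    """Return "YES", "NO", or None if the market should stay open."""
    if s.google_is_top and s.top_model_accessible_by_deadline:
        return "YES"  # resolves on the spot: 'by' EOY, not 'at' EOY
    if s.today <= END_OF_2023:
        return None   # still time for Google to take the top spot
    if s.pending_google_model and s.today <= MAX_WAIT:
        return None   # hold off until the leaderboard has ranked the model
    return "NO"
```

For example, a Google model deployed in December 2023 but still unranked on January 15 keeps the market open, while the same situation on February 2 resolves NO.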


IsaacKing

Wait, it uses GPT-4 to grade the responses? How is that fair?

James Bromley (predicts NO)

@IsaacKing It isn't. It's exactly why I voted no.

David Bolin (predicts NO)

@IsaacKing The resolution rules say he will use the arena Elo, which comes from user votes, not from GPT-4. GPT-4 grading was used for the MT-Bench score, which the rules don't mention.

That said, I don't doubt that GPT-4 can do better at grading responses than at producing them, just as humans can.

Zvi Mowshowitz

@DavidBolin I'm actually working on a post on that question because it's important in other ways. For Elo purposes I think it's clearly true, but for the purposes of providing feedback or choosing what is safe to implement, or similar, I think you need to do sufficiently precise evaluation that it is no longer easier...

(Confirmed my intention is to use Elo, and that as I understand it this is human evaluation, not GPT-4.)

AdamK (bought Ṁ20 of YES)

Gemini will very likely be better than GPT-4, and I don’t see why Google would keep it from having at least limited public access.

David Bolin (predicts NO)

@AdamK Only 2.5 months left.