As with my other related questions, by default I will judge based on the Elo rankings on the leaderboard here: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
If Google deploys a new model in 2023 that might or might not qualify, but it is not yet ranked on the leaderboard at year's end due to the time required for evaluation, I will hold off on resolving until it has been ranked, up to a maximum of February 1.
If Google releases a model that the public, or at least those who have signed up for its early testing programs, cannot access by the deadline, that does not count. I will use my own ability to access it, absent any special treatment, as a proxy here; if I do get special treatment, I will ask others.
As with other questions, I reserve the right to correct what I see as an egregious error in either direction, either by Twitter poll or outright fiat, including if the model is effectively available but does not appear on the leaderboard for logistical reasons.
(Same clarification as the related market: if Google does take the top spot or becomes clearly best, this resolves YES on the spot; this is 'by' EOY, not 'at' EOY.)
@IsaacKing The resolution rules say he will use the arena ELO, which comes from user votes, not from GPT-4. Using GPT-4 was for the MT-Bench score, which doesn't get mentioned in the rules.
That said, I don't doubt that GPT-4 can do better at grading responses than at producing them, just as humans can.
@DavidBolin I'm actually working on a post on that question because it's important in other ways. For ELO purposes I think it's clearly true, but for the purposes of providing feedback or choosing what is safe to implement, or similar, I think you need to do sufficiently precise evaluation that it is no longer easier...
(Confirmed my intention is to use ELO, and that as I understand it this is human evaluation, not GPT-4.)
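(For anyone unfamiliar with how the arena Elo works: it is computed from pairwise human votes, roughly along the lines of the sketch below. This is a minimal illustration assuming the standard Elo update rule; the K-factor, starting rating, sequential updating, and model names here are illustrative and not necessarily what the leaderboard actually uses.)

```python
# Minimal sketch of arena-style Elo updates from pairwise human votes.
# Assumptions (not the leaderboard's actual configuration): K-factor of 4,
# initial rating of 1000, and simple sequential updates per vote.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(ratings: dict[str, float], winner: str, loser: str, k: float = 4.0) -> None:
    """Apply one human vote: the winner gains rating, the loser loses it."""
    ra = ratings.setdefault(winner, 1000.0)
    rb = ratings.setdefault(loser, 1000.0)
    ea = expected_score(ra, rb)
    ratings[winner] = ra + k * (1 - ea)
    ratings[loser] = rb - k * (1 - ea)

# Example: three hypothetical votes between two hypothetical models.
ratings: dict[str, float] = {}
for winner, loser in [("model-a", "model-b"), ("model-a", "model-b"), ("model-b", "model-a")]:
    update(ratings, winner, loser)
print(ratings)
```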
Gemini will very likely be better than GPT-4, and I don’t see why Google would prevent it from at least limited public access.