[This is Casey's medium-confidence prediction from the 12/22/23 episode of Hard Fork]
This market will resolve to yes if Google's Gemini Ultra or other state-of-the-art LLM is roughly equivalent to OpenAI's best publicly available LLM on December 31, 2024, and if Google's AI products have cut into ChatGPT's share of the consumer LLM chatbot market. Otherwise, it will resolve to no.
Gemini's market share is falling; this is a sure NO.
https://firstpagesage.com/reports/top-generative-ai-chatbots/
@JasonDavies Hard to tell how to compare, given that Google doesn't have something like o1.
I mean, obviously that's not the top of the raw leaderboard here, but it does seem to be a potentially significant lead for OpenAI in an approach that might lead to future scaling?
Does this resolution go by Hard Fork's own decision?
If instead judged by @KevinRoose18ac, is this strictly about chatbots, or LLMs in general? For example, you could see Gemini being ~natively present in lots of Android phones, putting it on roughly equal footing with OpenAI's iOS play. But when it comes to chatbots, as in visiting chatGPT.com, I think OpenAI wins by a landslide.
At this point, OpenAI's best model on the LMSYS leaderboard beats Google's best model 50.66% of the time; pretty close to a coinflip. Along with Gemini traffic at ~25% of ChatGPT's traffic, it seems like if this resolved today, it would resolve YES.
Obviously, the market is actually about the state of things on December 31st, so that's not dispositive, but I wonder if 40% is the right place for this to be sitting right now?
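For a sense of scale, here is a rough sketch that converts a head-to-head win rate into a rating gap using the standard Elo logistic formula. It assumes the leaderboard's ratings behave like plain Elo, which may not match LMSYS's exact methodology, and the function names are made up for illustration. The inputs are the 50.66% figure above and the 43% figure quoted elsewhere in this thread.

```python
import math


def elo_gap(win_rate: float) -> float:
    """Rating gap (in Elo points) implied by a head-to-head win rate.

    Assumes the standard Elo logistic curve with a 400-point scale;
    this is an illustrative assumption, not the leaderboard's own code.
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def win_rate_from_gap(gap: float) -> float:
    """Inverse of elo_gap: expected win rate for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


if __name__ == "__main__":
    # Win rates quoted in the thread (from the quoted model's point of view).
    for label, p in [("OpenAI best vs Google best", 0.5066),
                     ("Gemini Pro vs best GPT", 0.43)]:
        print(f"{label}: win rate {p:.2%} -> gap of {elo_gap(p):+.1f} Elo points")
```

Under that assumption, a 50.66% win rate is a gap of only a few points, while 43% is a gap of a few dozen points, so the two snapshots describe noticeably different distances between the models.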
@ChrisPrichard OpenAI has not released a major model (GPT-4.5 or GPT-5) in 2024, and they say that they will. Google has caught up to GPT-4, but can they beat whatever the new one is? My guess would be no.
https://twitter.com/natfriedman/status/1777739863678386268?t=Acv_z3u7bB2q6F0kNvozrQ&s=19
Gemini's traffic is surprisingly larger than I'd have guessed, though I'm not sure what the bar for "cut into ChatGPT's share" is.
@KevinRoose18ac Can you clarify whether this market would resolve YES if it ended today? Currently users prefer Gemini Pro in the LMSYS arena about 43% of the time over the best GPT, so users still clearly prefer GPT, but not by so much that an individual user could easily tell which LLM they were talking to. Does this count as being "roughly equivalent"?
Additionally, by "cut into ChatGPT's share of the consumer LLM chatbot market" do you include integrated chatbot features such as email generation and spreadsheet formulation, or are you only counting direct Q&A-style chats?