Context:
https://twitter.com/kosenjuu/status/1784952955294421215
Resolves YES if "gpt2-chatbot" is confirmed to be GPT-4.5/GPT-5/OpenAI's next big model.
An incrementally-improved gpt-4 model would not count.
Model is available on https://chat.lmsys.org/
(click onto the "Direct Chat" tab, then select "gpt2-chatbot" as the model).
@Cosmic1 https://www.axios.com/2024/05/13/openai-google-chatgpt-ai
Agreed. The mm shared this source below and is taking it as gospel, which is doing a LOT of heavy lifting to support both that 4o is not a "successor" and that gpt2-chatbot is synonymous with "im-also-a-good-gpt2-chatbot." Both these claims appear to be editorialized in ways that are inaccurate.
@Cosmic1 For what it's worth, I have a trick question I ask all the LLMs, and the only ones to recognize the trick are gpt-4o and im-also-a-good-gpt2-chatbot. im-a-good-gpt2-chatbot got it wrong 4 out of 5 times, so I don't think the latter is GPT-4o. I'm not sure about gpt2-chatbot.
@uwu Why would gpt2-chatbot be synonymous with im-also-a-good-gpt2-chatbot? That one makes little sense.
@uwu I am not taking the article as gospel. Especially not the part about whether it's a successor, which isn't even presented as being something Mira said. The bit about "GPT-4 level intelligence" is the same as in various places in official OpenAI communication, so I take that seriously. The bit about "A more major update to the underlying model" I take seriously but with a pinch of salt.
@Cosmic1 can you explain what your point is? Do you think there's some chance that gpt2-chatbot, despite doing worse on leaderboards etc., actually is GPT-4.5 or GPT-5?
@jim Even GPT-4o is not GPT-4's successor; it's about the same:
As measured on traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence
@uwu Things aren't always as they seem. I'm happy to issue everyone refunds if someone points out some crucial thing I am missing or some error in reasoning I have made.
I think this was supposed to resolve YES, since they used gpt2 as the nickname for benchmarking the new model. Ping me if you want me to find the source @jim
"An incrementally-improved gpt-4 model would not count."
Doesn't imply that all things which are not just incrementally-improved GPT-4 models do count.
This resolved NO because GPT-4o is not GPT-4's successor. It's roughly equal in intelligence and is branded as GPT-4, and seems an extremely similar model, quite like turbo was.
https://platform.openai.com/docs/models
OpenAI regards GPT-4o as their flagship model and GPT-4 Turbo as "previous".
A +50 ELO increase is not "roughly equal in intelligence." It is not a next-level frontier model like GPT-5 will be, but everything points to GPT-4o as a GPT-4 successor in the same way a "GPT-4.5" would have been. Other markets also regard GPT-4o as equivalent to GPT-4.5.
GPT-4o was trained from scratch to be end-to-end. It is not a jump like GPT-4 to GPT-4 Turbo but more like a GPT-4.5 and definitely a GPT-4 successor (not an iterative improvement).
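For a sense of scale on the +50 figure: assuming the leaderboard ratings follow the standard Elo logistic formula with a 400-point scale (an assumption, the arena's exact rating method may differ), a rating gap converts to an expected head-to-head win rate as in the rough sketch below. The function name `expected_score` is just illustrative.

```python
def expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model under the standard Elo formula.

    Assumes the conventional base-10 logistic curve with a 400-point scale;
    this is an assumption about how the leaderboard ratings should be read.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # Compare a few rating gaps, including the +50 gap discussed above.
    for gap in (0, 50, 100):
        print(f"{gap:+d} Elo -> {expected_score(gap):.1%} expected win rate")
```

Under that assumption a 50-point gap works out to roughly a 57% expected win rate against the lower-rated model, which either side of this argument can weigh as they see fit.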
new flagship model
I agree it's their new flagship GPT-4 model (or GPT-4 level model, depending on your interpretation). But this market isn't meant to resolve YES on a GPT-4 model, nor a GPT-4 level model.
not roughly equal in intelligence
I think it's roughly equal. OpenAI thinks it's roughly equal.
Other markets also regard GPT-4o as equivalent to GPT-4.5.
Some based on faulty reasoning, some on correct reasoning from very different resolution criteria. My traders are lucky I thought through everything carefully and reached a solid conclusion, rather than outsourcing the resolution to less careful people.
For the people who want a second coming, I'll review all the best arguments and evidence, including the poll closing next Sunday. https://manifold.markets/StephenMWalkerII/is-gpt2chatbot-gpt4s-successor