What is “gpt2-chatbot”?
May 14
gpt-4.5 or 5 or whatever OpenAI's next generation model is called https://x.com/phill__1/status/1784964135920235000
other (e.g. just an updated version of GPT-4)

A new model called "gpt2-chatbot" is being benchmarked on LMSYS Chatbot Arena and has generated a ton of rumors and speculation on Twitter. Some users think this model might be OpenAI's next-generation model, while others believe it could be a fine-tuned version of one of OpenAI's old models, such as GPT-2 from 2019. I will resolve the market to the option most closely resembling the truth.

@traders I did not expect this to happen, but what do you all think about potentially resolving this market as 50% "OpenAI's next generation model" and 50% "other (e.g. just an updated version of gpt-4)"?

@chrisjbillington FYI

@Soli or resolve two variants to 0, and the other two to 23 and 77 (proportionally to their market values)
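
The proportional split suggested above can be sketched as a simple renormalization: options ruled out go to 0, and the remaining options are rescaled so they sum to 100. This is only an illustration. The function name, the option labels, and the starting probabilities are all hypothetical, chosen so the result matches the 23/77 split mentioned in the comment; they are not the market's actual values.

```python
def proportional_resolution(market_probs, keep):
    """Zero out excluded options and rescale the kept ones to sum to 100.

    market_probs: dict mapping option name -> current market probability.
    keep: names of the options that remain plausible.
    """
    kept_total = sum(market_probs[name] for name in keep)
    if kept_total <= 0:
        raise ValueError("kept options must have positive total probability")
    return {
        name: (100.0 * prob / kept_total if name in keep else 0.0)
        for name, prob in market_probs.items()
    }


# Hypothetical market state (assumed numbers, not taken from the real market).
example_market = {
    "next_generation": 0.184,
    "other_updated_gpt4": 0.616,
    "option_c": 0.15,  # placeholder for an option not named in this thread
    "option_d": 0.05,  # placeholder for an option not named in this thread
}

resolution = proportional_resolution(
    example_market, keep=["next_generation", "other_updated_gpt4"]
)
for name, share in resolution.items():
    print(f"{name}: {share:.0f}%")
```

The design choice here is that the ratio between the surviving options is preserved, so traders holding either one are paid in proportion to where the market stood rather than by an even 50/50 split.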

@KongoLandwalker this would be an option, yes. What would be the reasoning behind doing this?

@Soli No. The description distinguishes "next generation model" from "finetuned version of old model".

gpt-4o is stated to be a new model. So it's not properly an updated version of GPT-4, and is most closely a "next generation model".

@Mira the description was specifically referring to GPT-2 in that case, but I definitely see where you are coming from. If I have to choose only one option, then "next generation model" would be more accurate, but this doesn't mean that gpt-4o can't also be an "updated version of gpt-4".

I think the problem stems from the fact that most of the new stuff in gpt-4o is audio/vision capabilities and usability. If you only judge the model based on text output, then it is very close to gpt-4.

I didn’t consider this scenario when I created the question, which is why we ended up with two options that can both be true. Does this make sense to you?

@Soli the description has this part, though, which would be problematic for a 50/50 resolution:

"I will resolve the market to the option most closely resembling the truth."

@Soli I am biased, but there probably won’t be a GPT-4.5 (see related markets), so this is what the next generation model is called (by OpenAI branding).

But I’m also ok with 50/50 given ambiguity

A good counterpoint is that this market refers to gpt2-chatbot, which, depending on definition, is functionally GPT-4.

@traders i will close this question till i have time to clarify some stuff


@jackgwhit strictly speaking that tweet is about "im-also-a-good-gpt2-chatbot" and this market is about "gpt2-chatbot", and although the models are presumably related it's not clear in what way.

@chrisjbillington yeah perhaps, but i view the evidence as incredibly strong. curious what else will come out!

@traders there is a strong discrepancy between this market and /Soli/is-this-real-gpt45 (20% vs 6%)

Interesting that Karpathy has been doing so much open source work on GPT-2 since leaving OpenAI

@RemNi he is :)

I'd bet GPT-2+Q* if the resolution was more objective and not in a month.

@Bair I made one that goes until end of year. Feel free to suggest improvements to the resolution criteria. https://manifold.markets/jim/is-gpt2chatbot-gpt2

Where do I answer "A Small Language Model"? I think it's just OAI flexing how well they can do under a GPT-2-sized parameter architecture.

@MP If that's the case, it's a breakthrough of two or more orders of magnitude. gpt2-chatbot performs similarly to models in excess of a hundred billion parameters, while GPT-2 had 1.5 billion.
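
As a rough check on the "orders of magnitude" claim above, the ratio of the two parameter counts can be computed directly. The sketch below uses only the figures from the comment: 1.5 billion for GPT-2, and 100 billion as an assumed lower bound for "in excess of a hundred billion". The actual size of gpt2-chatbot is not known from this thread.

```python
import math

GPT2_PARAMS = 1.5e9         # GPT-2 parameter count, as stated in the comment
COMPARISON_PARAMS = 100e9   # assumed lower bound for the models it is compared to

# How many times larger the comparison models are, and the same gap in powers of ten.
ratio = COMPARISON_PARAMS / GPT2_PARAMS
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.1f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

With the lower bound this comes out a little under two orders of magnitude, so the gap reaches two or more only if the comparison models are at least 150 billion parameters.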

What is the distinction between "GPT-4.5" and "just an updated version of GPT-4", to you?

Something like "just the same model running at a different temperature" as @chrisjbillington suggests below I assume would be "just an updated version of GPT-4". But what about, say, a major fine-tune of GPT-4? What about a distillation of GPT-4?

@MugaSofer if it's OpenAI's next generation model (4.5 or 5), there will be many signs, such as a proper press release by OpenAI or a significant bump in score on Chatbot Arena.


I did not get a chance to try the model myself, but I felt that there was a consensus among people who did that it is better than all existing models on the market. Curious to see if this will turn out to be true.

@Soli not consensus IMHO. I think we have consensus for "in the same ballpark", and whether it is actually better or not seems like it would require us to wait and see the Elo.

Honestly it seems basically the same as existing GPT4 turbo to me, it gives slightly different answers and they're sometimes better and sometimes worse. It could be just the same model running at a different temperature even.

There was a lot of hype on Twitter early on, which I think was extremely subject to confirmation and selection bias.

@chrisjbillington I agree there could be strong confirmation and selection bias at play here, but I did see many examples of people claiming that this model is at least slightly better than GPT-4

I agree with you, though, that most probably it is just an improved version of one of the existing models and not the next-generation model we are all waiting for

@Soli wow. Can people please just screen record all the time? We should have hundreds of sessions to rerun and compare!

@chrisjbillington but then why call it gpt2-chatbot?

@RemNi I have no clue, but what do you mean by "but"? What hypothesis does its name support? Seems like a mystery that doesn't point much in any particular direction.

I guess you'd expect it to be called "gpt4-next-test" or something? Maybe, but maybe they also don't want to reveal what it actually is - whether it's an older GPT-4 less restrained by RLHF, or a new one, or a fine-tuned one. Using an obscure name doesn't tell us anything and I have a reasonably high prior they'd do that. Also they love the mystique, they could be giving it a weird name for shits and giggles.

The fact that I don't understand it makes me hesitant to read into the name, so I'm inclined to treat it as a red herring.

@chrisjbillington I think they run the risk of losing credibility if they push too hard on the ambiguity/confusion angle. They are no longer running without competition.

@RemNi yeah I definitely agree with that.

@chrisjbillington also credibility is something they are seeking to rebuild since the events of late last year with the board

@chrisjbillington I agree with the mystique angle, but I don't think they're that company anymore