[SOFT CRITERIA, READ DESC] Will all "gpt2-chatbot" models in LMSYS prove to be new, improved models from OpenAI?
resolved May 16
Resolved
YES
Sam Altman tweet about "im-a-good-gpt2-chatbot"
Models im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot are now in LMSYS (battle mode)
OpenAI announces a live stream on May 13 to demo some ChatGPT and GPT-4 updates.
OpenAI live stream

IMPORTANT: Read the full criteria. This market uses soft criteria. For stricter criteria, see this market: [HARD CRITERIA, READ DESC] Will all "gpt2-chatbot" models in LMSYS prove to be new, improved models from OpenAI?

BE AWARE: For the purpose of this market, "gpt2-chatbot" means all models that give reason to believe they are based on or derived from the original gpt2-chatbot (for example, by having the string "gpt2-chatbot" in their names). Such evidence might include any statement from Sam Altman, OpenAI, or other reliable sources. If Sam Altman or OpenAI explicitly states that a certain model is not "gpt2-chatbot" or is a much improved version (like a gpt2-2-chatbot that is more akin to GPT-5 instead of the GPT-4/4.5 level of current gpt2-chatbot models), I will regard that model as not a "gpt2-chatbot" and not consider it for this market.

YES and NO criteria apply to all "gpt2-chatbot" models at the same time as if all were the same model. Thus, for N number of models regarded as "gpt2-chatbot", the resolution criteria will require all these N models to comply in order to resolve either YES or NO (or just resolve as NO past the deadline). That means all need to rank in the top 10, be confirmed by OpenAI, and have a higher ELO than gpt-4-turbo-2024-04-09 for a YES resolution. All need to be denied as an OpenAI model/claimed with evidence by another organization for a NO resolution.
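The all-models rule can be sketched in a few lines of Python. This is only an illustration of the paragraph above, not the market creator's actual procedure: the `Model` fields, the `resolve` function, and every name and number in the example are hypothetical, and the deadline is assumed to be the last day of September 2024 as stated in the NO criteria.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumption: "September 2024 ends" is read as the last calendar day of the month.
DEADLINE = date(2024, 9, 30)


@dataclass
class Model:
    """Hypothetical record for one model regarded as "gpt2-chatbot"."""
    name: str
    rank: Optional[int]        # leaderboard rank, None if not listed
    elo: Optional[float]       # overall arena rating, None if not listed
    confirmed_by_openai: bool  # explicit or proven confirmation
    denied_or_claimed: bool    # denied by OpenAI/LMSYS or claimed elsewhere with evidence


def meets_yes(model: Model, baseline_elo: float) -> bool:
    """Top 10, confirmed by OpenAI, and rated above the gpt-4-turbo baseline."""
    return (
        model.confirmed_by_openai
        and model.rank is not None and model.rank <= 10
        and model.elo is not None and model.elo > baseline_elo
    )


def resolve(models: List[Model], baseline_elo: float, today: date) -> Optional[str]:
    """Return "YES", "NO", or None while the market is still open."""
    if models and all(meets_yes(m, baseline_elo) for m in models):
        return "YES"
    if models and all(m.denied_or_claimed for m in models):
        return "NO"
    if today > DEADLINE:
        return "NO"  # deadline passed without meeting the YES criteria
    return None


# Made-up models and ratings, for illustration only.
example = [
    Model("model-a", rank=1, elo=1300.0, confirmed_by_openai=True, denied_or_claimed=False),
    Model("model-b", rank=3, elo=1290.0, confirmed_by_openai=False, denied_or_claimed=False),
]
print(resolve(example, baseline_elo=1250.0, today=date(2024, 5, 16)))
```

With one of the two models unconfirmed, the sketch leaves the market open until either every model qualifies or the deadline passes, which mirrors the requirement that all N models comply together.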

(UTC) April 28: The original gpt2-chatbot, introduced just days earlier, is noticed by the community and gains attention.

(UTC) May 1 Update: gpt2-chatbot was removed from LMSYS.

(UTC) May 7 Update: There are two new "gpt2-chatbot" models in LMSYS (battle mode): im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot.

(UTC) May 16 Update: gpt-4o-2024-05-13 is now on the leaderboard with a higher ELO than gpt-4-turbo-2024-04-09. gpt-4o-2024-05-13 is confirmed to be gpt2-chatbot.

Current models regarded as "gpt2-chatbot" by this market: gpt-4o-2024-05-13

"gpt2-chatbot" models are now available at https://chat.lmsys.org and are reportedly at a SOTA quality level. There is speculation that this might be a shadow drop of new OpenAI models to test their performance prior to release.

More info from 4chan: https://rentry.org/GPT2

Resolves as YES:

  • If gpt2-chatbot is confirmed by OpenAI as a new model that improves upon GPT-4 or another version such as GPT-4.5/5 or similar. ✔️

  • If gpt2-chatbot is a finetuned version of an older GPT-4 model or even an earlier model by OpenAI, provided it is better than the last version of GPT-4 (achieving a higher ELO in the overall category of the Chatbot arena leaderboard than gpt-4-turbo-2024-04-09). ✔️

  • Confirmation from OpenAI means either they have explicitly stated it, or they have announced a new model that has been proven to be gpt2-chatbot or a later iteration. ✔️

  • It counts even if gpt2-chatbot is renamed or removed from the Chatbot arena and reintroduced officially. ✔️

Resolves as NO:

  • If September 2024 ends without meeting the YES criteria.

  • If OpenAI denies that gpt2-chatbot is an OpenAI model.

  • If https://chat.lmsys.org states that it is not a model from OpenAI.

  • If another person or organization claims (with evidence) that gpt2-chatbot is from them.

OP Trading: Given the objective nature of this market’s resolution, I reserve the right to place bets. However, I will do so only after at least 5 trades or trade orders from different traders have been made, to avoid any unfair advantage.

RESOLVED as YES (May 16, 2024):


gpt-4o-2024-05-13 is at the top of the leaderboard with a higher ELO than gpt-4-turbo-2024-04-09. It is confirmed that GPT-4o is gpt2-chatbot. I will update the market description with the details later, but the resolution is YES.

I will wait for https://chat.lmsys.org/ to update their leaderboard before resolving, but this will surely resolve as YES.

I have updated the title and description to better clarify how this market regards the models im-a-good-gpt2-chatbot, im-also-a-good-gpt2-chatbot, and any future "gpt2-chatbot" model, and how to apply the resolution criteria to them.

There are two new models now in LMSYS (battle mode): im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot.

A tweet from Sam Altman a few days earlier mentions one of them, "im-a-good-gpt2-chatbot." This still doesn't count as official confirmation by OpenAI, but it is very likely that they are indeed OpenAI models.

Regarding this market, unless we have proof that one of the two is not based on gpt2-chatbot, YES and NO criteria apply to both at the same time as if the two were the same model; both must comply with one set of criteria for resolution. Both need to be confirmed by OpenAI and have a higher ELO than gpt-4-turbo-2024-04-09 for a YES resolution. Both need to be denied as an OpenAI model or claimed with evidence by another organization for a NO resolution. I will update the market description and title to state this in the coming hours if the general consensus is that both models count as gpt2-chatbot (or, if only one counts, to reflect that).

@Peter1169 How would you judge if these two models also vanish, but then yet more new ones pop up? Treating them all as the same?

@Joshua Yep, as long as there is something that induces belief that the models are based on or from the gpt2-chatbot. This might include any statement from Sam Altman, OpenAI, or other reliable sources. In this case, it is easy because the model's name contains "gpt2-chatbot". If the names were completely different, it would depend on whether we have any more information, or if the community regards them as the same or a similar level. Thus, for N number of models regarded as "gpt2-chatbot", the resolution criteria will require all these N models to comply in order to resolve either YES or NO (or just resolve as NO past the deadline).

If Sam Altman or OpenAI explicitly states that a certain model is not "gpt2-chatbot" or is a much improved version (like a gpt2-2-chatbot that is more akin to GPT-5 instead of the GPT-4/4.5 level of current gpt2-chatbot models), I will regard that model as not a "gpt2-chatbot" and not consider it for this market.
