IMPORTANT: Read the full criteria. This market uses soft criteria. For stricter criteria, see this market: [HARD CRITERIA, READ DESC] Will all "gpt2-chatbot" models in LMSYS prove to be new, improved models from OpenAI?
BE AWARE: For the purpose of this market, "gpt2-chatbot" means all models for which there is reason to believe they are based on or derived from the original gpt2-chatbot (such as having the string "gpt2-chatbot" in their names). Such reasons might include any statement from Sam Altman, OpenAI, or other reliable sources. If Sam Altman or OpenAI explicitly states that a certain model is not "gpt2-chatbot" or is a much improved version (like a gpt2-2-chatbot that is more akin to GPT-5 instead of the GPT-4/4.5 level of current gpt2-chatbot models), I will regard that model as not a "gpt2-chatbot" and not consider it for this market.
YES and NO criteria apply to all "gpt2-chatbot" models at the same time as if all were the same model. Thus, for N number of models regarded as "gpt2-chatbot", the resolution criteria will require all these N models to comply in order to resolve either YES or NO (or just resolve as NO past the deadline). That means all need to rank in the top 10, be confirmed by OpenAI, and have a higher ELO than gpt-4-turbo-2024-04-09 for a YES resolution. All need to be denied as an OpenAI model/claimed with evidence by another organization for a NO resolution.
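As a rough illustration only, the "all N models must comply" rule above could be sketched as follows. This is not an official resolution tool: the names `ModelStatus`, `resolve`, and `DEADLINE`, and the ELO numbers in the usage lines, are hypothetical placeholders. The sketch also assumes that "past the deadline" means any date after September 30, 2024.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumption: "If September 2024 ends" is read as any date after Sept 30, 2024.
DEADLINE = date(2024, 9, 30)


@dataclass
class ModelStatus:
    """Hypothetical record for one model regarded as "gpt2-chatbot"."""
    name: str
    confirmed_by_openai: bool = False
    rank: Optional[int] = None      # leaderboard position, if listed
    elo: Optional[float] = None     # overall ELO, if listed
    denied_or_claimed_by_other: bool = False


def resolve(models: List[ModelStatus], baseline_elo: float, today: date) -> Optional[str]:
    """Return "YES", "NO", or None (still open).

    baseline_elo stands for the ELO of gpt-4-turbo-2024-04-09.
    """
    # YES: every model is in the top 10, confirmed, and above the baseline.
    if models and all(
        m.confirmed_by_openai
        and m.rank is not None and m.rank <= 10
        and m.elo is not None and m.elo > baseline_elo
        for m in models
    ):
        return "YES"
    # NO: every model is denied as an OpenAI model or claimed by someone else.
    if models and all(m.denied_or_claimed_by_other for m in models):
        return "NO"
    # NO by default once the deadline has passed.
    if today > DEADLINE:
        return "NO"
    return None


# Usage with made-up numbers (not real leaderboard values):
confirmed = ModelStatus("model-a", confirmed_by_openai=True, rank=1, elo=1100.0)
pending = ModelStatus("model-b")
print(resolve([confirmed], baseline_elo=1000.0, today=date(2024, 5, 16)))
print(resolve([confirmed, pending], baseline_elo=1000.0, today=date(2024, 5, 16)))
```

With one model confirmed and another still pending, the sketch returns `None`, which matches the idea that a single non-compliant model keeps the market open until the deadline.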
(UTC) April 28: The original gpt2-chatbot, introduced just days earlier, is noticed by the community and gains attention.
(UTC) May 1 Update: gpt2-chatbot was removed from LMSYS.
(UTC) May 7 Update: There are two new "gpt2-chatbot" models in LMSYS (battle mode): im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot.
(UTC) May 16 Update: gpt-4o-2024-05-13 is now on the leaderboard with a higher ELO than gpt-4-turbo-2024-04-09. gpt-4o-2024-05-13 is confirmed to be gpt2-chatbot.
Current models regarded as "gpt2-chatbot" by this market: gpt-4o-2024-05-13
"gpt2-chatbot" models are now available at https://chat.lmsys.org and are reportedly at a SOTA quality level. There is speculation that it might be a shadow drop of a new OpenAI models to test their performance prior to release.
More info from 4chan: https://rentry.org/GPT2
Resolves as YES:
If gpt2-chatbot is confirmed by OpenAI as a new model that improves upon GPT-4 or another version such as GPT-4.5/5 or similar. ✔️
If gpt2-chatbot is a finetuned version of an older GPT-4 model or even an earlier model by OpenAI, provided it is better than the last version of GPT-4 (achieving a higher ELO in the overall category of the Chatbot arena leaderboard than gpt-4-turbo-2024-04-09). ✔️
Confirmation from OpenAI means either they have explicitly stated it, or they have announced a new model that has been proven to be gpt2-chatbot or a later iteration. ✔️
It counts even if gpt2-chatbot is renamed or removed from the Chatbot arena and reintroduced officially. ✔️
Resolves as NO:
If September 2024 ends without meeting the YES criteria.
If OpenAI denies that gpt2-chatbot is an OpenAI model.
If https://chat.lmsys.org states that it is not a model from OpenAI.
If another person or organization claims (with evidence) that gpt2-chatbot is from them.
OP Trading: Given the objective nature of this market’s resolution, I reserve the right to place bets. However, I will do so only after at least 5 trades or trade orders from different traders have been made, to avoid any unfair advantage.
RESOLVED as YES (May 16, 2024):
I will wait for https://chat.lmsys.org/ to update their leaderboard before resolving, but this will surely resolve as YES.
There are two new models now in LMSYS (battle mode): im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot.
A tweet from Sam Altman from days before mentions one of them, "im-a-good-gpt2-chatbot." This still doesn't count as official confirmation by OpenAI, but it is very likely that they are indeed OpenAI models.
Regarding this market, unless we have proof that one of the two is not based on gpt2-chatbot, YES and NO criteria apply to both at the same time as if the two were the same model; both must comply with one or the other for resolution. Both need to be confirmed by OpenAI and have a higher ELO than gpt-4-turbo-2024-04-09 for a YES resolution. Both need to be denied as an OpenAI model/claimed with evidence by another organization for a NO resolution. I will update the market description and title to state this in the following hours if the general consensus is that both models count as gpt2-chatbot (or, if only one counts, to reflect that).
@Peter1169 How would you judge if these two models also vanish, but then yet more new ones pop up? Treating them all as the same?
@Joshua Yep, as long as there is something that gives reason to believe the models are based on or derived from gpt2-chatbot. This might include any statement from Sam Altman, OpenAI, or other reliable sources. In this case, it is easy because the models' names contain "gpt2-chatbot". If the names were completely different, it would depend on whether we have any more information, or on whether the community regards them as the same model or as being at a similar level. Thus, for N models regarded as "gpt2-chatbot", the resolution criteria will require all these N models to comply in order to resolve either YES or NO (or just resolve as NO past the deadline).
If Sam Altman or OpenAI explicitly states that a certain model is not "gpt2-chatbot" or is a much improved version (like a gpt2-2-chatbot that is more akin to GPT-5 instead of the GPT-4/4.5 level of current gpt2-chatbot models), I will regard that model as not a "gpt2-chatbot" and not consider it for this market.