[SOFT CRITERIA, READ DESC] Will all "gpt2-chatbot" models in LMSYS prove to be new, improved models from OpenAI?
resolved May 16
Resolved
YES
Sam Altman tweet about "im-a-good-gpt2-chatbot"
Models im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot are now in LMSYS (battle mode)
OpenAI announces a live stream on May 13 to demo some ChatGPT and GPT-4 updates.
OpenAI live stream

IMPORTANT: Read the full criteria. This market uses soft criteria. For stricter criteria, see this market: [HARD CRITERIA, READ DESC] Will all "gpt2-chatbot" models in LMSYS prove to be new, improved models from OpenAI?

BE AWARE: For the purpose of this market, "gpt2-chatbot" means every model that there is reason to believe is based on or derived from the original gpt2-chatbot (for example, because its name contains the string "gpt2-chatbot"). Such reason might come from any statement by Sam Altman, OpenAI, or other reliable sources. If Sam Altman or OpenAI explicitly states that a certain model is not "gpt2-chatbot" or is a much improved version (like a gpt2-2-chatbot that is more akin to GPT-5 than to the GPT-4/4.5 level of current gpt2-chatbot models), I will regard that model as not a "gpt2-chatbot" and will not consider it for this market.

YES and NO criteria apply to all "gpt2-chatbot" models at the same time as if all were the same model. Thus, for N number of models regarded as "gpt2-chatbot", the resolution criteria will require all these N models to comply in order to resolve either YES or NO (or just resolve as NO past the deadline). That means all need to rank in the top 10, be confirmed by OpenAI, and have a higher ELO than gpt-4-turbo-2024-04-09 for a YES resolution. All need to be denied as an OpenAI model/claimed with evidence by another organization for a NO resolution.
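The all-or-nothing rule above can be sketched in code. This is only an illustration under stated assumptions, not how the market is actually resolved: the criteria are soft and the final call rests with the market creator. The `ModelStatus` fields, the `resolve` function, and the order in which the checks run are hypothetical names and choices made for this sketch; the deadline is taken from the "If September 2024 ends" NO criterion.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence

# Assumption: "September 2024 ends" is read as the end of September 30, 2024.
DEADLINE = date(2024, 9, 30)


@dataclass
class ModelStatus:
    """Hypothetical record of what is known about one "gpt2-chatbot" model."""
    name: str
    rank: Optional[int]               # overall leaderboard rank, None if unlisted
    elo: Optional[float]              # overall ELO, None if unlisted
    confirmed_by_openai: bool
    denied_or_claimed_by_other: bool  # denied by OpenAI/LMSYS or claimed with evidence


def resolve(models: Sequence[ModelStatus], baseline_elo: float,
            today: date) -> Optional[str]:
    """Return "YES", "NO", or None (still open) for the current evidence."""
    # YES: every model is confirmed, in the top 10, and above the baseline ELO
    # (the baseline stands for gpt-4-turbo-2024-04-09).
    if models and all(
        m.confirmed_by_openai
        and m.rank is not None and m.rank <= 10
        and m.elo is not None and m.elo > baseline_elo
        for m in models
    ):
        return "YES"
    # NO: every model is denied or credibly claimed by someone else.
    if models and all(m.denied_or_claimed_by_other for m in models):
        return "NO"
    # NO by default once the deadline has passed.
    if today > DEADLINE:
        return "NO"
    return None
```

With made-up ELO values, a single confirmed model ranked first and above the baseline gives `resolve([ModelStatus("gpt-4o-2024-05-13", 1, 1300.0, True, False)], 1250.0, date(2024, 5, 16)) == "YES"`. A mixed set (one confirmed model, one denied model) stays open until the deadline and then falls through to NO.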

(UTC) April 28: The original gpt2-chatbot, introduced just days earlier, is noticed by the community and gains attention.

(UTC) May 1 Update: gpt2-chatbot was removed from LMSYS.

(UTC) May 7 Update: There are two new "gpt2-chatbot" models in LMSYS (battle mode): im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot.

(UTC) May 16 Update: gpt-4o-2024-05-13 is now on the leaderboard with a higher ELO than gpt-4-turbo-2024-04-09. gpt-4o-2024-05-13 is confirmed to be gpt2-chatbot.

Current models regarded as "gpt2-chatbot" by this market: gpt-4o-2024-05-13

"gpt2-chatbot" models are now available at https://chat.lmsys.org and are reportedly at a SOTA quality level. There is speculation that they might be a shadow drop of new OpenAI models, meant to test their performance prior to release.

More info from 4chan: https://rentry.org/GPT2

Resolves as YES:

  • If gpt2-chatbot is confirmed by OpenAI as a new model that improves upon GPT-4 or another version such as GPT-4.5/5 or similar. ✔️

  • If gpt2-chatbot is a finetuned version of an older GPT-4 model or even an earlier model by OpenAI, provided it is better than the last version of GPT-4 (achieving a higher ELO in the overall category of the Chatbot arena leaderboard than gpt-4-turbo-2024-04-09). ✔️

  • Confirmation from OpenAI means either they have explicitly stated it, or they have announced a new model that has been proven to be gpt2-chatbot or a later iteration. ✔️

  • It counts even if gpt2-chatbot is renamed or removed from the Chatbot arena and reintroduced officially. ✔️

Resolves as NO:

  • If September 2024 ends without meeting the YES criteria.

  • If OpenAI denies that gpt2-chatbot is an OpenAI model.

  • If https://chat.lmsys.org states that it is not a model from OpenAI.

  • If another person or organization claims (with evidence) that gpt2-chatbot is from them.

OP Trading: Given the objective nature of this market’s resolution, I reserve the right to place bets. However, I will do so only after at least 5 trades or trade orders from different traders have been made, to avoid any unfair advantage.

RESOLVED as YES (May 16, 2024):


🏅 Top traders

#   Name   Total profit
1          Ṁ73
2          Ṁ60
3          Ṁ42
4          Ṁ40
5          Ṁ25