[HARD CRITERIA, READ DESC] Will all "gpt2-chatbot" models in LMSYS prove to be new, improved models from OpenAI?
resolved May 16
Resolved
YES
Apr 30: Sam Altman tweet about "i do have a soft spot for gpt2"
Apr 30: Original gpt2-chatbot removed from LMSYS
May 5: Sam Altman tweet about "im-a-good-gpt2-chatbot"
May 7: Models im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot are now in LMSYS (battle mode)
May 10: OpenAI announces a live stream on May 13 to demo some ChatGPT and GPT-4 updates.
May 13: OpenAI live stream

IMPORTANT: Read the full criteria. This market has strict criteria. If all "gpt2-chatbot" models are removed from LMSYS and none is reintroduced within 15 days, the market resolves as NO. For less strict criteria, see this market: [SOFT CRITERIA, READ DESC] Will all "gpt2-chatbot" models in LMSYS prove to be new, improved models from OpenAI?

BE AWARE: For the purpose of this market, "gpt2-chatbot" means every model that there is reason to believe is based on, or comes from the creators of, the original gpt2-chatbot (for example, because its name contains the string "gpt2-chatbot"). Such reasons might include any statement from Sam Altman, OpenAI, or other reliable sources. If Sam Altman or OpenAI explicitly states that a certain model is not "gpt2-chatbot" or is a much improved version (like a gpt2-2-chatbot that is more akin to GPT-5 than to the GPT-4/4.5 level of current gpt2-chatbot models), I will regard that model as not a "gpt2-chatbot" and not consider it for this market.

The YES and NO criteria apply to all "gpt2-chatbot" models at the same time, as if they were all the same model. Thus, for N models regarded as "gpt2-chatbot", the resolution criteria require all N models to comply in order to resolve either YES or NO (or the market simply resolves as NO past the deadline). That means all need to rank in the top 10, be confirmed by OpenAI, and have a higher ELO than gpt-4-turbo-2024-04-09 for a YES resolution. All need to be removed for the 15-day countdown to start, or be denied as an OpenAI model or claimed with evidence by another organization, for a NO resolution.

(UTC) April 28: The original gpt2-chatbot, introduced just days earlier, is noticed by the community and gains attention.

(UTC) May 1 Update: gpt2-chatbot was removed from LMSYS. 15 days countdown for a NO resolution starts.

(UTC) May 7 Update: There are two new "gpt2-chatbot" models in LMSYS (battle mode): im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot. 15 days countdown resets and stops.

Current models regarded as "gpt2-chatbot" by this market: im-a-good-gpt2-chatbot, im-also-a-good-gpt2-chatbot

"gpt2-chatbot" models are now available at https://chat.lmsys.org and are reportedly at a SOTA quality level. There is speculation that they might be a shadow drop of new OpenAI models, meant to test their performance prior to release.

More info from 4chan: https://rentry.org/GPT2

Resolves as YES:

  • If gpt2-chatbot is confirmed by OpenAI as a new model that improves upon GPT-4 or another version such as GPT-4.5/5 or similar.

  • If gpt2-chatbot is a finetuned version of an older GPT-4 model or even an earlier model by OpenAI, provided it is better than the last version of GPT-4 (achieving a higher ELO in the overall category of the Chatbot arena leaderboard than gpt-4-turbo-2024-04-09).

  • Confirmation from OpenAI means either they have explicitly stated it, or they have announced a new model that has been proven to be gpt2-chatbot or a later iteration.

  • It counts even if gpt2-chatbot is renamed or removed from the Chatbot arena and reintroduced officially.

Resolves as NO:

  • If 2024 ends without meeting the YES criteria.

  • If OpenAI denies that gpt2-chatbot is an OpenAI model.

  • If https://chat.lmsys.org states that it is not a model from OpenAI.

  • If another person or organization claims (with evidence) that gpt2-chatbot is from them.

  • If gpt2-chatbot does not reach the top 10 ranks (not ELO, but ranks from 1 to 10) before the end of 15 June 2024 (in UTC time).

  • If gpt2-chatbot is removed from the Chatbot arena and not reintroduced (with that name or another) within 15 days.
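The YES and NO criteria above, combined with the rule that every "gpt2-chatbot" model must comply, can be read as a small decision procedure. The following is a minimal Python sketch of one possible reading, not the market creator's actual method. The names `ModelStatus` and `resolve`, the field names, and the ELO figures in the usage example are hypothetical, and the three flags (countdown expired, rank deadline passed, year ended) are assumed to be determined by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelStatus:
    """Hypothetical record for one model regarded as "gpt2-chatbot"."""
    name: str
    confirmed_by_openai: bool      # explicit statement or proven identity
    denied_or_claimed: bool        # denied by OpenAI/LMSYS, or claimed by others with evidence
    best_rank: Optional[int]       # best overall leaderboard rank, None if never ranked
    elo: Optional[float]           # overall-category ELO, None if not listed


def resolve(
    models: List[ModelStatus],
    baseline_elo: float,                 # ELO of gpt-4-turbo-2024-04-09
    countdown_expired: bool = False,     # all models removed, none reintroduced within 15 days
    rank_deadline_passed: bool = False,  # end of 15 June 2024 (UTC) has passed
    year_ended: bool = False,            # 2024 has ended
) -> Optional[str]:
    """Return "YES", "NO", or None (still open) under this reading of the criteria."""
    if not models:
        return None  # nothing to evaluate yet

    def in_top_10(m: ModelStatus) -> bool:
        return m.best_rank is not None and m.best_rank <= 10

    # YES requires every model to satisfy every YES condition.
    if all(
        m.confirmed_by_openai
        and in_top_10(m)
        and m.elo is not None
        and m.elo > baseline_elo
        for m in models
    ):
        return "YES"

    # NO conditions, also applied to the models as a group.
    if all(m.denied_or_claimed for m in models):
        return "NO"
    if countdown_expired:
        return "NO"
    if rank_deadline_passed and not all(in_top_10(m) for m in models):
        return "NO"
    if year_ended:
        return "NO"
    return None


# Usage with made-up numbers (not real leaderboard values).
models = [
    ModelStatus("im-a-good-gpt2-chatbot", True, False, 1, 1300.0),
    ModelStatus("im-also-a-good-gpt2-chatbot", True, False, 2, 1290.0),
]
print(resolve(models, baseline_elo=1250.0))  # YES
```

This sketch checks the YES conditions first, so a market that already satisfies them is not affected by deadlines that pass later. That ordering is a design choice of the sketch rather than something the description spells out.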

OP Trading: Given the objective nature of this market’s resolution, I reserve the right to place bets. However, I will do so only after at least 5 trades or trade orders from different traders have been made, to avoid any unfair advantage.


🏅 Top traders

#  Name  Total profit
1  Ṁ905
2  Ṁ726
3  Ṁ646
4  Ṁ464
5  Ṁ409

gpt-4o-2024-05-13 is at the top of the leaderboard with a higher ELO than gpt-4-turbo-2024-04-09. It is confirmed that GPT-4o is gpt2-chatbot. I will update the market description with the details later, but the resolution is YES.

GPT-4o-2024-05-13 is now on the leaderboard at the top.

@CharlesFoster Yep, this will definitely be resolved as YES. But as per the criteria stated, I will wait for the resolution until the overall leaderboard at https://chat.lmsys.org/ is updated (right now, the gpt2-chatbot models aren't showing).

I thought it needs to apply to both chatbots, but we only have confirmation on one now, correct?

@ShadowyZephyr

Organization: OpenAI (in all gpt2-chatbots) and all are above gpt-4-turbo-2024-04-09

I will wait for https://chat.lmsys.org/ to update their leaderboard for resolution, but this resolves as YES surely.

resolves yes imo

+1 for this

I have updated the title and description to better clarify how this market regards the models im-a-good-gpt2-chatbot, im-also-a-good-gpt2-chatbot, and any future "gpt2-chatbot" model, and how to apply the resolution criteria to them.

It's back

@RemNi I don’t have it, is this FUD? Can you send SS? I assume since Sam tweeted it, it will go up soon

https://twitter.com/sama/status/1787222050589028528

There are two new models now in LMSYS (battle mode): im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot.

A tweet from Sam Altman from days before mentions one of them, "im-a-good-gpt2-chatbot." This still doesn't count as official confirmation by OpenAI, but it is very likely that they are indeed OpenAI models.

It could be a fake gpt2-chatbot as a meme, but LMSYS works directly with creators to add models to the arena, and they wouldn't allow someone to make this type of joke name for a model unless it is by the original gpt2-chatbot creators (likely OpenAI).

Both models perform similarly to gpt2-chatbot with similar behavior. One of them may be the original gpt2-chatbot or an improved/fine-tuned/more trained iteration.

In any case, the fact that LMSYS allowed models with these names into the arena counts as a reintroduction of gpt2-chatbot.

I'm going to wait for a few more hours to see if we get more info about these models, from either official statements or regarding how they work by users. After that, if there is no evidence these models are not from the original creators of gpt2-chatbot, I will stop the countdown to May 14 and update the title accordingly (this time with better wording to make traders aware of the strict criteria of this market).

Even if the countdown is reset and stopped, if both models are removed from LMSYS it will start again (15 days to reintroduce or market resolves as NO). The requirements for reaching the top 10 ranks before June 15, 2024 (in UTC time), official confirmation by OpenAI, and achieving a higher ELO in the overall category of the Chatbot arena leaderboard than gpt-4-turbo-2024-04-09 are still needed for a YES resolution.

Regarding this market, unless we have proof that one of the two is not based on gpt2-chatbot, the YES and NO criteria apply to both at the same time, as if the two were the same model; both must comply with either set of criteria for resolution. Both need to rank in the top 10, be confirmed by OpenAI, and have a higher ELO than gpt-4-turbo-2024-04-09 for a YES resolution. Both need to be removed for the countdown to start, or be denied as an OpenAI model or claimed with evidence by another organization, for a NO resolution. I will update the market description and title to state this in the following hours if the general consensus is that both models count as gpt2-chatbot (or, if only one does, to reflect that).

New market with different and more flexible criteria (deadline set for the end of September 2024).

gpt2-chatbot was removed from LMSYS. Taking April 30 in UTC time as the day it was removed and counting 15 days from then, this market will be resolved as NO if gpt2-chatbot (by that name or any other, as long as we can be certain it is the same model) is not reintroduced in the arena by the end of May 14 in UTC.
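The date arithmetic in this comment can be checked with a short Python sketch. It assumes, as the comment implies, that the removal day itself counts as day 1 of the 15-day window. The function name `reintroduction_deadline` is hypothetical.

```python
from datetime import date, timedelta


def reintroduction_deadline(removal_day: date, window_days: int = 15) -> date:
    """Last UTC day on which a reintroduction still counts.

    Assumes the removal day is day 1 of the window, so the final day is
    removal_day + (window_days - 1).
    """
    return removal_day + timedelta(days=window_days - 1)


deadline = reintroduction_deadline(date(2024, 4, 30))
print(deadline)  # 2024-05-14

# The new models appeared in LMSYS on May 7, inside the window.
print(date(2024, 5, 7) <= deadline)  # True
```

If the removal day were not counted, the same window would end on May 15 instead, so the counting convention matters for markets with short deadlines like this one.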

@Peter1169 it looks like I am likely to profit on this because I'm the first one that saw you post this, but this is an unfortunate situation. If we could go back in time and remove this clause from the criteria, I think that would be good.

In fact I think this situation is major enough that I'd like to temporarily close trading until we can at least update the title. But maybe there's some way for this market to not just turn into a question about when GPT2 will be re-added?

Ah, the site is lagging and not letting us close it 😅

Okay now it worked. If that is the only way to interpret the criteria, then I suppose the title should now be:

"Will the gpt2-chatbot be reintroduced to lmsys before May 14th, and then proven to be a new improved OpenAI model?"

Unless anyone has a better suggestion?

@Joshua should we set up a poll and ask traders if they agree to remove the "removed from the Chatbot arena" condition from the NO criteria?

Also the condition "If the gpt2-chatbot does not reach the top 10 ranks (not ELO, but ranks from 1 to 10) before the end of June 15, 2024 (UTC time)".

@Peter1169 I think that's worth considering! I'm not sure it even needs to be a poll, you could just ask if anyone objects to removing those criteria.

I bet no after seeing your comment, but I personally wouldn't object because I want the question to be the best version of itself even if it hurts my profit in expectation

You could also just make a new version of this question now without those criteria, so people can start trading while we sort this out. Maybe with an end date of September or October?

I'm not in favour of removing the clause.

It's good to not change criteria unless they are really perverse. In this case the clause is functioning as "resolves NO if no news for 14 days". That's shorter than people might have guessed, but is not a crazy thing to have in resolution criteria. And although polls can be useful for gauging opinion, I don't think they're appropriate to defer to when making decisions like this, as many people will simply vote in their financial interest, which detracts from how meaningful the results are.

I have been betting on it (small amounts, doesn't really matter), it's not the case that the above comment was the first it was brought to anyone's attention.

The main problem is the misleading title IMHO. I would just update the title to include "resolves NO if no news for 14 days" if you can figure out how to squeeze it in.

if both clauses were removed, what would the time limit be? How to price markets like these depends a lot on what deadlines there are - it's not great if the effective deadline gets extended from 2 weeks to most of a year, that's a pretty big change.

@chrisjbillington Fair points, and agreed that a poll is bad. I think a single objection like yours is enough to say that the criteria shouldn't be changed.

And yeah, the entire year is too long for a question like this so the clause does make some sense, although if I were making this question from scratch and keeping that clause I'd probably set it to 30 or 60 days.

@Joshua @chrisjbillington The best course of action is to update the title and keep this market as it is. As soon as I have 100 more Mana, I will create another one, similar but with different criteria (limited, for example, to September).

@Peter1169 lol I'll send you the 100 mana

@Joshua Thanks!

@Joshua I made a new market with different criteria. This one will remain unchanged, and the title has been updated to clearly state that it will be resolved as NO if the gpt2-chatbot is not reintroduced before the deadline. Should I open trading again now?

@Peter1169 Seems good to me yeah

@Joshua Should have read the resolution criteria more closely. Will probably lose all my mana on this. Oh well.

@M3465 Sorry. I'd like to apologize to traders for the previous misleading title, Manifold will try to do better and clarify markets/titles earlier going forward.
