Will OpenAI's o1 / 🍓 reach #1 on LMSys on October 1st?
56 · Ṁ96k · Oct 3 · 96% chance

OpenAI released its o1 model to much fanfare.

https://deepnewz.com/ai/openai-unveils-o1-ai-model-advanced-reasoning-fact-checking-phd-level

LMSys has already announced that these models will be scored and will soon appear on its leaderboards.

The current LMSys leaderboard is headed by GPT-4o-08-08, followed by Gemini and Grok.

https://lmarena.ai/?leaderboard

Will OpenAI's o1 get to #1 on this leaderboard by October 1st?

Several caveats since LMSys is weird...

  • We will look at whatever is posted on October 1st

  • If an update happens that day, we will count it [so resolves October 2nd]

  • We use Eastern Time, not the "updated on" time on the LMSys site, which is often 7+ days behind

  • We will use any OpenAI o1 style model and take the best result

  • This will probably be "o1-preview" but if they post a better model that will also count

  • If no o1 model is released by October 1st we will wait until one is posted and extend the market.

  • As usual, statistical ties count! The market is "will o1 (or any best OpenAI model) be first or tied for first on LMSys?"
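
For anyone who wants to sanity-check the "first or tied for first" rule, here is a rough sketch of one way to read it. It assumes a "statistical tie" means the best o1 entry's confidence interval overlaps the leader's, which is an assumption and not something the description spells out. The `Entry` class, the `first_or_tied` helper, and all the scores are hypothetical and only for illustration; they are not real leaderboard data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Entry:
    """One leaderboard row (hypothetical structure, not the LMSys schema)."""
    name: str
    score: float    # arena score
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval


def first_or_tied(entries: List[Entry],
                  is_o1: Callable[[str], bool]) -> Optional[bool]:
    """Return True/False for YES/NO, or None if no o1 model is ranked yet."""
    candidates = [e for e in entries if is_o1(e.name)]
    if not candidates:
        return None  # no ranking yet: the market would be extended
    # "Any o1 style model, take the best result."
    best = max(candidates, key=lambda e: e.score)
    leader = max(entries, key=lambda e: e.score)
    # ASSUMPTION: a tie means the two confidence intervals overlap.
    return best is leader or best.ci_high >= leader.ci_low


# Made-up numbers, purely to show the three outcomes.
clear_first = [Entry("o1-preview", 1355, 1343, 1367),
               Entry("model-a", 1316, 1311, 1321)]
tied = [Entry("model-b", 1340, 1330, 1350),
        Entry("o1-preview", 1335, 1325, 1345)]
behind = [Entry("model-b", 1360, 1355, 1365),
          Entry("o1-mini", 1314, 1304, 1324)]
unranked = [Entry("model-a", 1316, 1311, 1321)]

starts_with_o1 = lambda name: name.startswith("o1")
results = [first_or_tied(board, starts_with_o1)
           for board in (clear_first, tied, behind, unranked)]
```

Under this reading, the first two boards would resolve YES, the third NO, and the fourth would extend the market.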


But in most scenarios we will resolve this on October 2nd.

We also have a market betting on the model's ELO.
https://manifold.markets/Moscow25/what-elo-will-openais-o1-model-get

I'm surprised people are buying No at 95% now. I don't think there's any other model about to come out, right?

bought Ṁ50 NO

@yetforever ~10% that either Gemini 2 or Claude 3.5 Opus comes out by September 25 imo, roughly when they'd need to to show up on the leaderboard.

12 days is meaningfully long in AI. E.g. most 100-day periods this year have had significant releases.

@yetforever I think Gemini can beat this ELO. But not unless they have been working on a similar approach for a while. Same for Claude.

The idea has been out there.... but OpenAI clearly beat everyone to doing it first.


But I agree there's some risk!

I don't think any model like this will come out in September, but I think you can probably get a better arena score without o1's specific type of training just by scaling more, especially considering o1 probably wasn't optimized specifically for getting the highest Chatbot Arena score.

the model entered at #1 as expected

unless a new model enters the arena and beats o1... this seems very likely to resolve YES in two weeks

But isn't it slow? People will know a model is o1 if it takes a long time?

bought Ṁ250 YES

@StellarSerene Underrated reason for it not reaching #1 is how much response time factors into preference

Terence Tao also thinks the model isn't complete garbage...

If no o1 model is released by October 1st

I assume this means “if no ranking for an o1 model is released”, correct?

@yetforever yes obviously

@Moscow25 we will resolve on Oct 2nd but if there is no ranking yet we will extend the market

bought Ṁ200 NO

@Moscow25 FWIW, as with previous markets, I think this could be made clearer in the title. It's very reasonable for someone to interpret the current one as "No ranking --> NO" if they're reading the description sloppily. But it's clearly there, so not critical. :)

@HenriThunberg yeah I get it

Manifold limits longer titles, and generally these don't do as well 🤷

I'm very clear on the extra details (many are not) though I find no matter what some non-trivial % of people will complain

@Moscow25 Haha I'll shut up 🙃

@HenriThunberg not at all! You're cool and I like the input.

I've tried it different ways and on this point... have concluded that the title needs to be simple, with all details in the story below -- as clear as possible but not to detract from the theme.

That works best for most people and for me personally. Some will always disagree.

You can't please everyone.

When markets don't describe the details... it seems lazy and drives me crazy! But then some will have such long explanations that you also don't see them below the ... 🤷