When will OpenAI release a more capable LLM?
Dec 31
First half of 2024
Second half of 2024

(Mostly self-explanatory. To clarify, GPT-4.5 or GPT-5 would count. A new version of GPT-4 with a larger context window won’t.)

To clarify further: to count as more capable, the LLM should perform better across benchmarks relevant to capabilities while not performing worse on some benchmarks relevant to capabilities.
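One way to read the criterion above, as a rough sketch only: the new model must improve on at least one shared benchmark and must not score lower on any of them. The function name, the `tolerance` parameter, and the benchmark names are hypothetical illustrations, not part of the market rules, and the sketch assumes higher scores are better.

```python
from typing import Mapping


def is_more_capable(
    old: Mapping[str, float],
    new: Mapping[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Strict reading of the description (an assumption, not the official rule).

    Returns True when `new` beats `old` on at least one shared benchmark
    and does not fall below `old` by more than `tolerance` on any shared
    benchmark. Assumes higher scores are better.
    """
    shared = old.keys() & new.keys()
    if not shared:
        # Nothing to compare, so no evidence of improvement.
        return False
    improved = any(new[name] > old[name] for name in shared)
    regressed = any(new[name] < old[name] - tolerance for name in shared)
    return improved and not regressed


# Hypothetical scores, made up purely for illustration.
old_scores = {"bench_a": 80.0, "bench_b": 70.0}
new_scores = {"bench_a": 85.0, "bench_b": 69.0}

print(is_more_capable(old_scores, new_scores))
print(is_more_capable(old_scores, new_scores, tolerance=1.5))
```

With `tolerance=0.0`, a single small regression is enough to fail the check; a nonzero tolerance treats small drops as noise. Which reading applies is up to the market maker.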

bought Ṁ100 Second half of 2024 YES

@ms Looks like a strong case for the second half of the year: https://www.axios.com/2024/05/13/openai-google-chatgpt-ai

4.5 or 5.0 are forthcoming, and OpenAI does not consider this to be the next level yet.

Unfortunate that the market maker is untaggable. The system could use some work.

@ms you need to click on something in the dropdown to create a mention

@jacksonpolack Yeah the dropdown wasn't including him on mobile. Happens often with very short @ names.

Seems this market is a coin toss on whether GPT-4o counts or not.

Since GPT-4o has modalities that GPT-4 does not, that's arguably a "more capable" model, even if it scores slightly worse on some benchmarks (as it inevitably will).

bought Ṁ10 2025+ YES

If GPT-4o doesn't cut it, then 2025+ is massively underpriced imo.

bought Ṁ450 First half of 2024 YES

It is clearly doing better in most important benchmarks

@Sss19971997 Unfortunately it appears to be doing worse on one, and therefore the market maker will likely say it doesn’t count.

@Panfilo That seems like a weird interpretation to me. Performing slightly worse on a single benchmark (which could be an insignificant difference) shouldn't make this resolve NO, imo.

@ErikBjareholt The negative condition at the end of the description would indicate it’s a No, and similar markets have started resolving (though of course phrasing matters).

@Panfilo I'm not super sure how these markets should resolve but oh God let's please not cite Jim's decisions as if they should be precedent

@ms So what do you think of GPT-4o?

@Joshua I probably want to wait to see the capabilities improvement (via benchmarks). Human preference alone and anecdotal evidence from people who’ve interacted with it via the arena aren’t sufficient; I have to wait for confirmation that it can solve more tasks, and will resolve once we see that. If it resolves, this probably shouldn’t take long?

@ms Agreed, that seems like the best approach to me. I'm still not sure what to think.

bought Ṁ750 First half of 2024 YES

@Joshua If it's more than just a larger context window, it should count based on the description.

@Panfilo Things that work together with the LLM (such as voice) don’t count. The LLM has to perform better across benchmarks relevant to capabilities and not perform worse on some benchmarks. I previously clarified this in the comments but didn’t put it in the description, sorry.

@ms Are you saying 4o has to outperform 4T on every single benchmark? Or would doing so on a large majority be enough to count as more capable? The fact that it's half the cost and twice as fast must come with some drawbacks.

@ms Benchmarks can be found here: https://openai.com/index/hello-gpt-4o/

It does better than GPT-4 Turbo on every benchmark shown except DROP. How does that resolve?

Reminder: it has to be more capable; if it’s just faster and has a large context window, that doesn’t count.

bought Ṁ25 2025+ YES

@ms It seems pretty likely that it gets a higher score on at least 1 benchmark and is thus "more capable".

@MiraBot if the benchmark is about something relevant to capabilities, and the model doesn’t get lower scores on other benchmarks, it’s probably “more capable”

sold Ṁ316 Second half of 2024 NO

I made a market with fortnightly options to see if anyone wants to bet on something more specific than before/after July:

Apples only ever claimed the December release was "potential", so any small delay could swing these markets based on the end of the year. If you think the rumors are believable but a delay is too likely for you to bet here, come bet on this market that allows until the end of Q1 2024:

bought Ṁ1,000 of 2023 NO

I'm confused about why I'm betting against @Mira on 2023. I don't suppose we'd care to state our trading reasons out loud? Mine is just that OpenAI folks denied it, and I don't expect them to tell lies that would be falsified on such a short timescale.

bought Ṁ20 of 2023 YES


As the second biggest yes holder, my reasoning is:

0) Mira is the biggest yes holder.

1) Ignoring all the rumors and denials, it would make sense as a response to Google claiming Gemini Ultra outperforms GPT-4. It would also line up with OAI announcing a bunch of safety things this last week, which could be them trying to show "balance" between safety and capabilities.

2) My understanding is that these rumors started with the 🍎&🌸 accounts. Other people picked up the rumor, and hype grew because they've been correct about things before. They have not backed down, and instead they've said that OAI is trolling.

3) After the hype started with 🍎&🌸, there was that screenshot posted to Reddit with 4.5 token prices. Then people started asking ChatGPT to identify its own model, and it said it was 4.5. The screenshot is what Sam was asked about when he said "Nah", and Depue said the self-identification is a hallucination. The case for a 2023 4.5 is that those two things can both be fake, while the underlying rumors are based on a real possibility that OAI is planning a holiday release.

Even if I'm right about all of this, there could still be delays of course. I'm hedging in other markets. But I wouldn't put this below 20%.

@Joshua Also, maybe to make your (0) explicit, @Mira has at least some history of having insider information and betting on it like this (and thereby taking my money!), so I put more weight on it.

@Joshua Could be wrong, but I believe most of the rumors originated on Reddit. First there was one person who claimed that ChatGPT read his entire book draft and understood it, therefore having no context window. Then there was the deleted screenshot that purportedly leaked the webpage showing GPT-4.5 modalities and cost. Then people started posting subjective opinions about how ChatGPT became smarter (that was on both Twitter and Reddit), and finally there was ChatGPT saying that gpt-4.5-turbo is the API version, which also originated on Reddit. But it's hard to know because everyone reposts everyone.

Anyways, the entire thing seems to me like people circlejerked themselves into mass delusion. I've seen that happen on Reddit more times than I can count.
