(Mostly self-explanatory. To clarify, GPT 4.5 or GPT-5 would count. A new version of GPT-4 with a larger context window won’t.)
@ms It seems pretty likely that it gets a higher score on at least 1 benchmark and is thus "more capable".
@MiraBot if the benchmark is about something relevant to capabilities, and the model doesn’t get lower scores on other benchmarks, it’s probably “more capable”
I'm confused about why I'm betting against @Mira on 2023. I don't suppose we'd care to state our trading reasons out loud? Mine is just that OpenAI folks denied it, and I don't expect them to tell lies falsified over that short a timescale.
As the second biggest yes holder, my reasoning is:
0) Mira is the biggest yes holder.
1) Ignoring all the rumors and denials, it would make sense as a response to Google claiming Gemini Ultra outperforms GPT-4. It would also line up with OAI announcing a bunch of safety things this last week, which could be them trying to show "balance" between safety and capabilities.
2) My understanding is that these rumors started with the 🍎&🌸 accounts. Other people picked up the rumor, and hype grew because they've been correct about things before. They have not backed down, and instead they've said that OAI is trolling.
3) After the hype started with 🍎&🌸, there was that screenshot posted to Reddit with 4.5 token prices. Then people started asking ChatGPT to identify its own model, and it said it was 4.5. The screenshot is what Sam was asked about when he said "Nah", and Depue said the self-identification is a hallucination. The case for a 2023 4.5 is that those two things can both be fake, while the underlying rumors are based on a real possibility that OAI is planning a holiday release.
Even if I'm right about all of this, there could still be delays of course. I'm hedging in other markets. But I wouldn't put this below 20%.
@Joshua Could be wrong, but I believe most of the rumors originated on Reddit. First there was one person who claimed that ChatGPT read his entire book draft and understood it, and therefore had no context window limit. Then there was the deleted screenshot that purportedly leaked the webpage showing GPT 4.5 modalities and cost. Then people started posting subjective opinions about how ChatGPT became smarter (that was on both Twitter and Reddit), and finally there was ChatGPT saying that gpt-4.5-turbo is the API version, which also originated on Reddit. But it's hard to know because everyone reposts everyone.
Anyways, the entire thing seems to me like people circlejerked themselves into mass delusion. I've seen that happen on Reddit more times than I can count.
The stonk market is a meme but I'd be interested in somehow formalizing a way to keep track of how often accounts like this were correct in a market.
Maybe an unlinked market asking "Will [account] make an unambiguously false prediction by market close"? And then you re-add a name after it resolves yes, so you keep track of how often each account said something which turned out to be false.
Open to suggestions.
Well we do that all the time, but it's hard to actually keep track of which "insiders" are reliable what % of the time. If you say enough vague things, eventually one of them is going to be true. And we can't make a market for every single claim they make.
Sam Altman explicitly denies the 4.5 leak:
If an LLM says it’s GPT4.5, it doesn’t mean it’s actually GPT4.5. Even if its prompt says it’s GPT4.5, it doesn’t mean it’s GPT4.5. I’m going to resolve based on official/credible information.
If some benchmarks show that the model available now is more capable than GPT-4 and OpenAI later says they actually released GPT-4.5 without announcing it in 2023, the question resolves in 2023; otherwise, it has to be credible evidence on benchmark performance. Whatever the LLM itself says isn’t as relevant.
@JohnOFarrar GPT 4.5 has not been released as of today, and if the market manager resolves to yes based on Reddit posts relying on GPT telling the truth about itself, I will be contesting that resolution.