Will an LLM considerably more powerful than GPT-4 come out in 2023?
resolved Jan 2
Resolved
NO

I'm looking for ideas for how to operationalize this question.

Hopefully the answer will be pretty obvious, but if it's not, my current plan is to set up a poll here on Manifold, or on Twitter. The main problem would be if a model does somewhat better than GPT-4 on most metrics but its qualitative behavior is not noticeably better, in which case I'll probably resolve NO.

GPT-4.5 would not count, but a non-GPT-4 LLM that is less powerful than a 2023-produced GPT-4.5 but more powerful than the current GPT-4 would.

predicted NO

Resolves NO @BionicD0LPH1N

@BionicD0LPH1N
Would GPT-4 Turbo count as a new LLM? It seems to be much better than GPT-4.

predicted YES
bought Ṁ100 NO from 22% to 21%

@JoaoPedroSantos

it seems to be much better than GPT-4.

really?

@firstuserhere In my experience, yes, it is considerably better. The extended context window makes it especially good for long, complex prompts.

@JoaoPedroSantos From what I've seen, the chances of Gemini being an improvement on GPT4 are slim if it debuts in 2023. Long, complicated instructions work particularly well with the larger context windows.

bought Ṁ50 of YES

This is a major arb opportunity with /YoavTzfati/will-gemini-be-widely-considered-be & /brubsby/will-googles-gemini-model-be-releas
79% * 55% ≈ 43%. Much higher than 23%, and that's just one model.
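The arithmetic in this comment can be checked in a few lines. This is only a sketch: the two inputs are the percentages quoted above, the variable names are made up for illustration, and the product is a valid joint probability only if one figure is conditional on the other or the two markets are independent.

```python
def joint_probability(p_first: float, p_second: float) -> float:
    """Product of two probabilities.

    Only meaningful as a joint probability if p_second is conditional on
    the first event, or if the two events are independent (an assumption).
    """
    if not (0.0 <= p_first <= 1.0 and 0.0 <= p_second <= 1.0):
        raise ValueError("probabilities must be between 0 and 1")
    return p_first * p_second


# Figures quoted in the comment for the two linked Gemini markets.
p_first_market = 0.79
p_second_market = 0.55

# Price of this market as quoted in the comment.
this_market = 0.23

joint = joint_probability(p_first_market, p_second_market)
gap = joint - this_market

print(f"implied probability: {joint:.0%}")
print(f"gap versus this market: {gap:.0%}")
```

If the two markets are negatively correlated, the true joint probability would be lower than this product, so the gap overstates the opportunity.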

bought Ṁ45 of YES

@ShadowyZephyr Note the resolution difference, though: the Gemini market just requires it to beat GPT-4 on metrics, while this one specifically says metrics are not good enough and it has to be qualitatively better.

predicted YES

@ErickBall Those two are pretty much the same if you use a variety of correct benchmarks, like MMLU, BBH, etc. The reason we have things like Alpaca being considered as good as ChatGPT is because the benchmarks are cherry-picked.

@ShadowyZephyr Maybe they are not independent: if Gemini is released in 2023, it is much less likely to be better than GPT-4. So you can't simply multiply the two.

What does "come out" mean? Does being talked about in a paper count? Does a few Google engineers having access count? Does it need to be widely publicly accessible by everyone? What if it's open to the public but there's a limited alpha waitlist?

@PeterWildeford Thanks for asking! I should have specified. For the purposes of this question, ‘come out’ means be available to some members of the public. It needn’t be widely publicly accessible, and a limited beta is enough. If it’s only accessible to members of the research team that made it, and to collaborators, it doesn’t count. Only being talked about in a paper without anyone of the wider public having access doesn’t count either.

bought Ṁ2 of NO

So models by OpenAI count? Do we wanna measure timelines for models more powerful than GPT-4 by someone who didn't make GPT-4 (and is thus already baselined above everyone else)?

predicted YES

Why do you say a GPT-4.5 wouldn’t count? GPT-3.5 integrated into ChatGPT was what started all this hype, because it was so much better than GPT-3.

@DylanSlagh Because I’d rather this market not resolve based on naming conventions, and because this was the question I wanted an answer to when creating the market. But I think it makes sense to create an alt market with an alt resolution criterion.

predicted NO

@BionicD0LPH1N @DylanSlagh ChatGPT was never based on GPT-3. When it came out, it was already a version of GPT-3.5, and I think everyone agrees it was the user interface and marketing that kicked off the public interest, more so than marginal capabilities.

predicted NO

@BionicD0LPH1N @DylanSlagh and I would agree that allowing GPT-4.5 to constitute a YES resolution would make this market mostly about naming conventions, since surely the latest version of GPT-4 will be "considerably more powerful" than the original by the end of 2023, so it's just a matter of whether OpenAI decides to call it GPT-4.5.

bought Ṁ30 of YES

From the new model’s paper, compare how many tasks it’s better at than GPT-4.

This will def be skewed towards tasks it’s better at. But it’s a start (:
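A rough sketch of that count, assuming per-task scores for both models are available as dictionaries. The task names and numbers below are hypothetical placeholders, not real results, and `win_rate` is an illustrative helper, not anything the market creator committed to.

```python
def win_rate(new_scores: dict[str, float], baseline_scores: dict[str, float]) -> float:
    """Fraction of shared tasks on which the new model scores strictly higher."""
    shared = new_scores.keys() & baseline_scores.keys()
    if not shared:
        raise ValueError("no tasks in common")
    wins = sum(new_scores[task] > baseline_scores[task] for task in shared)
    return wins / len(shared)


# Hypothetical placeholder scores (higher is better); not real benchmark data.
new_model = {"task_a": 0.71, "task_b": 0.64, "task_c": 0.80}
gpt4 = {"task_a": 0.69, "task_b": 0.66, "task_c": 0.75}

rate = win_rate(new_model, gpt4)
print(f"new model wins on {rate:.0%} of shared tasks")
```

As noted, tasks reported in a model's own paper are likely selected in its favor, so a high win rate here is weak evidence on its own.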