Will a LLM considerably more powerful than GPT-4 come out in 2023?


resolved Jan 2


I'm looking for ideas for how to operationalize this question.

Hopefully the answer will be pretty obvious, but if it's not, my current plan is to set up a poll here on Manifold or on Twitter. The main problem would be a model that does somewhat better than GPT-4 on most metrics but whose qualitative behavior is not noticeably better, in which case I'll probably resolve NO.

GPT-4.5 would not count, but a non-GPT-4 LLM that is less powerful than a 2023-produced GPT-4.5 but more powerful than the current GPT-4 would.

LLMs

GPT-4

New Year's Resolutions 2024

— LLM & AI Capabilities—


🏅 Top traders

#	Name	Total profit
1		Ṁ1,055
2		Ṁ263
3		Ṁ158
4		Ṁ97
5		Ṁ86


20 Comments


predictedNO

Resolves NO @BionicD0LPH1N

@BionicD0LPH1N
Would GPT-4 Turbo count as a new LLM? It seems to be much better than GPT-4.

predictedYES

@JoaoPedroSantos No

@JoaoPedroSantos

"it seems to be much better than GPT-4."

Really?

@firstuserhere In my experience, yes, it is considerably better. The extended context window makes it especially good for long, complex prompts.

@JoaoPedroSantos From what I've seen, the chances of Gemini being an improvement on GPT4 are slim if it debuts in 2023. Long, complicated instructions work particularly well with the larger context windows.

This is a major arb opportunity with /YoavTzfati/will-gemini-be-widely-considered-be & /brubsby/will-googles-gemini-model-be-releas
79% * 55% = 36%. Much higher than 23%, and that's just one model.

@ShadowyZephyr Note the resolution difference though: the Gemini market just requires it to beat GPT-4 on metrics, while this one specifically says metrics are not good enough and it has to be qualitatively better.

predictedYES

@ErickBall Those two are pretty much the same if you use a variety of correct benchmarks, like MMLU, BBH, etc. The reason we have things like Alpaca being considered as good as ChatGPT is because the benchmarks are cherry-picked.

@ShadowyZephyr Maybe they are not independent: if Gemini is released in 2023, then it is much less likely to be better than GPT-4. So you can't simply multiply these two.
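
A rough sketch of the arithmetic in this thread, in Python. It assumes the two quoted figures are a release probability and a "better than GPT-4" probability (the thread does not say which figure belongs to which market), and the function names are made up for illustration. As written, 0.79 × 0.55 comes to about 0.43 rather than 0.36, and the product only equals the joint probability if the second figure is already conditional on a 2023 release, which is the dependence concern raised above.

```python
def joint_probability(p_release: float, p_better_given_release: float) -> float:
    """P(released AND better) = P(released) * P(better | released)."""
    for p in (p_release, p_better_given_release):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_release * p_better_given_release


def implied_conditional(market_price: float, p_release: float) -> float:
    """P(better | released) that would make the joint equal market_price."""
    if not 0.0 < p_release <= 1.0:
        raise ValueError("p_release must lie in (0, 1]")
    return market_price / p_release


# Figures quoted in the thread; treating them as independent is the
# assumption under dispute, not an established fact.
naive = joint_probability(0.79, 0.55)
print(f"naive product: {naive:.3f}")

# Which figure is the release probability is an assumption, so try both.
for p_release in (0.79, 0.55):
    needed = implied_conditional(0.23, p_release)
    print(f"P(release)={p_release:.2f} -> implied P(better | release)={needed:.3f}")
```

Read the other way, `implied_conditional` shows how low the conditional probability would have to be for a 23% price to be consistent; it says nothing about which estimate is right.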


What does "come out" mean? Does being talked about in a paper count? Does a few Google engineers having access count? Does it need to be widely publicly accessible by everyone? What if it's open to the public but there's a limited alpha waitlist?

@PeterWildeford Thanks for asking! I should have specified. For the purposes of this question, ‘come out’ means be available to some members of the public. It needn’t be widely publicly accessible, and a limited beta is enough. If it’s only accessible to members of the research team that made it, and to collaborators, it doesn’t count. Only being talked about in a paper without anyone of the wider public having access doesn’t count either.

So models by OpenAI count? Do we want to measure timelines for models more powerful than GPT-4 made by someone who didn't make GPT-4 (given that OpenAI is already baselined above everyone else)?

predictedYES

Why do you say a GPT-4.5 wouldn’t count? GPT-3.5 integrated into ChatGPT was what started all this hype because it was so much better than GPT-3

@DylanSlagh Because I’d rather this market not resolve based on naming conventions, and because this was the question I wanted an answer to when creating the market. But I think it makes sense to create an alt market with an alt resolution criterion.

predictedNO

@BionicD0LPH1N @DylanSlagh ChatGPT was never based on GPT-3. When it came out, it was already a version of GPT-3.5, and I think everyone agrees it was the user interface and marketing that kicked off the public interest, more so than marginal capabilities.

predictedNO

@BionicD0LPH1N @DylanSlagh and I would agree that allowing GPT-4.5 to constitute a YES resolution would make this market mostly about naming conventions, since surely the latest version of GPT-4 will be "considerably more powerful" than the original by the end of 2023, so it's just a matter of whether OpenAI decides to call it GPT-4.5.

From the new model's paper, compare how many tasks it does better on than GPT-4.

This will definitely be skewed towards tasks where it's better. But it's a start (:
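
One way to sketch the tally suggested here, in Python. The task names and scores below are made-up placeholders, not results from any paper, and the sketch assumes higher scores are better and only compares tasks that both models report.

```python
def count_wins(new_scores, baseline_scores, margin=0.0):
    """Return (tasks where the new model wins by more than margin, tasks compared)."""
    # Only tasks reported for both models can be compared.
    shared = new_scores.keys() & baseline_scores.keys()
    wins = sorted(
        task for task in shared
        if new_scores[task] - baseline_scores[task] > margin
    )
    return wins, len(shared)


# Placeholder numbers for illustration only (hypothetical, not real benchmarks).
new_model = {"task_a": 0.71, "task_b": 0.64, "task_c": 0.80}
gpt4_baseline = {"task_a": 0.69, "task_b": 0.66, "task_c": 0.80, "task_d": 0.50}

wins, compared = count_wins(new_model, gpt4_baseline)
print(f"new model ahead on {len(wins)} of {compared} shared tasks: {wins}")
```

A nonzero `margin` is one crude guard against counting tiny differences as wins, though it does nothing about the selection skew mentioned above.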
