Will an LLM considerably more powerful than GPT-4 come out in 2023?
closes Dec 31

I'm looking for ideas for how to operationalize this question.

Hopefully the answer will be pretty obvious, but if it's not, my current plan is to set up a poll here on Manifold, or on Twitter. The main problem would be if a model does somewhat better than GPT-4 on most metrics but its qualitative behavior is not noticeably better, in which case I'll probably resolve NO.

GPT-4.5 would not count, but a non-GPT-4 LLM that is less powerful than a 2023-produced GPT-4.5 but more powerful than the current GPT-4 would.

Peter Wildeford

What does "come out" mean? Does being talked about in a paper count? Does a few Google engineers having access count? Does it need to be widely publicly accessible by everyone? What if it's open to the public but there's a limited alpha waitlist?

firstuserhere

So models by OpenAI count? Do we wanna measure timelines for models more powerful than GPT-4 by someone who didn't make GPT-4 (and is thus already baselined above everyone else)?

Dylan Slagh

Why do you say a GPT-4.5 wouldn’t count? GPT-3.5 integrated into ChatGPT was what started all this hype because it was so much better than GPT-3

BionicD0LPH1N

@DylanSlagh Because I’d rather this market not resolve based on naming conventions, and because this was the question I wanted an answer to when creating the market. But I think it makes sense to create an alt market with an alt resolution criterion.

Daniel Balchev

From the new model's paper, compare on how many tasks it's better than GPT-4.

This will def be skewed towards tasks it's better at. But it's a start (:
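
A rough sketch of that tally, in case it helps. Everything here is an assumption for illustration: the `win_rate` helper, the task names, and the scores are placeholders, not numbers from any real paper. It also assumes higher is better on every task and only counts tasks that both models report.

```python
def win_rate(new_scores, baseline_scores):
    """Count shared tasks where the new model scores strictly higher.

    Both arguments are dicts mapping task name -> score (higher is better).
    Returns (fraction_of_wins, list_of_won_tasks, number_of_shared_tasks).
    """
    # Only compare tasks that appear in both result tables.
    shared = sorted(set(new_scores) & set(baseline_scores))
    if not shared:
        raise ValueError("no tasks in common")
    wins = [t for t in shared if new_scores[t] > baseline_scores[t]]
    return len(wins) / len(shared), wins, len(shared)


# Placeholder numbers for illustration only, NOT real benchmark results.
new_model = {"task_a": 0.81, "task_b": 0.64, "task_c": 0.90}
gpt4 = {"task_a": 0.78, "task_b": 0.70, "task_c": 0.85, "task_d": 0.50}

rate, wins, n_shared = win_rate(new_model, gpt4)
print(f"better on {len(wins)} of {n_shared} shared tasks ({rate:.0%})")
```

The skew mentioned above still applies: a paper picks which tasks to report, so a high win rate from this count is only a starting point and not evidence of a qualitative jump.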

