
I'm looking for ideas for how to operationalize this question.
Hopefully the answer will be pretty obvious, but if it's not, my current plan is to set up a poll here on Manifold, or on Twitter. The main problem would be if a model does somewhat better than GPT-4 on most metrics, but its qualitative behavior is not noticeably better, in which case I'll probably resolve NO.
GPT-4.5 would not count, but a non-GPT-4 LLM that is less powerful than a 2023-produced GPT-4.5, yet more powerful than the current GPT-4, would count.
🏅 Top traders
| # | Name | Total profit |
|---|---|---|
| 1 | | Ṁ1,055 |
| 2 | | Ṁ263 |
| 3 | | Ṁ158 |
| 4 | | Ṁ97 |
| 5 | | Ṁ86 |
@firstuserhere In my experience, yes, it is considerably better; the extended context window makes it especially good for long, complex prompts.
@JoaoPedroSantos From what I've seen, the chances of Gemini being an improvement on GPT-4 are slim if it debuts in 2023.
This is a major arb opportunity with /YoavTzfati/will-gemini-be-widely-considered-be & /brubsby/will-googles-gemini-model-be-releas
79% * 55% ≈ 43%. Much higher than 23%, and that's just one model.
@ShadowyZephyr Note the resolution difference, though: the Gemini market just requires it to beat GPT-4 on metrics, while this one specifically says beating the metrics is not enough and it has to be qualitatively better.
@ErickBall Those two are pretty much the same if you use a varied set of good benchmarks, like MMLU, BBH, etc. The reason we have things like Alpaca being considered as good as ChatGPT is that the benchmarks are cherry-picked.
@ShadowyZephyr Maybe they are not independent: if Gemini releases in 2023, it is much less likely to be better than GPT-4, so you can't simply multiply the two probabilities.
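A minimal sketch of the probability math in the two comments above. The 79% and 55% figures come from the thread; the 30% conditional probability below is purely a hypothetical number chosen for illustration, not a real market price.

```python
# Figures from the thread (assumed to be the two market prices):
p_release_2023 = 0.79   # P(Gemini releases in 2023)
p_better = 0.55         # P(Gemini beats GPT-4), unconditional

# If the two events were independent, the joint probability
# would simply be the product:
p_joint_independent = p_release_2023 * p_better
print(f"Assuming independence: {p_joint_independent:.0%}")  # ~43%

# But if an early (2023) release makes Gemini less likely to beat
# GPT-4, the relevant factor is the *conditional* probability, and
# the joint probability shrinks accordingly:
p_better_given_2023 = 0.30  # hypothetical: P(better | releases in 2023)
p_joint_dependent = p_release_2023 * p_better_given_2023
print(f"With dependence: {p_joint_dependent:.0%}")  # ~24%
```

Under that hypothetical dependence, the implied joint probability drops from ~43% to ~24%, which is why the naive multiplication overstates the arbitrage.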
@PeterWildeford Thanks for asking! I should have specified. For the purposes of this question, ‘come out’ means being available to some members of the public. It needn't be widely publicly accessible; a limited beta is enough. If it's only accessible to members of the research team that made it, and to collaborators, it doesn't count. A model that is only talked about in a paper, without anyone in the wider public having access, doesn't count either.
@DylanSlagh Because I’d rather this market not resolve based on naming conventions, and because this was the question I wanted an answer to when creating the market. But I think it makes sense to create an alt market with an alt resolution criterion.
@BionicD0LPH1N @DylanSlagh ChatGPT was never based on GPT-3. When it came out, it was already a version of GPT-3.5, and I think everyone agrees it was the user interface and marketing that kicked off the public interest, more so than marginal capabilities.
@BionicD0LPH1N @DylanSlagh and I would agree that allowing GPT-4.5 to constitute a YES resolution would make this market mostly about naming conventions, since surely the latest version of GPT-4 will be "considerably more powerful" than the original by the end of 2023, so it's just a matter of whether OpenAI decides to call it GPT-4.5.