Will Google Gemini perform better (text) than GPT-4?
Resolved YES (Jan 2)

"Perform better" refers to text performance only, to keep things simple. To count as comparable, performance should be equal or extremely close across a wide range of benchmarks (e.g. MMLU, HumanEval, WinoGrande) and chat/agent tests (e.g. MT-Bench). Gemini should also have a context length of at least 8k (chosen because GPT-4 ships in 8k and 32k context-length versions).

To qualify for YES, the group developing a competitor must publicly announce that they trained an LLM, along with its benchmark results, or make an API available to external evaluators. If Gemini is released exclusively through a chat interface and the only benchmarks are Google-internal, this market will resolve N/A for lack of sufficient information.

The market will resolve as soon as accurate evaluations of Gemini are available after it releases. The only situation in which this market should reach its end date is if Gemini is not released to external evaluators by EOY 2024.

GPT-4's reference results will come from the GPT-4 API at the time of the Gemini evaluation (i.e. the same month). If GPT-4.5 releases, it will not be considered.


🏅 Top traders

1. Ṁ266
2. Ṁ249
3. Ṁ175
4. Ṁ146
5. Ṁ139

@hyperion "as soon as we get an accurate evaluation of Gemini" is probably closer to this release. https://blog.google/technology/ai/google-gemini-ai/#performance
The result is the same, so doesn't matter anyway.

@MikhailDoroshenko Ok, maybe not — I remember there being a lot of arguments about CoT@32 vs 5-shot. I don't have any stake here, so it doesn't matter much to me, but it makes the resolution debatable imo.

predicted YES

How are you going to handle the multi-size release?

If we go by their word, Ultra beats GPT-4, but it isn't publicly available...

predicted NO

@array_wake I suggest waiting for Ultra
