"perform better" refers to the text performance only, to keep it simple. To be comparable the performance should be equal or extremely close across a wide range of benchmarks (e.g. MMLU, HumanEval, WinoGrande) and chat/agent tests (e.g. MT-Bench). It should also have at least 8k context length (chosen since GPT-4 has 8k and 32k context length versions).
Of course, to qualify as YES, the group that develops a competitor must publicly announce benchmark results for the LLM they trained, or make an API available to external evaluators. If Gemini is released exclusively through a chat interface and the only benchmarks are internal to Google, then this market will resolve N/A for lack of sufficient information.
The market will resolve as soon as accurate evaluations of Gemini are available after it releases. The only situation in which this market should reach its end date is if Gemini has not been released to external evaluators by EOY 2024.
GPT-4's reference results will be taken from the GPT-4 API at the time of the Gemini evaluation (i.e. the same month). If GPT-4.5 is released, it will not be considered.
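As a sketch of what "the GPT-4 API at the time of evaluation" would mean in practice, one could pin a dated model snapshot when running the reference queries. The snapshot id and prompt below are placeholders, not the actual evaluation setup.

```python
# Hypothetical sketch: query whatever GPT-4 snapshot the API serves at
# evaluation time. Snapshot id and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4-0613",  # example dated snapshot; use the current one at eval time
    messages=[{"role": "user", "content": "Answer the benchmark question..."}],
    temperature=0,  # greedy decoding for reproducible reference results
)
print(resp.choices[0].message.content)
```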
Update 2025-02-01 (PST) (AI summary of creator comment):
- Resolution: YES based on Gemini 1.5 Pro 002 vs GPT-4o
- Benchmarks: Google Developers Blog
Resolution: YES based on Gemini 1.5 Pro 002 vs GPT-4o on these benchmarks: https://developers.googleblog.com/en/updated-gemini-models-reduced-15-pro-pricing-increased-rate-limits-and-more/
@hyperion "as soon as we get an accurate evaluation of Gemini" is probably closer to this release. https://blog.google/technology/ai/google-gemini-ai/#performance
The result is the same, so it doesn't matter anyway.
@MikhailDoroshenko Ok, maybe not; I remember there being a lot of arguments about CoT@32 vs 5-shot. I don't have any stakes here, so it doesn't matter much to me, but it makes the resolution debatable imo.
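For context on the CoT@32 vs 5-shot dispute: Gemini's headline MMLU score was reported with 32 chain-of-thought samples aggregated by consensus, while GPT-4's standard figure is 5-shot with a single answer. Below is a rough sketch of the two scoring procedures; the `model` callable is a hypothetical stub, and the real Gemini setup used uncertainty-routed voting, which this simplifies.

```python
# Rough sketch of why CoT@32 and 5-shot numbers aren't directly comparable.
# `model(prompt)` is a hypothetical stub returning one sampled completion.
from collections import Counter

def score_5_shot(model, question: str, few_shot_prefix: str) -> str:
    # Single answer conditioned on five worked examples.
    return model(few_shot_prefix + question)

def score_cot_at_32(model, question: str) -> str:
    # 32 sampled chain-of-thought completions; the majority answer wins,
    # which typically lifts accuracy relative to a single sample.
    answers = [model("Think step by step.\n" + question) for _ in range(32)]
    return Counter(answers).most_common(1)[0][0]
```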