Will Google Gemini perform better (text) than GPT-4?
Dec 31

"perform better" refers to the text performance only, to keep it simple. To be comparable the performance should be equal or extremely close across a wide range of benchmarks (e.g. MMLU, HumanEval, WinoGrande) and chat/agent tests (e.g. MT-Bench). It should also have at least 8k context length (chosen since GPT-4 has 8k and 32k context length versions).

Of course, to qualify as YES, the group developing the competitor must publicly announce the trained LLM together with its benchmark results, or make an API available to external evaluators. If Gemini is released exclusively through a chat interface and the only benchmarks are internal to Google, then this market will resolve N/A for lack of sufficient information.

The market will resolve as soon as accurate evaluations of Gemini are available after its release. The only situation in which this market should reach its end date is if Gemini is not released to external evaluators by EOY 2024.

GPT-4's reference results will be the GPT-4 API at the time of Gemini evaluation (i.e. same month). If GPT-4.5 releases, this will not be considered.
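
As a rough illustration of how the criteria above fit together, here is a minimal sketch. It is not an official resolution procedure. The function name, the `tolerance` value (how many points still count as "extremely close"), the 8192-token reading of "8k", and the rule of comparing only benchmarks reported for both models are all assumptions made for illustration.

```python
def resolve_market(
    gemini: dict[str, float],
    gpt4: dict[str, float],
    context_length: int,
    externally_verifiable: bool,
    tolerance: float = 1.0,  # assumption: points still counted as "extremely close"
) -> str:
    """Return "YES", "NO", or "N/A" under the assumptions stated above."""
    # Chat-only release with internal-only benchmarks: not enough information.
    if not externally_verifiable:
        return "N/A"
    # Assumption: "8k" is read as 8192 tokens.
    if context_length < 8192:
        return "NO"
    # Compare only benchmarks reported for both models.
    shared = gemini.keys() & gpt4.keys()
    if not shared:
        return "N/A"
    # Every shared benchmark must be equal, better, or within the tolerance.
    comparable = all(gemini[name] >= gpt4[name] - tolerance for name in shared)
    return "YES" if comparable else "NO"
```

The scores passed in would be the externally evaluated Gemini results and the GPT-4 API results from the same month, keyed by benchmark name (e.g. MMLU, HumanEval, WinoGrande, MT-Bench).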

predicts YES

How are you going to handle the multi-size release?

If we go by their word, Ultra beats GPT-4, but it isn't publicly available...

predicts NO

@array_wake I suggest waiting for Ultra.
