Will Google Gemini do as well as GPT-4 on Sparks of AGI tasks?

EDIT: For the purposes of this market, Gemini refers to the largest variant, Gemini Ultra.

Gemini is the multimodal LLM being trained by Google DeepMind. Sparks of AGI is a paper published by Microsoft researchers evaluating a non-public checkpoint of GPT-4 on a range of language tasks.

This will resolve Yes if Gemini matches or exceeds GPT-4's performance on at least half of the Sparks tasks for which Gemini results become available. It will resolve No otherwise.
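
As a rough illustration of the counting rule above, here is a minimal sketch in Python. It assumes each task can be reduced to a pair of comparable numeric scores, which is a simplification since many Sparks tasks are judged qualitatively. The function name `resolve_market` and the task names in the usage example are hypothetical, and returning `None` when no results exist is a choice made for this sketch, not part of the market rules.

```python
from typing import Mapping, Optional, Tuple


def resolve_market(results: Mapping[str, Tuple[float, float]]) -> Optional[str]:
    """Apply the 'at least half' rule to the tasks that have Gemini results.

    `results` maps a task name to (gemini_score, gpt4_score). Numeric
    scores are an assumption of this sketch, not something the market
    description specifies.
    """
    if not results:
        # No Gemini results yet, so there is nothing to count.
        return None

    # A task counts for Gemini if it matches or exceeds GPT-4.
    matched = sum(1 for gemini, gpt4 in results.values() if gemini >= gpt4)

    # "At least half" includes an exact 50/50 split.
    return "Yes" if 2 * matched >= len(results) else "No"


# Hypothetical example: one tie and one loss is exactly half, so Yes.
print(resolve_market({"unicorn": (1.0, 1.0), "map": (0.2, 0.9)}))
```

Note that only tasks with known Gemini results enter the denominator, so a tie on a single evaluated task would be enough for Yes under this reading.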

Possible ways in which we get Gemini results are:

(1) Published/pre-print paper that evaluates Gemini on some or all of the Sparks tasks. The task setup and difficulty should be uncontroversially similar, but the specific instance can be different (e.g. giraffe instead of unicorn, different map layout etc.).

(2) Public comparison. Gemini should be available to many individuals, through an application or API, and the comparison should be like-for-like against ChatGPT-4/GPT-4 API, using the same prompts. No multimodal input or tools should be used, beyond the finetuned language model.

If both sets of results are available then the paper will take precedence, unless there's consensus that the public comparison shows substantially opposite conclusions.

If it becomes clear such a comparison can never be made, e.g. due to Gemini being cancelled or being designated permanently access-restricted, the market will resolve N/A.
