EDIT: For the purposes of this market, Gemini refers to the largest variant, Gemini Ultra.
Gemini is the multimodal LLM being trained by Google DeepMind. Sparks of AGI is a paper published by Microsoft researchers evaluating a non-public checkpoint of GPT-4 on a range of language tasks.
This will resolve Yes if Gemini matches or exceeds GPT-4's performance on at least half of the Sparks tasks for which Gemini results become available. It will resolve No otherwise.
Possible ways of obtaining Gemini results are:
(1) Published/pre-print paper that evaluates Gemini on some or all of the Sparks tasks. The task setup and difficulty should be uncontroversially similar, but the specific instance can be different (e.g. giraffe instead of unicorn, different map layout etc.).
(2) Public comparison. Gemini should be available to many individuals, through an application or API, and the comparison should be like-for-like against ChatGPT-4/the GPT-4 API, using the same prompts. No multimodal input or tools should be used, beyond the finetuned language model.
If both sets of results are available, the paper will take precedence, unless there is consensus that the public comparison shows substantially the opposite conclusion.
If it becomes clear such a comparison can never be made, e.g. due to Gemini being cancelled or being designated permanently access-restricted, the market will resolve N/A.
What do you think about it?
Personally, I utilize both approaches.
This one is easy. Demis Hassabis, no joke, already threw shade at ChatGPT very early in Gemini's training process. The only reason I am not betting more is that I want to have something left to bet on other stuff.