Will Gemini outperform GPT-4 at mathematical theorem-proving?
56% chance

Based on speculation from https://youtu.be/tkqD9W5U9F4?t=468

To operationalize this, the question will resolve based on the LeanDojo benchmark (https://leandojo.org/), specifically its Pass@1 metric, where "The prover is given only one attempt and must find the proof within a wall time limit of 10 minutes."

GPT-4 is reported to achieve an accuracy of 28.8% on the "random" split of the test data in Table 2 of the LeanDojo paper (https://arxiv.org/pdf/2306.15626.pdf).
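
As a rough illustration of the comparison this question turns on, here is a minimal sketch of how a Pass@1 score could be computed and checked against the GPT-4 figure quoted above. The function names and the list-of-booleans input are hypothetical, and treating "outperform" as a strictly higher Pass@1 is my assumption rather than something stated in the resolution criteria.

```python
# GPT-4 Pass@1 (percent) on the "random" split, as quoted in this question.
GPT4_PASS_AT_1 = 28.8


def pass_at_1(proved: list[bool]) -> float:
    """Percentage of theorems proved on the single allowed attempt.

    `proved` holds one entry per test theorem: True if the prover found
    a proof within the time limit, False otherwise (hypothetical format).
    """
    if not proved:
        raise ValueError("no results to score")
    return 100.0 * sum(proved) / len(proved)


def gemini_outperforms(proved: list[bool], baseline: float = GPT4_PASS_AT_1) -> bool:
    # Assumption: "outperform" means a strictly higher Pass@1 than the baseline.
    return pass_at_1(proved) > baseline
```

For example, a run that proves 3 of 10 theorems scores 30.0% and would count as outperforming under this reading, while 2 of 10 (20.0%) would not.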

This question closes when an evaluation of Gemini's performance on this task is brought to my attention.


Which Gemini version?