Will Gemini outperform GPT-4 at mathematical theorem-proving?

Question

Based on speculation from https://youtu.be/tkqD9W5U9F4?t=468

To operationalize this, this question will resolve based on the LeanDojo benchmark (https://leandojo.org/), in particular the Pass@1 metric, where "The prover is given only one attempt and must find the proof within a wall time limit of 10 minutes."

GPT-4 is reported to achieve an accuracy of 28.8% on the "random" split of the test data in Table 2 of the LeanDojo paper (https://arxiv.org/pdf/2306.15626.pdf).
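To make the resolution rule concrete, here is a minimal sketch of how a Pass@1 score could be computed and compared against GPT-4's reported 28.8%. It assumes that each test theorem gets exactly one attempt, that an attempt counts only if it produces a proof within the 10-minute wall time limit, and that "outperform" means a strictly higher score. The `Attempt` record and the function names are hypothetical illustrations, not part of LeanDojo's API.

```python
from dataclasses import dataclass


# Hypothetical record of one proof attempt; field names are illustrative only,
# not LeanDojo's actual data format.
@dataclass
class Attempt:
    proved: bool        # did the single attempt produce a valid proof?
    wall_time_s: float  # wall-clock time spent on the attempt, in seconds


TIME_LIMIT_S = 10 * 60  # 10-minute wall time limit from the question text
GPT4_PASS_AT_1 = 0.288  # GPT-4's reported Pass@1 on the "random" test split


def pass_at_1(attempts: list[Attempt]) -> float:
    """Fraction of theorems proved on the single allowed attempt within the time limit."""
    if not attempts:
        raise ValueError("no attempts to score")
    solved = sum(1 for a in attempts if a.proved and a.wall_time_s <= TIME_LIMIT_S)
    return solved / len(attempts)


def gemini_outperforms(gemini_attempts: list[Attempt]) -> bool:
    """Assumed resolution rule: Gemini's Pass@1 must be strictly above GPT-4's."""
    return pass_at_1(gemini_attempts) > GPT4_PASS_AT_1
```

For example, with three made-up attempts (one proved in 120 s, one proved in 700 s, which exceeds the limit, and one not proved), only the first counts, so Pass@1 is 1/3 ≈ 33.3%, which would exceed 28.8%.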

This question closes when an evaluation of Gemini's performance on this task is brought to my attention.

Manifold Markets · Answer

Likely: the Manifold Markets prediction market estimates a 62% chance (20 traders, as of Dec 12, 2024).
