Skip to main content
MANIFOLD
In 2029, will any AI be able to take an arbitrary proof in the mathematical literature and translate it into a form suitable for symbolic verification? (Gary Marcus benchmark #5)
104
Ṁ1.2kṀ19k
2030
81%
chance

The fifth question from this post: https://garymarcus.substack.com/p/dear-elon-musk-here-are-five-things

The full text is: "In 2029, AI will not be able to take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification."

Judgment will be by me, not Gary Marcus.

Ambiguous whether this means start or end of 2029, so I have set it for the end.

I will accept if there is an AI that can do this for >=75% of the proofs published on the arxiv math categories in 2029.

Market context
Get
Ṁ1,000
to start trading!
Sort by:

How is this not at 100%? The best mathematicians in the world are currently using commercially available LLMs to do exactly this. Here is a video of Terrence Tao doing this exact thing: https://www.youtube.com/watch?v=JHEO7cplfk8, here is another: https://xenaproject.wordpress.com/2025/12/05/formalization-of-erdos-problems/. Here is a paper about it: https://arxiv.org/pdf/2604.03071. This should be resolved today.

@JamesUul2R as far as I understand they can do it for fairly simple proofs and with a lot of handholding. Still quite a ways to go to before they can handle "an arbitrary proof in the literature"

@AhronMaline Human beings are not capable of translating every proof in the literature into Lean. For example, the ABC conjecture. If this is to be interpreted as requiring AI to literally convert every single proof from the literature into Lean then this is a super-intelligence test not a AGI test. I can see someone arguing that today they make enough errors to not meet the highest reasonable interpretation of this standard. But I do think if you had described today's capabilities to Marcus in 2022 he would have said they would pass the test. He probably wouldn't admit it today. But, by 2029? I don't think even he will be able to deny it.

@JamesUul2R since nobody outside a small group has managed to understand the alleged proof of ABC, I don't think that one counts as "in the literature". But in general I think it's true that only experts in the appropriate subfield can formalize a typical serious proof. So yeah, this benchmar is indeed very high. That's why it's "only" at 81%!

opened a Ṁ333 YES at 60% order

This is almost certainly YES in 2029 for proofs published in 2025, but there remains some uncertainty to me regarding whether the nature of proofs will change significantly by 2029... I suppose this should be at least 10%

I'm somewhat confused why the recent news of partial automatization haven't moved this market.

https://x.com/mathematics_inc/status/1966194751847461309

Do you require the translation to be true to the original or just any valid proof? Also, does it still count if a human needs to be involved in translating the theorem statement?

True to the original, has to translate the statement as well.

predictedYES

@Aleph I also still endorse my comment there about there being notable paths to improving Lean (or Coq, etc.) code generation.

What do you mean by arbitrary proofs? Does this mean all proofs? Only most of them? If something could formalize most routine proofs but fails at complicated ones would that count? What if it fails for only especially complex proofs?

@TomBouley "I will accept if there is an AI that can do this for >=75% of the proofs published on the arxiv math categories in 2029."

If I could short one person it would be this guy (if two, Nate Silver…)

Would an average math PhD be able to take an arbitrary proof in the mathematical literature and translate it into a form suitable for symbolic verification? (Answer: No)

@SG No, definitely not. I certainly couldn't. But Marcus isn't claiming to be capturing human capabilities exactly so I think this isn't "unfair", and it's an interesting question anyway