This question resolves when an AI first gets IMO Gold. It resolves YES if this AI is explicitly coded to do the following:
Receive as input (or construct using natural language processing) problems which take the form of mathematical propositions encoded in a formal language.
Construct a tree of logical transformations of that initial proposition (e.g. in the form of tactic applications), and iteratively extend this tree with the goal of identifying a subtree which comprises a complete formal proof of the proposition.
Note that an AI which is not coded to do this explicitly, but instead identifies this strategy emergently, does not count.
For example, an AI as described in the paper "HyperTree Proof Search for Neural Theorem Proving" would satisfy these conditions.
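For concreteness, here is a rough Python sketch of the kind of search loop these criteria describe. This is not HyperTree Proof Search's actual implementation; `suggest_tactics` and `apply_tactic` are made-up placeholders for a learned policy and a proof assistant, and the tree bookkeeping is heavily simplified.

```python
# Hypothetical sketch of tree-based proof search over tactic applications.
# Grow a tree of goals and tactic applications, stopping once a subtree
# closes every goal (i.e. constitutes a complete formal proof).

from dataclasses import dataclass, field

@dataclass
class Node:
    goal: str                                       # a proposition in some formal language
    children: list = field(default_factory=list)    # (tactic, [child Nodes]) pairs
    proved: bool = False

def suggest_tactics(goal):
    """Placeholder for a learned policy that proposes candidate tactics."""
    return ["intro h", "apply some_lemma", "simp"]

def apply_tactic(goal, tactic):
    """Placeholder for the proof assistant: remaining subgoals, or None if the tactic fails."""
    return []   # [] means the tactic closes the goal outright

def search(root, budget=100):
    frontier = [root]
    for _ in range(budget):
        if root.proved or not frontier:
            break
        node = frontier.pop()              # a real prover would rank nodes with a value model
        for tactic in suggest_tactics(node.goal):
            subgoals = apply_tactic(node.goal, tactic)
            if subgoals is None:
                continue                   # tactic failed on this goal
            children = [Node(g) for g in subgoals]
            node.children.append((tactic, children))
            frontier.extend(children)
            if not children:               # no subgoals left: this branch is closed
                node.proved = True         # propagating "proved" up to ancestors is omitted
                break
    return root.proved

# e.g. search(Node("forall n, n + 0 = n")) extends the tree until the root closes or the budget runs out
```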
Close date updated to 2030-12-31 3:59 pm
What if it's a model which operates in the same style as o1, where it runs a single, long chain of thought (and can backtrack / plan in-context) instead of constructing an explicit tree?
I take it "Note that an AI which is not coded to do this explicitly, but instead identifies this strategy emergently does not count" excludes this class of model, right?
@jonsimon It seems to me that an IMO bot that worked this way would definitely count as YES, per the criteria. Is there something I'm missing about AlphaGeometry that makes you think it doesn't count here?
To explicitly lay this out:
Receive as input (or construct using natural language processing) problems which take the form of mathematical propositions encoded in a formal language.
This seems pretty much like what is described in the figure from the AlphaGeometry paper.
Construct a tree of logical transformations of that initial proposition (e.g. in the form of tactic applications), and iteratively extend this tree with the goal of identifying a subtree which comprises a complete formal proof of the proposition.
It seems like this matches the "symbolic deduction through infinite branching points".
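Roughly, the loop I read that figure as describing looks like the sketch below. To be clear, `deduce_closure` and `propose_construction` are invented names, not AlphaGeometry's actual code; this is just a heavily simplified picture of how a symbolic engine plus a learned model could satisfy the criteria.

```python
def prove(premises, goal, propose_construction, deduce_closure, max_steps=10):
    facts = set(premises)
    for _ in range(max_steps):
        facts |= deduce_closure(facts)       # exhaust the finite symbolic deduction rules
        if goal in facts:
            return True                      # the deduction tree now contains the goal
        # "Infinite branching point": infinitely many auxiliary constructions are
        # possible here, so a learned model chooses which single branch to extend.
        facts.add(propose_construction(facts, goal))
    return goal in facts
```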
Seems likely; proofs require a lot of multi-step organization and planning. So far, the approaches that have worked best for problems like these have used RL-guided Monte Carlo tree search (e.g. AlphaZero, AlphaTensor, etc.).
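For anyone unfamiliar, here is a very stripped-down Python sketch of what that kind of policy/value-guided tree search looks like when pointed at proof goals. `policy`, `value`, and `apply_tactic` are stand-ins for learned models and a proof assistant, not any real system's API, and the bookkeeping is far simpler than an actual AlphaZero-style prover.

```python
import math

class TreeNode:
    def __init__(self, goal):
        self.goal = goal              # None means "nothing left to prove" on this branch
        self.children = {}            # tactic -> TreeNode (one subgoal kept, for brevity)
        self.visits = 0
        self.total_value = 0.0

def ucb(parent, child, c=1.4):
    if child.visits == 0:
        return float("inf")
    return (child.total_value / child.visits
            + c * math.sqrt(math.log(parent.visits + 1) / child.visits))

def mcts_step(root, policy, value, apply_tactic):
    # Selection: descend by UCB until reaching a leaf.
    path, node = [root], root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ucb(node, ch))
        path.append(node)
    # Expansion: the learned policy proposes tactics; the prover applies them.
    if node.goal is not None:
        for tactic in policy(node.goal):
            subgoals = apply_tactic(node.goal, tactic)
            if subgoals is not None:
                node.children[tactic] = TreeNode(subgoals[0] if subgoals else None)
    # Evaluation + backup: score the leaf with the value model, propagate to ancestors.
    v = 1.0 if node.goal is None else value(node.goal)
    for n in path:
        n.visits += 1
        n.total_value += v
```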