AlphaGeometry (with a little help from GPT-4 to get readable solutions) essentially solved https://manifold.markets/dp/what-imo-shortlist-topic-acgn-will-2a854894253f. What is next for AI and the IMO?
Every year, the International Mathematical Olympiad (IMO) problems are selected by the IMO Jury from the IMO Shortlist. This is a list of ~32 problems divided over 4 topics: Algebra, Combinatorics, Geometry, and Number Theory. The shortlist for year N is typically released immediately after the subsequent IMO (year N+1).
Resolves the first time when:
An AI is evaluated on the IMO Shortlist for year N, as documented in a paper or blog post that is broadly recognized as correct and without test-data leaks by the academic community.
The AI produces correct solutions to at least five (5) problems from the shortlist, or to at least three (3) problems in a single non-Geometry topic.
The solutions are human-verifiable, and the AI takes the problem in text form (that I can understand) as input. There are no restrictions on how the AI solves the task; in particular, calculator use is allowed. However, there should be no human in the loop.
If the answer turns out to be "clear" (e.g. there is a publicly available AI which is widely used to solve olympiad-level algebra problems), but the condition in the first bullet point above is not met, I will accept a sufficiently trustworthy open-source repository demonstrating the condition in the second bullet point.
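As an informal illustration of the counting condition in the second bullet point above, here is a minimal Python sketch. The function name `meets_threshold` and the dictionary layout (topic name mapped to number of correctly solved problems) are assumptions made for this example only; the written criteria above remain authoritative.

```python
# Hypothetical helper: checks only the counting condition, not the
# requirements on documentation, data leakage, or human verifiability.
NON_GEOMETRY = ("Algebra", "Combinatorics", "Number Theory")

def meets_threshold(solved: dict[str, int]) -> bool:
    """Return True if at least 5 problems are solved overall,
    or at least 3 in a single non-Geometry topic."""
    total = sum(solved.values())
    best_non_geometry = max(solved.get(topic, 0) for topic in NON_GEOMETRY)
    return total >= 5 or best_non_geometry >= 3
```

For example, four solved Geometry problems alone would not trigger resolution under this reading, while three solved Algebra problems would.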
Resolves to:
The topic for which the ratio (number of problems solved "essentially correctly", i.e. would get 6 or 7 points) / (number of problems on the shortlist) is the highest.
If multiple topics have the exact same ratio above, it resolves to a split, e.g. 50%-50%; or N/A if not possible to do so.
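The resolution rule above could be sketched roughly as follows. This is an illustration under assumptions, not an official procedure: the function name `resolve` and the input format are hypothetical, and the denominator is assumed to be the number of shortlist problems in each topic. Exact fractions are used so that ties are detected without floating-point error. The N/A fallback is not modelled.

```python
from fractions import Fraction

def resolve(solved: dict[str, int], shortlisted: dict[str, int]) -> dict[str, Fraction]:
    """Return the resolution share per winning topic.

    solved: topic -> number of "essentially correct" solutions (6 or 7 points).
    shortlisted: topic -> number of shortlist problems in that topic
                 (assumption: the ratio is taken per topic).
    """
    ratios = {t: Fraction(solved.get(t, 0), n) for t, n in shortlisted.items()}
    best = max(ratios.values())
    winners = sorted(t for t, r in ratios.items() if r == best)
    # Equal split among tied topics, e.g. 50%-50% for two winners.
    share = Fraction(1, len(winners))
    return {t: share for t in winners}
```

With a hypothetical shortlist of 7 Algebra and 8 Number Theory problems and 3 solved in each, Algebra would win (3/7 > 3/8); with 8 problems in each topic, the two would split 50%-50%.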
If there are widely shared doubts about data leakage, I plan to ask at least one prominent data leakage expert for an opinion before resolving.
The resolution criteria can be met on a shortlist for a past year N<=2023, but only if the possibility of data leakage is excluded with very high certainty.
If multiple shortlists are part of the test set in the first published work that meets the resolution criteria, the latest one counts for the purposes of this question.
If the IMO Shortlist stops being created and published on a yearly basis, or changes so that it is no longer a set of 26<=X<=40 problems split into the four topics above, this resolves N/A.
If the resolution criteria are not met by 31 Dec 2029, this resolves N/A.