Every year, the International Mathematical Olympiad (IMO) problems are selected by the IMO Jury from the IMO Shortlist. This is a list of ~32 problems divided over 4 topics: Algebra, Combinatorics, Geometry, and Number Theory. The shortlist for year N is typically released immediately after the subsequent IMO (year N+1).
Resolves the first time that:

- An AI is evaluated on the IMO Shortlist for year N, as documented in a paper or blog post that is broadly recognized by the academic community as correct and free of test-data leaks.
- The AI produces correct solutions to at least five (5) problems from the shortlist, or to at least three (3) problems in a single topic.
- The solutions are human-verifiable, and the AI takes the problem in text form as input. There are no restrictions on how the AI solves the task; in particular, calculator use is allowed. However, there must be no human in the loop.
If the answer turns out to be "clear" (e.g. there is a publicly available AI which is widely used to solve olympiad-level algebra problems), but the condition in the first bullet point above is not met, I will accept a sufficiently trustworthy open-source repository demonstrating the condition in the second bullet point.
Resolves to:
- The topic for which the ratio (number of problems solved "essentially correctly", i.e. that would receive 6 or 7 points) / (number of problems of that topic on the shortlist) is the highest; see the computational sketch after these criteria.
- If multiple topics have exactly the same ratio, this resolves to a split, e.g. 50%-50%, or N/A if a split is not possible.
If there are widely shared doubts about data leakage, I plan to ask at least one prominent data leakage expert for an opinion before resolving.
If multiple shortlists are part of the test set in the first published work that meets the resolution criteria, the latest one counts for the purposes of this question.
If the IMO Shortlist stops being created and published on a yearly basis, or changes so that it is no longer a set of 26 ≤ X ≤ 40 problems split into the four topics above, this resolves N/A.
If the resolution criteria are not met by 31 Dec 2029, this resolves N/A.
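For concreteness, here is a minimal sketch of the ratio computation and tie split described above. The function name and the example counts are hypothetical; the actual resolution will be a manual judgment.

```python
from fractions import Fraction

def resolution(correct: dict[str, int], total: dict[str, int]) -> dict[str, Fraction]:
    """Hypothetical helper: map each winning topic to its share of the resolution.

    correct: number of "essentially correct" (6- or 7-point) solutions per topic.
    total:   number of shortlist problems in each topic that year.
    """
    ratios = {topic: Fraction(correct.get(topic, 0), total[topic]) for topic in total}
    best = max(ratios.values())
    winners = [topic for topic, ratio in ratios.items() if ratio == best]
    # A unique winner takes 100%; exact ties are split evenly (e.g. 50%-50%).
    return {topic: Fraction(1, len(winners)) for topic in winners}

# Hypothetical example: 3/8 in Geometry beats 2/7 in Algebra.
print(resolution(
    {"Geometry": 3, "Algebra": 2},
    {"Algebra": 7, "Combinatorics": 9, "Geometry": 8, "Number Theory": 8},
))
# {'Geometry': Fraction(1, 1)}
```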
> take the problem in text form as input
This is basically the only reason why this hasn't resolved so far. As it currently stands, this resolves to Geometry as soon as a system that reliably parses geometry questions into the input format of AlphaGeometry or one of its derivatives is brought to my attention.
Made a new question excluding geometry: https://manifold.markets/dp/what-imo-shortlist-topic-acn-will-b
The input format in https://github.com/google-deepmind/alphageometry/blob/main/imo_ag_30.txt is sufficiently close to text, and GPT-4 can convert the solutions to text. The reported training setup ensures data leakage is impossible.
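To illustrate why I consider that format sufficiently close to text: as far as I can tell, each problem in imo_ag_30.txt takes two lines, a name line followed by a statement line whose premises are separated by ';' with the goal after '?'. The sketch below relies on that assumption and is not an official parser from the repository.

```python
from dataclasses import dataclass

@dataclass
class GeometryProblem:
    name: str
    premises: list[str]
    goal: str

def parse_ag_file(path: str) -> list[GeometryProblem]:
    # Assumes the two-line-per-problem layout described above.
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    problems = []
    for name, statement in zip(lines[0::2], lines[1::2]):
        body, _, goal = statement.partition("?")
        premises = [p.strip() for p in body.split(";") if p.strip()]
        problems.append(GeometryProblem(name, premises, goal.strip()))
    return problems

# e.g. [p.name for p in parse_ag_file("imo_ag_30.txt")] should list the 30 problem names.
```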
The wording of this question implies that past years N are fine if there are no data leakage concerns; I'm not sure what I intended when writing the question.
In case past years count, this will very likely resolve to Geometry once anyone manages to test https://github.com/google-deepmind/alphageometry on the Geometry shortlist for IMO 2000 or 2015 and gets at least one problem correct other than the ones chosen for the IMO.
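As a concrete illustration of that check (the sets below are hypothetical; the real labels would come from the shortlist and the official IMO paper):

```python
# Hypothetical check: did the run solve a shortlist geometry problem that was
# NOT among the problems selected for that year's IMO?
solved = {"2000_G1", "2000_G3", "2000_G6"}          # problems the system solved (made up)
chosen_for_imo = {"2000_G3", "2000_G6", "2000_G8"}  # shortlist problems used at the IMO (made up)

extra = solved - chosen_for_imo
if extra:
    print("Counts toward resolution:", sorted(extra))
```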
In case past years don't count, the question does say:
> If the answer turns out to be "clear" (e.g. there is a publicly available AI which is widely used to solve olympiad-level algebra problems), but the condition in the first bullet point above is not met, I will accept a sufficiently trustworthy open-source repository demonstrating the condition in the second bullet point.
Hence, this resolves once I am informed of any sort of extensive successful testing on IMO-like problems using AlphaGeometry. The problems at https://github.com/google-deepmind/alphageometry/blob/main/jgex_ag_231.txt do not count: there are no IMO Shortlist problems there, unless I'm mistaken.