Inspired by this post.
This question concerns the performance of AI systems that meet the criteria of Austin's 2025 market, but applied to the 2026 IMO (i.e. the AI must be finalized before the IMO begins, gets 4.5 hours for each segment of the test, etc.).
This question also refers to Evan Chen's "Math Olympiad Hardness Scale". For each difficulty level on that scale, this question asks whether an AI will be able to solve problems at that level on the 2026 IMO.
After the exam, I will wait for Evan Chen to rate the problems, and I will allow one month after the closing ceremony for labs to announce their results. I will then resolve as follows.
For any difficulty level that does not appear on the exam, I will resolve N/A.
For any difficulty level with exactly one problem on the exam, I will resolve YES if at least one AI solves it and NO otherwise.
For any difficulty level with multiple problems on the exam, I will resolve to the percentage of problems at that level solved by at least one AI (the solving AI need not be the same for each problem).
For example, if three problems are rated at the same level, and Google solves one, OpenAI solves a different one, and the third isn't solved by any AI, that level resolves to 67%.
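To make the resolution rule concrete, here is a minimal Python sketch of the computation. The problem labels, difficulty ratings, and solver results in it are hypothetical, purely for illustration; this is not actual IMO 2026 data.

```python
from collections import defaultdict

def resolve_levels(ratings, solved_by_any):
    """Sketch of the resolution rule described above.

    ratings: dict mapping problem -> difficulty level (per Evan Chen's ratings)
    solved_by_any: set of problems solved by at least one AI (any lab counts)
    Returns a dict mapping level -> resolution. Levels not present in
    `ratings` (i.e. not on the exam) simply don't appear; they resolve N/A.
    """
    by_level = defaultdict(list)
    for problem, level in ratings.items():
        by_level[level].append(problem)

    resolutions = {}
    for level, problems in by_level.items():
        solved = sum(1 for p in problems if p in solved_by_any)
        if len(problems) == 1:
            # Exactly one problem at this level: YES/NO resolution.
            resolutions[level] = "YES" if solved else "NO"
        else:
            # Multiple problems: resolve to the fraction solved, as a percentage.
            resolutions[level] = f"{100 * solved / len(problems):.0f}%"
    return resolutions

# Hypothetical example mirroring the 67% case above.
ratings = {"P1": "hard", "P2": "hard", "P3": "hard", "P4": "easy"}
solved_by_any = {"P1", "P2"}  # two of the three "hard" problems solved, "easy" unsolved
print(resolve_levels(ratings, solved_by_any))
# {'hard': '67%', 'easy': 'NO'}
```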