In Feb 2022, Paul Christiano wrote: Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025.... My final prediction (after significantly revising my guesses after looking up IMO questions and medal thresholds) was:
I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)
Maybe I'll go 8% on "gets gold" instead of "solves hardest problem."
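For concreteness, here's a minimal sketch of that committed selection rule as code (a hypothetical illustration; the function name and topic labels are my own, not from the source):

```python
def hardest_problem(p3_topic: str, p6_topic: str) -> int:
    """Pick which problem counts as the 'hardest' under Paul's rule.

    Default to problem 6, but use problem 3 instead if either
    (i) problem 6 is geometry, or
    (ii) problem 3 is combinatorics and problem 6 is algebra.
    """
    if p6_topic == "geometry":
        return 3
    if p3_topic == "combinatorics" and p6_topic == "algebra":
        return 3
    return 6

# Illustrative calls (topics are made up, not from a real IMO paper):
print(hardest_problem("combinatorics", "algebra"))        # -> 3
print(hardest_problem("number theory", "combinatorics"))  # -> 6
```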
Eliezer spent less time revising his prediction, but said (earlier in the discussion):
My probability is at least 16% [on the IMO grand challenge falling], though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?
EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probability of the technical capability existing by end of 2025.
So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025.
Resolves to YES if either Eliezer or Paul acknowledges that an AI has succeeded at this task.
Related market: https://manifold.markets/MatthewBarnett/will-a-machine-learning-model-score-f0d93ee0119b
Update: As noted by Paul, the qualifying years for the IMO competition are 2023, 2024, and 2025.
Update 2024-06-21: Description formatting
Update 2024-07-25: Changed title from "by 2025" to "by the end of 2025" for clarity
https://garymarcus.substack.com/p/alphageometry2-impressive-accomplishment
Impressive indeed. However, does this count for this market if the problems are translated before being solved by the AI, and the result is not clearly human-readable?
@Zardoru Paul said that formal proofs count. https://manifold.markets/Austin/will-an-ai-get-gold-on-any-internat#QuXuluE8CGMjalzyqOx4
Why there's been so much progress this year (and it's just the beginning): https://benjamintodd.substack.com/p/teaching-ai-to-reason-this-years
DeepMind claims its AI performs better than International Mathematical Olympiad gold medalists
https://techcrunch.com/2025/02/07/deepmind-claims-its-ai-performs-better-than-international-mathematical-olympiad-gold-medalists/?guccounter=1
https://arxiv.org/pdf/2502.03544
AlphaGeometry2 solved 84% of International Math Olympiad (IMO) problems from 2000–2024 (update: this covers only the geometry problems)
@travelling_salesman oh yeah but that's just geometry, right? It's still really cool tho!
AlphaGeometry2 solves 42 out of 50 of all 2000–2024 IMO geometry problems (42/50 = 84%, the figure quoted above), thus surpassing an average gold medallist for the first time
@jim it was already just shy of gold last time. So I guess this improvement plus the recent push on o1-style reasoning should be enough. I'll be pleasantly surprised if it doesn't.

@jim I've heard geometry is easy compared to e.g. combinatorics, which is the real beast that AIs couldn't even come close to solving last year
@travelling_salesman Note, this was announced back in July. See https://x.com/GoogleDeepMind/status/1816498082860667086
@JeremiahEngland ~~They published an updated version recently. Confusingly, it's also named AlphaGeometry2~~
Update: this is the same system used for the 2024 IMO silver, which apparently reaches gold level on geometry problems. I guess they just didn't release a separate paper on AG2 earlier.
https://arxiv.org/abs/2502.03544

Why isn't the market updating on the recent efficiency progress from DeepSeek-style math improvements?
@Bayesian Hmm, the intent behind the lenient "publicly accessible" criterion is to include things like the o3-mini external safety testing that happened in mid-January, but exclude things like AlphaProof (which AFAIK is just an internal Google thing?).
Sorry, I wasn't really clear enough. The intent/spirit of the market is to minimize the amount of information that could conceivably flow from the IMO to the model that gets tested on it. (Maybe I should have asked: if AI gets gold on IMO 2025, will its solutions be generated on July 16 and July 17? That seemed a bit less elegant to me, but now—)