Will an AI get gold on any International Math Olympiad by the end of 2025?
84% chance
https://bounded-regret.ghost.io/ai-forecasting-one-year-in/ This is from June: a great article on Hypermind forecasts for AI progress, and on how progress on the MATH dataset one year in was far faster than predicted.
https://ai.facebook.com/blog/ai-math-theorem-proving/
Seems relevant https://aimoprize.com/
A retracted, possibly wrong, possibly embargo-breaking online article saying that DeepMind systems had hit IMO silver level.
It's over https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
https://openai.com/index/learning-to-reason-with-llms/ Looks like you don't even need math-specific fine-tuning to solve math competitions; you just need non-constant compute time for LLMs (so they spend more time on hard problems).
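To make the "non-constant compute time" idea concrete, here is a toy sketch: keep sampling candidate solutions until a verifier accepts one or a budget runs out, so harder problems automatically consume more compute. Everything here (the model stub, the verifier, the budget) is a hypothetical stand-in, not OpenAI's actual method.

```python
import random

def sample_solution(problem: str) -> str:
    # Stand-in for one LLM sample; a real system would call a model API here.
    return f"candidate solution for {problem!r} #{random.randint(0, 10**6)}"

def looks_correct(candidate: str) -> bool:
    # Stand-in verifier; real systems might use a proof checker,
    # self-consistency voting, or a learned reward model.
    return random.random() < 0.02  # pretend ~2% of samples check out

def solve_with_adaptive_compute(problem: str, max_samples: int = 512):
    # Easy problems return after a few samples; hard ones burn the whole
    # budget, so compute spent scales with difficulty.
    for n in range(1, max_samples + 1):
        candidate = sample_solution(problem)
        if looks_correct(candidate):
            return candidate, n  # solution plus samples used
    return None, max_samples  # budget exhausted, problem unsolved

solution, samples_used = solve_with_adaptive_compute("IMO 2024, Problem 6")
print(f"used {samples_used} samples; solved: {solution is not None}")
```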
@AdamK OK, so who's benchmarking o3-mini against the 2024 IMO? We could have results within the week.

In Feb 2022, Paul Christiano wrote: Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025.... My final prediction (after significantly revising my guesses after looking up IMO questions and medal thresholds) was:

I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)

Maybe I'll go 8% on "gets gold" instead of "solves hardest problem."
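For concreteness, Paul's committed selection procedure is a simple decision rule. A minimal sketch (the topic labels passed in are an assumed classification of the problems, not part of his statement):

```python
def hardest_problem(p3_topic: str, p6_topic: str) -> int:
    # Default to problem 6; use problem 3 instead if (i) problem 6 is
    # geometry, or (ii) problem 3 is combinatorics and problem 6 is algebra.
    if p6_topic == "geometry":
        return 3
    if p3_topic == "combinatorics" and p6_topic == "algebra":
        return 3
    return 6

assert hardest_problem("algebra", "geometry") == 3         # case (i)
assert hardest_problem("combinatorics", "algebra") == 3    # case (ii)
assert hardest_problem("number theory", "combinatorics") == 6
```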

Eliezer spent less time revising his prediction, but said (earlier in the discussion):

My probability is at least 16% [on the IMO grand challenge falling], though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more.  Paul?

EDIT:  I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists.  I'll stand by a >16% probability of the technical capability existing by end of 2025

So I think we have Paul at <8% and Eliezer at >16% that an AI made before the IMO is able to get a gold (under the time controls etc. of the grand challenge) in one of 2022-2025.


Resolves to YES if either Eliezer or Paul acknowledges that an AI has succeeded at this task.

Related market: https://manifold.markets/MatthewBarnett/will-a-machine-learning-model-score-f0d93ee0119b


Update: As noted by Paul, the qualifying years for the IMO competition are 2023, 2024, and 2025.

Update 2024-06-21: Description formatting

Update 2024-07-25: Changed title from "by 2025" to "by the end of 2025" for clarity


https://garymarcus.substack.com/p/alphageometry2-impressive-accomplishment
Impressive indeed. However, does this count for this market if the problems are translated before being solved by the AI, and the result is not clearly human-readable?


Why there is so much progress this year (and it is just the beginning): https://benjamintodd.substack.com/p/teaching-ai-to-reason-this-years

@yaqubali Only in geometry, not everything.

https://arxiv.org/pdf/2502.03544

AlphaGeometry2 solved 84% of the International Math Olympiad (IMO) problems from 2000-2024 (update: this is only the geometry problems).

@travelling_salesman oh yeah but that's just geometry, right? It's still really cool tho!

AlphaGeometry2 solves 42 of the 50 IMO geometry problems from 2000-2024 (84%), thus surpassing an average gold medallist for the first time.

I'm interested in the extent to which geometry problems were a bottleneck to IMO gold performance.

@jim It was already just barely short of gold last time, so I guess this improvement and the recent push on o1-style reasoning should be enough. I'll be pleasantly surprised if it doesn't happen.

@travelling_salesman where's the image from?


@jim Ive heard geom is easy compared to eg combinatorics, which is the real beast that ais couldnt even come close to solving last year

@JeremiahEngland ~They published an updated version recently. Confusingly, it's also named AlphaGeometry2.~

Update: this was the same system used for the 2024 IMO silver result, which apparently scores at gold level on the geometry problems. I guess they just didn't release a separate paper on AG2 earlier.

https://arxiv.org/abs/2502.03544

@travelling_salesman sorry my bad, they just published the result now. Sorry for the confusion 🙈


Why isn't the market updating on the recent progress in the efficiency of DeepSeek-style math improvements?

https://x.com/tengyuma/status/1886815532524986605

@travelling_salesman It was already priced in, probably.

I'm curious about the delta between this question and the following one:

AlphaProof and AlphaGeometry aren't both publicly accessible, IIRC, which is one part of the difference. But that market is still underpriced.

Nvm, maybe I'm wrong; the description explains you didn't mean what I had in mind.

@Bayesian Hmm, the intent behind the lenient "publicly accessible" is to include things like the o3-mini external safety testing that happened in mid-Jan, but exclude things like AlphaProof (which AFAIK is just an internal Google thing?).

Sorry, I wasn't really clear enough. The intent/spirit of the market is to minimize the amount of information that could conceivably flow from the IMO to the model that gets tested on it. (Maybe I should have asked: if AI gets gold on IMO 2025, will its solutions be generated on July 16 and July 17? That seemed a bit less elegant to me, but now—)
