Will an AI get gold on any International Math Olympiad by the end of 2025?
69% chance
https://bounded-regret.ghost.io/ai-forecasting-one-year-in/ This is from June: a great article on Hypermind forecasts for AI progress, and on how progress on the MATH dataset one year in was far faster than predicted.
https://ai.facebook.com/blog/ai-math-theorem-proving/
Seems relevant https://aimoprize.com/
A retracted, possibly wrong, possibly embargo-breaking online article saying that DeepMind systems had hit IMO silver level.
It's over: https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/

In Feb 2022, Paul Christiano wrote: Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025.... My final prediction (after significantly revising my guesses after looking up IMO questions and medal thresholds) was:

I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)

Maybe I'll go 8% on "gets gold" instead of "solves hardest problem."
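
To make the committed procedure concrete, here is a minimal Python sketch (not part of the original bet, written just for this description) encoding Paul's selection rule; the subject labels are illustrative:

```python
# Minimal sketch (not part of the original bet) of Paul's committed
# procedure for choosing the "hardest problem" on a given IMO.
def hardest_problem(p3_subject: str, p6_subject: str) -> int:
    """Return 3 or 6 per the rule: usually problem 6, but problem 3 if
    (i) problem 6 is geometry, or (ii) problem 3 is combinatorics and
    problem 6 is algebra."""
    if p6_subject == "geometry":
        return 3
    if p3_subject == "combinatorics" and p6_subject == "algebra":
        return 3
    return 6

# Illustrative usage:
print(hardest_problem("combinatorics", "algebra"))       # -> 3
print(hardest_problem("number theory", "combinatorics")) # -> 6
```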

Eliezer spent less time revising his prediction, but said (earlier in the discussion):

My probability is at least 16% [on the IMO grand challenge falling], though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more.  Paul?

EDIT:  I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists.  I'll stand by a >16% probability of the technical capability existing by end of 2025

So I think we have Paul at <8% and Eliezer at >16% that an AI made before the IMO is able to get a gold (under the time controls etc. of the grand challenge) in one of 2022-2025.


Resolves to YES if either Eliezer or Paul acknowledges that an AI has succeeded at this task.

Related market: https://manifold.markets/MatthewBarnett/will-a-machine-learning-model-score-f0d93ee0119b


Update: As noted by Paul, the qualifying years for the IMO competition are 2023, 2024, and 2025.

Update 2024-06-21: Description formatting

Update 2024-07-25: Changed title from "by 2025" to "by the end of 2025" for clarity


The real market, in case you want to hedge. Since formalizing combinatorics will be difficult, I think the correlation is quite large.

The programs [...] were, to most people, simply "astonishing": computers were solving algebra word problems, proving theorems in geometry and learning to speak English. Few at the time would have believed that such "intelligent" behavior by machines was possible at all. Researchers expressed an intense optimism in private and in print, predicting that a fully intelligent machine would be built in less than 20 years.

The above snippet is describing the situation in 1956. Via https://en.wikipedia.org/wiki/History_of_artificial_intelligence

Notably, the problem statements were manually translated into Lean theorems before the proofs were generated with AI. If the statements still have to be manually translated into a formal language for the AI to get gold, will this still count?

And the translation might actually be a very hard problem, because you might need something like an LLM, but with very high reliability on even minor details...
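
For concreteness, here is a rough sketch of what such a manual translation can look like, using IMO 2024 Problem 1 (whose answer is the even integers). This is an illustrative Lean 4 / Mathlib-style formalization written for this comment, not DeepMind's actual one, and the exact syntax may not match current Mathlib:

```lean
-- Illustrative only: a hand-written formalization of IMO 2024 Problem 1
-- ("determine all real α such that n divides ⌊α⌋ + ⌊2α⌋ + ⋯ + ⌊nα⌋ for
-- every positive integer n"; the answer is the even integers).
-- Only the statement is translated; the proof is left as `sorry`.
import Mathlib

open BigOperators

theorem imo2024_p1 (α : ℝ) :
    (∀ n : ℕ, 0 < n → (n : ℤ) ∣ ∑ i in Finset.Icc 1 n, ⌊(i : ℝ) * α⌋) ↔
    ∃ m : ℤ, α = 2 * (m : ℝ) := by
  sorry
```

Note that the formalizer has to commit to one particular phrasing of the answer set here, which is exactly the "nice answer" issue raised below.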

Links to a message don't seem to work well for me on Manifold, so here is a screenshot of that message.

A small update after reviewing the questions and solutions in more detail: for the questions DeepMind's AI solved, the formalization is reasonably straightforward and is IMHO likely achievable by an automatic system with high reliability. Problem 3 is IMHO somewhat harder to formalize, and Problem 5 is very hard.

Additionally, it is unclear what criterion was used to obtain a "nice" answer for Problems 1 and 2. Those do not ask for proofs but for characterizations, and there are many mathematically equivalent ways to characterize the solution set (e.g. by trivially altering the problem statement), although we humans naturally see the correct solutions as "elegant" or "clear". So presumably the AI was tasked with finding an equivalent characterization that is optimal in some sense (e.g. number of symbols).
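
As a toy illustration of what such a criterion could look like (purely hypothetical, not DeepMind's actual method), one could rank mathematically equivalent candidate answers by symbol count:

```python
# Toy illustration (hypothetical, not DeepMind's actual criterion):
# among mathematically equivalent characterizations, prefer the one
# with the fewest symbols.
candidates = [
    "α is an even integer",
    "∃ m ∈ ℤ, α = 2m",
    "α ∈ ℤ and ⌊α/2⌋ = α/2",
]
nicest = min(candidates, key=len)
print(nicest)  # -> "∃ m ∈ ℤ, α = 2m"
```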

So, the remaining ~30% is in how Paul and Eliezer rule the bet given the speed of the solve, and in the ability to solve the combinatorics problems, correct? Or am I missing something?

A lot of it is "will DeepMind get close but then move on to another thing."

Basically what happened with AlphaStar.

Right, had not even considered that, though that seems unlikely to me.

The new algorithm did well enough for a silver medal, not a gold. So people are giving 67% to the rest of the gap being closed by the end of 2025.

In some sense it feels like DeepMind got unlucky with rolling only one geometry problem rather than two, which is what happens about 75% of the time in recent years. AlphaGeometry 2 has an 83% success rate when backtested, which is higher than the average per-question success probability needed for typical recent gold cutoffs. Seeing that they were only a single point from the cutoff, I think there's a decent chance that if the exact same procedure were run again next year, it would get gold.
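
A back-of-the-envelope sketch of that intuition (all probabilities here are my own assumptions layered on the 83% figure above; problems are treated as independent, all-or-nothing 7-pointers):

```python
# Back-of-the-envelope sketch: all numbers below are assumptions (only
# the 0.83 geometry rate comes from the comment above), and each problem
# is treated as an independent, all-or-nothing 7-pointer.
from itertools import product

solve_probs = [0.83, 0.83, 0.5, 0.5, 0.5, 0.5]  # 2 geometry + 4 others (assumed)
need = 5  # a 29-point cutoff effectively requires 5 full solves

p_gold = 0.0
for outcome in product([0, 1], repeat=len(solve_probs)):
    if sum(outcome) >= need:
        p = 1.0
        for solved, q in zip(outcome, solve_probs):
            p *= q if solved else 1 - q
        p_gold += p

print(f"P(gold) ~ {p_gold:.2f}")  # ~0.23 under these made-up assumptions
```

Of course the real system gets partial credit and the per-problem rates are guesses, so treat this only as a shape-of-the-problem illustration.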

@jonsimon do we know if this is perfectly parallelizable? They didn't even meet the time limits for silver.

Well, AlphaZero is a form of tree search, so I'd expect it to be possible to parallelize by taking the n most likely proof-prefixes found after running for a while and farming them out to separate servers.

It's possible it's more complicated. One thing that would confuse me about that possibility is: Why didn't they just use 16 times as many servers so that they would come in under the time limit? Did they use all the resources they had when running the competition? Why didn't they rent more? Did they realize how close they were?

Another way in which they got unlucky was that the gold cutoff was of the form 7n+1 (one point above four full solves), so the number of problems they would have needed to solve lined up badly against the difficulty of those problems.

@BoltonBailey Search is, in my experience, actually very hard to parallelize, and naive approaches don't work very well, because all efficient search algorithms rely on shared global information (hash tables, bounds, ...) and effort per branch tends to be highly uneven.
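
For what it's worth, here is a minimal sketch of the naive prefix-farming idea from this thread (illustrative Python only, with placeholder search logic; as noted above, real provers share transposition tables and bounds across workers, which this deliberately omits):

```python
# Illustrative sketch of the naive "farm out proof-prefixes" idea from
# this thread. Deliberately omits what matters in practice: shared hash
# tables, pruning bounds, and load balancing across workers.
from concurrent.futures import ProcessPoolExecutor

def expand(prefix: list[str]) -> list[list[str]]:
    """Placeholder: return candidate one-step extensions of a proof prefix."""
    return [prefix + [f"step{i}"] for i in range(3)]

def search_from(prefix: list[str], depth: int) -> list[str] | None:
    """Placeholder depth-limited search from a prefix; returns a 'proof' or None."""
    if depth == 0:
        return prefix  # pretend the prefix completes into a proof
    for child in expand(prefix):
        result = search_from(child, depth - 1)
        if result is not None:
            return result
    return None

def parallel_search(seed_prefixes: list[list[str]], depth: int = 3):
    # Each worker searches its prefix independently; they duplicate work
    # because they cannot see each other's global search state.
    with ProcessPoolExecutor() as pool:
        depths = [depth] * len(seed_prefixes)
        for proof in pool.map(search_from, seed_prefixes, depths):
            if proof is not None:
                return proof
    return None

if __name__ == "__main__":
    print(parallel_search([["axiom_a"], ["axiom_b"]]))
```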

What the past 10 years should have taught everyone is never bet against AI.

Betting 'yes' on this market for a long time was a losing strategy.

@Mikhail wdym? It paid off

Sure, but even when loans existed, they got cut off because of negative net worth. Doesn't look like winning to me.

Betting yes on this market yesterday is a winning strategy :)

If you bet yes a year ago, you doubled your mana in a year. That's a great ROI, even without including loans, which would make it even better.

The only bad part is if you bet everything you have (or more than everything, including loans) and needed liquidity earlier. Just know the risks of concentrated positions in illiquid investments.

bought Ṁ250 YES

@DottedCalculator Honestly yeah. I'm just hoping that it'll resolve or reach like 90 before the election so I can sell and buy more Trump YES.

Wish I had bet more when it was at 25% but even I wasn't expecting AI to progress this fast. Truly, I should not have bet against AI (taking until Q4)

I guess it satisfies the criteria, but it's a little disappointing that the thing that might succeed here isn't a general-reasoning AI like an LLM. Still very impressive, of course, but not the huge jump in general reasoning ability that we would have if, e.g., GPT-5 were able to do it.

One point off from gold, plus the runtime should improve, so it seems very probable.