20% chance

An opportunity to join in on the bet at https://www.lesswrong.com/posts/sWLLdG6DWJEy3CH7n/imo-challenge-bet-with-eliezer. Eliezer has this at >16%; Paul at <8%. Resolves to YES if either Eliezer or Paul acknowledges that an AI has succeeded at this task.

Related market: https://manifold.markets/MatthewBarnett/will-a-machine-learning-model-score-f0d93ee0119b

Update: As noted by Paul, the qualifying years for IMO competition are 2023, 2024, and 2025.

opened a Ṁ100 NO at 20% order

I wonder why @KatjaGrace's 19% order isn't shown in Trades, even though my own earlier and later trades are.

oops i should read the description before i comment

bought Ṁ120 NO from 20% to 19%

Even if there are AIs that can do algebra, geometry, and number theory, the probability of this is close to sqrt(lim as x→1 of (x^2 − 1)/(x + 1)).
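For the record, the joke checks out: (x^2 − 1)/(x + 1) simplifies to x − 1, which tends to 0 as x → 1, so the claimed probability is sqrt(0) = 0. A quick check with sympy (assumed available):

```python
# Evaluate the joke: sqrt( lim_{x->1} (x^2 - 1)/(x + 1) ) = sqrt(0) = 0
import sympy as sp

x = sp.symbols("x")
limit_value = sp.limit((x**2 - 1) / (x + 1), x, 1)  # (x^2 - 1)/(x + 1) = x - 1 -> 0
print(sp.sqrt(limit_value))  # prints 0: "the probability is close to 0"
```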

bought Ṁ10 YES at 20%

<insert ai model> on its way to fakesolve IMO P3 and P6 and then claim it got a perfect score:

"Fields Medallists Timothy Gowers and Terence Tao appointed to the Advisory Committee for the Artificial Intelligence Mathematical Olympiad Prize, alongside Po-Shen Loh, Dan Roberts and Geoff Smith."

https://x.com/aimoprize/status/1755204763673620703

bought Ṁ5,000 NO

As an ex-math olympiad competitor (never went to IMO but probably around low bronze level), and current computer science student at uni, I can pretty comfortably just use the NO market as a Mana bank here.

3 traders bought Ṁ80 YES

Looks like it's been 19 days since you were last surprised by AI-IMO progress?

@placebo_username Not to discredit AlphaGeometry, but I do feel that the way olympiad geometry problems are set up does not require much out-of-the-box thinking, just systematic construction and theorem-applying. I would be far more surprised if, say, a model could find a never-before-seen, intricate setup of the kind required to solve a combinatorics problem at IMO Gold level, which I believe is still much further off than the hype around it seems to portray. One similar example: AlphaCode, which was developed to solve competitive programming problems (a domain with considerable overlap with combinatorics in terms of thought processes), is only able to outperform 50% of all Codeforces users, even with multiple attempts allowed on each problem. That would roughly convert to a bottom-5% result at the IOI. Add the difficulty of having the AI rigorously construct proofs, with only one attempt per problem, and I believe there is still a very big gap.

3 traders bought Ṁ1,600 NO

@Kevinxiehk AlphaCode 2 came out in December, with performance at the Candidate Master level, which is better than 90% of Codeforces users.

You're not up to date

@colorednoise Apologies, you are right. Upon reading the technical report for AlphaCode 2, its model is still based on generating a million candidate programs and then testing them against the sample test cases (see the sketch below), without rigorous reasoning, which I doubt works for maths olympiad problems. Also, to quote from the report: "In contrast, competitive programming problems are open-ended. To solve them, before writing the code implementation one needs to understand, analyze and reason about the problem, which involves advanced mathematics and computer science notions." Similar arguments apply to combinatorial maths olympiad problems as well.

I'll still have to admit that reaching Candidate Master level is an impressive feat; it took me two years to get there back in the day.
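For readers curious what that pipeline looks like in outline, here is a minimal sketch of the generate-and-filter loop the report describes: sample many candidate programs and keep the ones that pass the public sample tests. The names below (sample_program, passes_sample_tests) are hypothetical placeholders, not DeepMind's actual API.

```python
# Minimal sketch of the generate-and-filter approach described in the
# AlphaCode 2 report: sample a huge number of candidate programs and keep
# only those that pass the public sample test cases. No rigorous reasoning
# is involved; correctness is approximated by filtering.
# `sample_program` and `passes_sample_tests` are hypothetical placeholders.
from typing import Callable, List

def generate_and_filter(
    problem_statement: str,
    sample_program: Callable[[str], str],        # LLM sampler: problem -> candidate code
    passes_sample_tests: Callable[[str], bool],  # run candidate on the sample test cases
    n_candidates: int = 1_000_000,
) -> List[str]:
    """Return the candidate programs that survive the sample-test filter."""
    survivors: List[str] = []
    for _ in range(n_candidates):
        candidate = sample_program(problem_statement)
        if passes_sample_tests(candidate):
            survivors.append(candidate)
    return survivors
```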

@Kevinxiehk solve 2020 imo p6 without using out of the box thinking

@Kevinxiehk You give some plausible inside-view arguments for why AI-IMO might be hard, but there are also comparably plausible outside-view counterarguments (i.e., just look at the recent history of AI progress). I have a hard time seeing how this adds up to "comfortably use the NO side as a mana bank" levels of certainty for you.

@placebo_username Would you say that betting no on AGI by 2025 is not a mana bank too?

predicts YES

DeepMind's AlphaGeometry solved 25 out of 30 questions, compared to 26 for a human gold medalist.

https://arstechnica.com/ai/2024/01/deepmind-ai-rivals-the-worlds-smartest-high-schoolers-at-geometry/

bought Ṁ50 of NO

@breck That is only geometry, which is arguably the one among the four IMO areas (ACGN: algebra, combinatorics, geometry, number theory) that needs the least ad-hoc construction and creative problem-solving technique.

As the saying goes among math olympiad people: if you draw a circle through every triple of points, you can solve any geometry problem.

predicts NO

@Kevinxiehk True, but this is still quite impressive.
I should admit that if there was a market about this specific thing, my prediction would have been quite wrong.

predicts NO

@dionisos Definitely, my reaction exactly when I saw the result. Hopefully there's still a long way to go before similar things can be done for the other branches of olympiad maths.

sold Ṁ78 of YES

@breck does the AI have to be admitted/allowed in to an IMO competition, or just take such a test and get gold?

predicts NO

@VAPOR AIs are not eligible to compete in IMO. The question is about whether when an AI is tested under conditions equivalent to the competition, it will score sufficient points to get gold.

@Kevinxiehk It's funny how there keep being breakthroughs that you believed impossible and still find ways to rationalize that your original skepticism is still well placed

@Kevinxiehk as the saying goes 2021 imo p3 is definitely totally totally totally solvable by drawing a circle between every 3 points without any construction at all totally totally totally

predicts NO

Recent events seem pretty relevant to this question.

sold Ṁ419 of YES

After actually reading the AlphaGeo paper, I'm much more skeptical of this approach generalizing to other IMO subjects. The symbolic language they use has a lot of limitations, and it seems like broader methods would require much more expressive languages (where symbolic search is much, much more expensive) or would simply fail to adapt to a wide variety of IMO problems.

Some aspects of the synthetic data generation process feel like they could apply to a wider class of problems, but there are still fundamental barriers. Ultimately, sampling premises + enumerating deductions + pruning to conclusions with long proofs seems like a pretty decent recipe, but my uncertain impression is that the success of this process in geometry depended on there being a relatively narrow class of geo/algebraic manipulations used to build the deduction closure graph. The class of premises in geo is also much smaller than other topics. I do think there's a lot of room for a more clever approach to dataset generation, but this may be a bottleneck.
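A minimal sketch of that recipe, under the reading above (every name below, sample premises and deduction_rules included, is a hypothetical placeholder, not AlphaGeometry's actual interface):

```python
# Minimal sketch of the recipe above: sample random premises, forward-chain
# deductions to a fixed point, then prune to conclusions whose recorded
# proofs are long. One reading of the paper's high-level idea, not its
# implementation; `premises` and `deduction_rules` are placeholders.
from typing import Callable, Dict, Iterable, List, Set, Tuple

Fact = str

def build_closure(
    premises: Set[Fact],
    deduction_rules: Callable[[Set[Fact]], Iterable[Tuple[Fact, List[Fact]]]],
) -> Dict[Fact, List[Fact]]:
    """Forward-chain: map each derived fact to the parent facts that produced it."""
    parents: Dict[Fact, List[Fact]] = {p: [] for p in premises}
    changed = True
    while changed:
        changed = False
        for fact, used in deduction_rules(set(parents)):
            if fact not in parents:
                parents[fact] = used
                changed = True
    return parents

def proof_steps(fact: Fact, parents: Dict[Fact, List[Fact]]) -> int:
    """Count deduction steps in the recorded derivation of `fact` (0 for premises)."""
    used = parents[fact]
    return 0 if not used else 1 + sum(proof_steps(u, parents) for u in used)

def mine_hard_examples(premises, deduction_rules, min_steps: int = 20) -> List[Fact]:
    """Prune the closure to conclusions with long proofs: the 'hard' training pairs."""
    parents = build_closure(premises, deduction_rules)
    return [f for f in parents if proof_steps(f, parents) >= min_steps]
```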

On the other hand, this approach did not take as much advantage of NN-guided search as I think future methods will. The NN suggests one construction at a time before handing off most of the work to the symbolic solver. It seems like using NNs for more granular search heuristics still has a lot of potential (though I'd love to hear more considerations/discussion on this), and this work doesn't update me on how well this might work in either direction.
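For concreteness, the alternation described there looks roughly like the following; propose_construction and symbolic_solve are hypothetical placeholders standing in for the neural model and the symbolic engine, not the paper's real interfaces:

```python
# Rough sketch of the NN/symbolic alternation: the neural model proposes one
# auxiliary construction at a time, and the symbolic solver does the rest.
# `propose_construction` and `symbolic_solve` are hypothetical placeholders.
from typing import Callable, List, Optional

def neuro_symbolic_solve(
    problem: str,
    propose_construction: Callable[[str, List[str]], str],      # NN: suggest one aux object
    symbolic_solve: Callable[[str, List[str]], Optional[str]],  # proof string, or None
    max_constructions: int = 16,
) -> Optional[str]:
    """Alternate: try to close symbolically, else ask the NN for one more construction."""
    constructions: List[str] = []
    for _ in range(max_constructions + 1):
        proof = symbolic_solve(problem, constructions)
        if proof is not None:
            return proof
        constructions.append(propose_construction(problem, constructions))
    return None
```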

Takeaway: this paper's methods are not directly enough for IMO Gold (or Bronze). Expressing the breadth of IMO problems will require a much more expressive formal language, which rules out this method's heavy reliance on pure symbolic search. However, the research community has its eye on techniques that might(?) alleviate this problem.

sold Ṁ129 of NO

Reading the paper, I think it's still unlikely by 2025, but not 20% unlikely
