AI IMO 2025: How many AI labs announce a Gold performance at the IMO in 2025?
Resolved Aug 1 to "2". (Answer options: 0 through 6, plus Other.)

This market resolves to N, where N is the number of distinct AI labs that have an AI system that meets ALL of these criteria in 2025:

  1. The AI system completes the official 2025 International Mathematical Olympiad problems under standard IMO time constraints (4.5 hours per 3-problem session)

  2. The system was not trained on IMO 2025 solutions (lol). This likely means the system's training was completed before the first day of IMO 2025.

  3. Humans do not assist with solving the problems. They may, however, provide a formal-proof-language version of the problem statement (see the illustrative Lean snippet after this list).

  4. The system provides complete mathematical proofs (either in natural language or in formal proof languages like Lean), and the natural language proofs are judged to a similar standard as human participants.

  5. The system achieves a score that meets or exceeds the 2025 IMO Gold medal cutoff
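
For concreteness, here is what a "formal proof language version" of a statement looks like in Lean 4 with Mathlib: a deliberately trivial toy lemma, not an actual IMO problem, just to show the shape of a machine-checkable statement and proof:

```lean
-- Toy example of a formalized statement and machine-checkable proof in Lean 4.
-- Illustrative only; real IMO formalizations are far more involved.
import Mathlib

theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  positivity  -- Mathlib tactic that closes nonnegativity goals of this shape
```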

  • Update 2025-01-28 (PST) (AI summary of creator comment): Update from creator

    • Market close date has been pushed back to allow for valid announcements within a month after the competition.

    • Announcements made more than a month after the competition will still be counted, potentially requiring the market to be re-resolved.

  • Update 2025-06-18 (PST) (AI summary of creator comment): The creator has specified the following submission constraints, in response to a question about pass@1 evaluation:

    • For natural language solutions, only one solution can be submitted.

    • For formal proofs (e.g., Lean), the first valid proof will be the one considered for resolution (a sketch of this selection rule follows these updates).

  • Update 2025-07-08 (PST) (AI summary of creator comment): In response to discussion about how human IMO submissions are judged, the creator has specified a change to the submission criteria:

    • If human participants are allowed to include multiple attempts within their single submission packet, then AI systems will be judged by the same procedure.

    • This potentially modifies the previous clarification that only one solution could be submitted.

  • Update 2025-07-09 (PST) (AI summary of creator comment): In response to a discussion about submission limits, the creator has clarified the policy on multiple attempts:

    • Submitting thousands of random attempts for a human to piece together into a solution is not allowed.

    • The resolution will be guided by the spirit of the market, which is to judge performance in a way that feels fair and reasonable.

  • Update 2025-07-29 (PST) (AI summary of creator comment): In response to a discussion about the validity of an announced Gold performance by the lab Harmonic, the creator has made a preliminary judgment:

    • Based on a statement made in a public livestream, the creator believes Harmonic did not complete their work within the required time constraints.

    • As a result, Harmonic's performance will not be counted for this market's resolution.

    • This judgment could be reversed if Harmonic provides an explicit statement confirming they did meet the time limit.
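
To make the "first valid proof" rule concrete, here is a minimal sketch, assuming a submission is an ordered list of candidate .lean files: run the Lean checker over the candidates in submission order and keep the first one that compiles. The directory layout and the bare `lean` invocation are assumptions for illustration; a real Mathlib-based project would typically be checked with `lake build`.

```python
# Minimal sketch: pick the first candidate proof that the Lean checker accepts.
# The paths and the bare `lean` invocation are illustrative assumptions.
import subprocess
from pathlib import Path

def first_valid_proof(candidates: list[Path]) -> Path | None:
    """Return the first file, in submission order, that type-checks."""
    for proof in candidates:
        result = subprocess.run(["lean", str(proof)], capture_output=True, text=True)
        if result.returncode == 0:  # exit code 0 iff Lean accepted the file
            return proof
    return None  # no candidate compiled

if __name__ == "__main__":
    # Hypothetical layout: submissions/p1_000.lean, p1_001.lean, ... in order.
    print(first_valid_proof(sorted(Path("submissions").glob("p1_*.lean"))))
```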


Will re-resolve if news comes out, but for now it's 2 and likely to stay that way.

@Bayesian

Shouldn't you wait until August 20th to resolve?

@Metastable yeah this feels like leagues manipulation!

@Metastable If y'all think the market would be better reopened, lmk; but that's an AI clarification, and if you look at the comment, I'm saying maybe I could keep it open for a month, not that I would.

@mathvc Are you saying Seed-Prover should be counted? Didn't they achieve only silver?

bought Ṁ30 YES

Harmonic's model did not reach gold performance within the time limit.

sold Ṁ52 YES

@AdamK source?

@AdamK Additionally, their solution to problem 2 seems to completely ignore configuration issues and just assume one configuration. https://github.com/harmonic-ai/IMO2025/blob/main/HarmonicLean/IMO2025P2.txt

@DottedCalculator Isn't it common practice not to dock points for configuration issues? Or is it different for analytic solutions?

Edit: It's not even analytic. Probably shouldn't be a dock then.

@vincentWang At the IMO, configuration issues get docked a lot more often. The solution has a lot of ratios, which means the configurations "actually matter": you can't fix it just by invoking directed angles (although there is a one-sentence fix).

As an example, see my score in 2023.

sold Ṁ19 NO

@DottedCalculator I guess directed lengths exist. In any case, it doesn't look like Harmonic gave the AI only the natural-language problem statement, so I don't think it counts. I guess it's a good thing I sold half my position on 2 NO; still hoping for other AI labs to announce it :/

Why is 2 still higher than 3? I think it's <10% chance that any of the 3 already announced golds are invalid.

Edit: watched the X broadcast at 31:03 - this makes me less confident, but it's still a bit ambiguous and there are many other AI models in other countries still out there. I'll be keeping most of my position.

sold Ṁ31 YES

Unless we get a statement from Harmonic explicitly saying they did it within time, I think this (the statement at 31:00 of the livestream) is strong enough evidence that they did not complete a gold-medal performance within the allotted time that I will not be counting it for the purposes of this market.

bought Ṁ50 YES

Do we know whether Harmonic's model was given the answer or if it came up with the answer independently?

@DottedCalculator We don't know for sure yet, but I would be very surprised if they were given the answer. What I think is more plausible is that the 5 problems were not solved in the allotted time.

With a tweet like "Join us live from the birthplace of Silicon Valley @ 3 PM PT to be among the first to experience Mathematical Superintelligence", it sure would be a letdown if it didn't get gold.

bought Ṁ100 YES

Part of the resolution criteria is that they have to achieve gold within the time limits without cheating; I could see them announcing gold even though they achieved it by taking more time or whatever.

Not an AI lab so unclear whether relevant to this market, but a "civilian" claims to have achieved "almost gold" using just regular Gemini 2.5 Pro: https://arxiv.org/abs/2507.15855

they gave it hints

@Bayesian Yes, although the hints are claimed to be minor. I'm not saying this counts. I'm saying, this suggests this year wasn't that hard for AI

If someone gave me a hint on p6, I would be able to solve it.

opened a Ṁ50 NO at 62% order

@pietrokc The hints were "Let us try to solve the problem by induction" for problem 1 and "Let us try to solve the problem by analytic geometry" for problem 2. They claim no hints for the other problems. I think describing these hints as minor is more than fair. I agree this doesn't qualify for the market (it's clearly not an AI lab announcing gold), but it's a relevant piece of information nonetheless. I'm surprised this hasn't moved 2 down.

@JBSIyg2 Agreed.

I think this paper clearly reveals something about the IMO that might not have been obvious from the original Google/OpenAI announcements: the problems are kind of like chess in that at each step there are only a few "moves" available, and you can solve them by trying everything and seeing what sticks.

Anyone who's tried public models knows they're abysmal at math that wasn't in the training set. A week before the IMO I was having o4 and Gemini 2.5 Pro fail to solve problems that would be too easy for the AIME, state obvious falsehoods in elementary congruence arithmetic, etc. But of course I was only trying each question ~5 times, since I'm just one guy and this isn't my job.

Turns out you can solve this in an "infinite monkeys" sort of way by generating 100s or 1000s of solution starts, asking the model to check them for correctness, generating 100s of continuations for each, and so on.

I freely admit I was surprised this worked for P1 and P5. (I expected it to work for P2 and P3).

It's not clear to me how they came up with the model parameters (prompt, temperature) from the paper. It's cheating (training on the test set) to optimize these on the IMO 2025 problems themselves.
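
To make the search-and-verify loop described above concrete, here is a minimal sketch of a best-first expansion over proof attempts. `sample_llm` and `score_llm` are hypothetical stand-ins for real model calls (stubbed with random output here so the script runs end to end); this is the shape of the technique, not any lab's actual pipeline:

```python
# Sketch of "generate many starts, verify, expand the promising ones".
# sample_llm and score_llm are hypothetical stubs, not a real model API.
import heapq
import random

def sample_llm(prefix: str, n: int) -> list[str]:
    """Hypothetical: n sampled continuations of a partial proof."""
    return [prefix + f" <step {random.randrange(1000)}>" for _ in range(n)]

def score_llm(candidate: str) -> float:
    """Hypothetical: LLM-as-verifier confidence in [0, 1] for a partial proof."""
    return random.random()

def search(problem: str, starts: int = 100, width: int = 8, depth: int = 5) -> str:
    """Best-first search: keep the top `width` attempts per round, expand each."""
    beam = sample_llm(problem, starts)
    for _ in range(depth):
        scored = [(score_llm(c), c) for c in beam]
        best = heapq.nlargest(width, scored)  # most promising partial proofs
        beam = [cont for _, c in best for cont in sample_llm(c, width)]
    return max(beam, key=score_llm)  # single highest-scoring finished attempt

if __name__ == "__main__":
    print(search("IMO 2025 Problem 1: ...")[:120])
```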

@JBSIyg2 no, these are not minor hints.

@pietrokc No, you can't do this. You cannot check correctness of informal solutions.

@mathvc You are incorrect. That is exactly how it was done. Turns out LLMs can check correctness of informal proofs, for the IMO competition using IMO grading standards, reliably enough for this purpose.

@pietrokc Show us your code. You make it sound like it can be done in 2 days; I will wait.

@mathvc lol buddy, I didn't do it myself. But I know people who did do it themselves, and I have seen various artifacts. I had been betting against this and I was also surprised, but there's no use denying reality.

@pietrokc 🤦‍♂️

@mathvc I don't say any of this lightly. I have a PhD in math and I can actually solve IMO questions myself, unlike most people commenting on this topic. I have seen non-public stuff about this and what DeepMind are saying is true. They have an LLM that can verify natural language proofs (to IMO standards of rigor, which are not perfect) with high accuracy. Meaning like, if you give it a (very short) proof with a wrong or missing step, it will say the proof is wrong most of the time. And if the proof is right, it will say it is right most of the time.

In some sense it doesn't "understand" math because it will often say nonsense, but turns out this accuracy, along with generating 1000s of plausible proofs, is enough to gold IMO. (In 2025 at least, with only one combinatorics problem among P2, P3, P5, P6.)
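
One way to see why a verifier that is merely "right most of the time" can be enough: if each check were an independent coin with accuracy p > 0.5, a majority vote over k checks would err with probability shrinking exponentially in k. In practice LLM verifications are correlated, so this is only a toy model; the simulation below just illustrates the amplification effect:

```python
# Toy model: amplifying a noisy verifier by majority vote over repeated checks.
# check_once is a hypothetical stand-in for one LLM verification call.
import random

def check_once(p: float = 0.8, truth: bool = True) -> bool:
    """Hypothetical verifier: returns the true verdict with probability p."""
    return truth if random.random() < p else not truth

def majority_check(k: int = 15) -> bool:
    """Verdict by majority over k independent checks."""
    return sum(check_once() for _ in range(k)) > k // 2

if __name__ == "__main__":
    trials = 10_000
    errors = sum(not majority_check() for _ in range(trials))
    # With p = 0.8 a single check errs 20% of the time; majority-of-15 errs ~0.4%.
    print(f"majority-of-15 error rate: {errors / trials:.4f}")
```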

@pietrokc This is just an LLM reward model/verifier; it is not some "hidden secret" or "non-public stuff". It has been published by them many times.

@mathvc I don't know what we're arguing about anymore. I was responding to your apparent skepticism:

> You cannot check correctness of informal solutions.
