Will the first AI to get IMO gold use formal methods?

1kṀ11k

resolved Jul 21

Resolved

ALL

This question resolves when an AI first gets IMO Gold. It resolves YES if this AI uses formal methods in some way, as opposed to an informal-to-informal approach.

For example, an AI as described in the paper "HyperTree Proof Search for Neural Theorem Proving" would satisfy this condition, whereas Minerva would not.

Update 2025-07-20 (PST) (AI summary of creator comment): In the event of multiple AIs achieving IMO Gold around the same time, the 'first' will be determined by:
- A best-faith effort to determine which attempt was initiated first.
- If this is unclear, the AI whose achievement was announced first will be considered the first.

Technical AI Timelines

Math

IMO Grand Challenge

Formal Methods

Get

1,000

to start trading!

🏅 Top traders

#	Name	Total profit
1		Ṁ638
2		Ṁ625
3		Ṁ434
4		Ṁ281
5		Ṁ256

People are also trading

Will an AI which achieves IMO gold medal before 2026 use MCTS during test-time?

29% chance

Will AI get BOTH gold and either bronze or silver on the IMO by end of 2025?

56% chance

What company or institution will win the XTX AIMO Prize?

Time between AI winning IMO gold medal and AI winning IMO grand challenge (see description)

Will the first AI to get IMO gold cite AlphaZero?

63% chance

Will an AI get gold on this Olympiad by the end of 2025?

If there are medium-hard to tough problems on the 2026 IMO, will AI solve them?

What years will AI win IMO bronze (or higher)?

14 Comments

27 Holders

91 Trades

Sort by:

Google DeepMind didn't use formal methods: https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/

https://xcancel.com/huexi_/status/1946479076300193935
https://github.com/aw31/openai-imo-2025-proofs/tree/main
"2/N We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs."
"5/N Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling."

@BoltonBailey If GDM announces a gold-medal system using formal methods between now and the time the Paul/Eliezer market closes, how does this market resolve?

@bh I will make a best-faith effort based on the announcements of OpenAI and GDM to determine which attempt was initiated first. If it is unclear, I will give OpenAI the benefit of the doubt for being first on the basis of their announcement coming first.

predictedNO

Can you specify what you mean with "AI first gets IMO Gold"? According to whom? If the answer is "the AI that completes the IMO Grand Challenge" then this market should be roughly 100%, because that challenge requires a formal proof.

predictedYES

@FlorisvanDoorn I'm going for the definition which is being used by the big 2025 market (which is admittedly a little underspecified itself). What's clearly specified about that market is that there must be a computer system built before an IMO which gets scores gold on that IMO. As for who scores the test and whether the questions are processed into a formal format before being fed to the machine is unclear - I will try to match whatever criteria that market uses. At the very least if the system is entirely NLP based and outputs an obviously correct proof for each of the questions in English, then this could resolve NO.

predictedYES

@BoltonBailey Actually, looking at what Paul Christiano has said in the comment for that market, it seems like he would accept either formal or informal inputs, so I will likely accept the same here.