Related market: https://manifold.markets/Austin/will-an-ai-get-gold-on-any-internat
Nov 14, 8:20am: Will an AI get bronze or silver on any International Math Olympiad by 2025? → Will an AI get bronze or silver on any International Math Olympiad by end of 2025?
I really don't think this qualifies. The "translations" to Lean do some pretty substantial work on behalf of the model. For example, in the theorem for problem 6, the Lean translation that the model is asked to prove includes an answer that was not given in the original IMO problem.
```lean
theorem imo_2024_p6 (IsAquaesulian : (ℚ → ℚ) → Prop)
    (IsAquaesulian_def : ∀ f, IsAquaesulian f ↔
      ∀ x y, f (x + f y) = f x + y ∨ f (f x + y) = x + f y) :
    IsLeast {(c : ℤ) | ∀ f, IsAquaesulian f →
      {(f r + f (-r)) | (r : ℚ)}.Finite ∧
      {(f r + f (-r)) | (r : ℚ)}.ncard ≤ c} 2
```
The model is supposed to prove that "there exists an integer c such that for any aquaesulian function f, there are at most c different rational numbers of the form f(r) + f(−r) for some rational number r, and find the smallest possible value of c".
The original IMO problem does not state that the smallest possible value of c is 2, but the theorem that AlphaProof was given to solve has the number 2 right there in the theorem statement. Part of the problem is to figure out that the answer is 2.
Link: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/P6/index.html
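For contrast, here is a hypothetical "answer-hiding" formalization, reusing the names from the quoted theorem. This is a sketch of what the complaint is asking for, not anything DeepMind published: the answer is bound by an existential, so the prover would have to both find c and show it is least.

```lean
-- Hypothetical rephrasing (not the statement AlphaProof received):
-- the value 2 is hidden behind ∃, so finding it is part of the proof.
theorem imo_2024_p6_hidden (IsAquaesulian : (ℚ → ℚ) → Prop)
    (IsAquaesulian_def : ∀ f, IsAquaesulian f ↔
      ∀ x y, f (x + f y) = f x + y ∨ f (f x + y) = x + f y) :
    ∃ c : ℤ, IsLeast
      {(n : ℤ) | ∀ f, IsAquaesulian f →
        {(f r + f (-r)) | (r : ℚ)}.Finite ∧
        {(f r + f (-r)) | (r : ℚ)}.ncard ≤ n} c := by
  sorry
```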
They said they brute-forced the answer for the algebraic questions and then let the model prove it. So this does count; it's just multiple different components.
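A toy sketch of that "guess the answer, then verify" pipeline. Everything here is an illustrative stand-in (the function family, the sampling, the checker), not AlphaProof's actual machinery: we enumerate candidate bounds c in increasing order and keep the first one a checker accepts.

```python
# Toy guess-then-verify loop: brute-force candidate answers, accept the
# first one that the "verifier" confirms. All problem details are made up
# for illustration; the real verifier would be a formal proof search.
from fractions import Fraction

# A tiny family of candidate functions standing in for "aquaesulian" f's.
FUNCS = [
    lambda r: r,                           # f(r) = r  -> f(r)+f(-r) = 0
    lambda r: -r,                          # f(r) = -r -> 0
    lambda r: r + (1 if r >= 0 else -1),   # piecewise shift -> two values
]

# Sample rationals to probe the functions on.
SAMPLES = [Fraction(n, d) for n in range(-5, 6) for d in range(1, 4)]

def distinct_values(f):
    """Number of distinct f(r) + f(-r) over the sample rationals."""
    return len({f(r) + f(-r) for r in SAMPLES})

def verified_bound(c):
    """Stand-in for a proof: check every candidate f respects the bound c."""
    return all(distinct_values(f) <= c for f in FUNCS)

# Guess answers in increasing order; the first verified one is the answer.
answer = next(c for c in range(10) if verified_bound(c))
print(answer)  # -> 2
```

In this toy setup the loop lands on 2, mirroring how a cheap search can supply the "determine" value that the prover then only has to certify.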
Cf. Gowers's comment here; my understanding is that the formalizer network arrived at this additional information itself, and it was not provided by humans? https://x.com/wtgowers/status/1816840103349662114
This makes no sense to me. It basically works because the answers are guessable? Still highly skeptical. At the very least, this limitation seems to rule out anything that doesn't involve unconscionably vast compute resources, as well as any problem that doesn't have a cutesy toy answer that can be naively guessed (all even numbers, c = 2, etc.). Certainly this should make others substantially more skeptical that this performance wasn't just a fluke of the competition having so many answers of this type.
Where does it say the translation of the IMO problems was manual?
It says so in the blog post announcement; Ctrl+F for "manual".
Does it count if manual translation to formal language is required?
One of the people who formulated the bet in the linked market has said that this is fine.
how can anyone say llms are "really" intelligent when it takes them 3 days to get a silver medal performance on the IMO? it's obvious this will never be possible to be done under the 4.5 hours time limit of the competition
edit: /s (I thought it was obvious)
It didn't:
"Our systems solved one problem within minutes and took up to three days to solve the others."
To get the medal you have to solve them in 4.5 hours :)
https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/#:~:text=Our%20systems%20solved%20one%20problem%20within%20minutes%20and%20took%20up%20to%20three%20days%20to%20solve%20the%20others.
I wouldn't say so, the IMO bronze cutoff is often above 14. But there's a market if you want to bet on that
/FlorisvanDoorn/what-years-will-alphageometry-will-0e2b5f902931
Is there complexity here I'm missing? Why is this not at near 100% and I see people still buying NO?
https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
In particular, I am looking at these lean files and wondering if the "determine" values were provided to the AI in advance, seems like an important potential distinction. Edit: the blog post seems to imply it figured them out itself.
@GeraldMonroe I'd bet they'll want to go for the ultimate glory that is Gold IMO. pretty unlikely they stop at silver. could make a market about whether Deepmind specifically solves Gold IMO 2025, and I'd bet that to within like 10-20% of anyone solving Gold IMO 2025? hmmm