In Feb 2022, Paul Christiano wrote: Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025.... My final prediction (after significantly revising my guesses after looking up IMO questions and medal thresholds) was:
I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)
Maybe I'll go 8% on "gets gold" instead of "solves hardest problem."
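Paul's committed procedure for choosing the "hardest problem" can be read as a small decision rule. The sketch below is only an illustration of that rule as quoted: the function name `hardest_problem` and the topic labels ("geometry", "combinatorics", "algebra") are assumptions made here, not something Paul or the market specified, and the inputs in the usage lines are hypothetical rather than the topics of any actual IMO.

```python
def hardest_problem(p3_topic: str, p6_topic: str) -> int:
    """Return the problem number (3 or 6) treated as the 'hardest problem'.

    Rule as quoted: usually problem 6, but use problem 3 instead if either
    (i) problem 6 is geometry, or
    (ii) problem 3 is combinatorics and problem 6 is algebra.
    Topic labels are hypothetical strings chosen for this sketch.
    """
    p3 = p3_topic.strip().lower()
    p6 = p6_topic.strip().lower()

    if p6 == "geometry":  # condition (i)
        return 3
    if p3 == "combinatorics" and p6 == "algebra":  # condition (ii)
        return 3
    return 6  # default case


# Hypothetical inputs, not real IMO topic assignments.
print(hardest_problem("number theory", "combinatorics"))
print(hardest_problem("algebra", "geometry"))
print(hardest_problem("combinatorics", "algebra"))
```

Classifying a problem's topic is itself a judgment call, which is presumably why Paul said he would have preferred to pick the hardest problem after seeing the test.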
Eliezer spent less time revising his prediction, but said (earlier in the discussion):
My probability is at least 16% [on the IMO grand challenge falling], though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?
EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probability of the technical capability existing by end of 2025
So I think we have Paul at <8% and Eliezer at >16% that an AI made before the IMO is able to get a gold (under the time controls etc. of the grand challenge) in one of 2022-2025.
Resolves to YES if either Eliezer or Paul acknowledges that an AI has succeeded at this task.
Related market: https://manifold.markets/MatthewBarnett/will-a-machine-learning-model-score-f0d93ee0119b
Update: As noted by Paul, the qualifying years for IMO completion are 2023, 2024, and 2025.
Update 2024-06-21: Description formatting
Update 2024-07-25: Changed title from "by 2025" to "by the end of 2025" for clarity
@FlorisvanDoorn: thanks, I searched and hadn't seen it.
For others, here is the link to that comment: https://manifold.markets/Austin/will-an-ai-get-gold-on-any-internat#8pzm97qiu5g
@BoltonBailey The solutions are here https://storage.googleapis.com/deepmind-media/gemini/IMO_2025.pdf They're written in cleaner language than the OAI solutions and have been officially validated by the IMO organizers.
@AdamK I was skimming that myself, I felt that the proof that "a_n is even for all n" in Q4 could have been more succinct, but I guess it's not wrong.
@dgga what the fuck just happened? I opened this market on the iPhone app, it lagged out for about 30 seconds and apparently sold my entire position???
i would like to thank eliezer yudkowsky for teaching me how to think better
my mom for always believing i could generate excessive signal capture from both structured and unstructured data
the pnl owners at [redacted] for letting a young boy with a dream drag their carry into the ground
bayesian and joshua for teaching me about being brave about my sexuality
Has anyone said how long their solutions took? Very impressive regardless
@TylerMurphy Greg Brockman: "Model operated in natural language (i.e. outputs natural language proofs) under the same rules as humans (e.g. 4.5 hours per session, no tools)."
https://x.com/gdb/status/1946479692485431465
@RyanandPickles thanks for the nudge; per the market description, I will wait for at least one of @PaulChristiano and @EliezerYudkowsky (ideally, both) to confirm.
@Austin I think Paul has more exact criteria in mind, and I have yet to see him behave dishonorably about this sort of thing, so I'm unlikely to try calling it myself in advance of Paul if Paul is on-duty here. I also expect that he/we are waiting on some additional clarifications and for some further details to come out, rather than either of us wanting to be hasty.
(Also note for the record that Paul was much lower on AI solving the single hardest problem, which OAI did not afaik. Of course almost no humans did so either.)
@Austin For the record, I concur (especially given the GDM announcement, which I trust a lot more than OAI, and which seems less wobbly around the details).
@Blocksterpen3 I can present you some other markets where the market is pessimistic. Do you want to bet against me that those will resolve YES as well?
@Bayesian I might be interested in "pessimistic" medium to longer term AI markets with clear criteria.
[deleted by author]
@Jcd51 I expect OpenAI has enough brilliant mathematicians on its team that they could have provided the answers within a day of the problems being available.
This is a weird thing to be suspicious of
@wrhall I'm just suspicious of the frequent benchmark gaming from OAI. But I'm really impressed that you managed to make this response in the 5 minutes I had this comment up.