Perform as well as the best human entrants in the Putnam competition, a math contest whose questions have known solutions but are difficult even for the best young mathematicians
I would like to hear some reasoning from the NO crowd. It's probably due to my own ignorance, but I simply cannot see how this will not resolve to YES. In 2019 we had GPT-2, barely capable of answering the most basic questions and incapable of even acting as a chatbot. Five years later (today), DeepMind just published a paper in Nature on "AlphaGeometry", an AI that solves Olympiad geometry problems at a level approaching that of an average IMO gold medallist (https://www.nature.com/articles/s41586-023-06747-5). If anything, it seems that progress is accelerating. I simply can't wrap my mind around the idea that we won't blow past Putnam-level questions in the next 2 years, never mind 4.
My argument for: I can see a DeepMind-style MCTS + Coq system, with a whole lot of work on the move-proposal function (possibly specialized by subfield; possibly LLM-based, though recent results have been less than stellar), succeeding once the questions are formalized. This also provides an easy way out of the "write up the answer" problem, assuming a Coq proof would count as a valid write-up. It would not require AGI, "just" very impressive but still very much narrow-domain RL.
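To make the shape of that proposal concrete, here is a minimal toy sketch of MCTS-guided proof search. Everything in it is a hypothetical stand-in: `propose_tactics` and `apply_tactic` are placeholders for a learned move-proposal policy and a Coq/Lean kernel call, and the "proof state" is just an integer we try to drive to zero. It only illustrates the select/expand/rollout/backpropagate loop, not anything DeepMind actually built.

```python
import math
import random
from dataclasses import dataclass, field

def propose_tactics(state: int) -> list[tuple[str, float]]:
    """Hypothetical stand-in for a learned proposal policy: candidate 'tactics' with priors."""
    return [("sub1", 0.5), ("halve", 0.4), ("noop", 0.1)]

def apply_tactic(state: int, tactic: str) -> int | None:
    """Hypothetical stand-in for a kernel call: new proof state, or None if the tactic fails."""
    if tactic == "sub1":
        return state - 1 if state > 0 else None
    if tactic == "halve":
        return state // 2 if state > 0 and state % 2 == 0 else None
    return None  # "noop" never makes progress

@dataclass
class Node:
    state: int
    parent: "Node | None" = None
    prior: float = 1.0
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

    def ucb(self, c: float = 1.4) -> float:
        """Upper-confidence bound used during selection."""
        if self.visits == 0:
            return float("inf")
        explore = c * self.prior * math.sqrt(self.parent.visits) / (1 + self.visits)
        return self.value / self.visits + explore

def rollout(state: int, max_depth: int = 20) -> float:
    """Random playout: reward 1.0 if the goal state (0) is reached."""
    for _ in range(max_depth):
        if state == 0:
            return 1.0
        successors = [apply_tactic(state, t) for t, _ in propose_tactics(state)]
        successors = [s for s in successors if s is not None]
        if not successors:
            return 0.0
        state = random.choice(successors)
    return 1.0 if state == 0 else 0.0

def mcts(root_state: int, iterations: int = 500) -> Node:
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: walk down the tree by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: children are whatever the proposal function suggests.
        if node.state != 0:
            for tactic, prior in propose_tactics(node.state):
                nxt = apply_tactic(node.state, tactic)
                if nxt is not None:
                    node.children.append(Node(nxt, parent=node, prior=prior))
        # Simulation and backpropagation.
        reward = rollout(node.state)
        current: Node | None = node
        while current is not None:
            current.visits += 1
            current.value += reward
            current = current.parent
    return root

if __name__ == "__main__":
    root = mcts(37)
    best = max(root.children, key=lambda n: n.visits)
    print(f"Most-visited first move from 37 leads to state {best.state}")
```

The real work in such a system would sit almost entirely in the proposal function (the learned part) and in the prover interface; the tree search itself is the easy, well-understood piece.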
Going from the text of the question to a correctly translated formalization seems fairly fraught, though, hence my relatively low personal probability.
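For a sense of what that translation step has to produce, here is a toy illustration (not an actual Putnam problem, and in Lean 4 with Mathlib rather than the Coq mentioned above; the point is the same): the English statement "every integer n divides n² + n" rendered formally. An autoformalizer that emitted `n ∣ n ^ 2 + 1` instead would be proving a different, false claim, which is exactly the failure mode at issue.

```lean
import Mathlib

-- Toy stand-in for a formalized contest statement:
-- "every integer n divides n^2 + n".
theorem toy_putnam (n : ℤ) : n ∣ n ^ 2 + n := by
  have h : n ^ 2 + n = n * (n + 1) := by ring
  rw [h]
  exact dvd_mul_right n (n + 1)
```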