(Metaculites created this question with resolution date 2040.)
"A team of three expert interviewers will interact with a candidate machine system (MS) and three humans (3H). The humans will be graduate students in each of physics, mathematics and computer science from one of the top 25 research universities (per some recognized list), chosen independently of the interviewers. The interviewers will electronically communicate (via text, image, spoken word, or other means) an identical series of exam questions of their choosing over a period of two hours to the MS and 3H, designed to advantage the 3H. Both MS and 3H have full access to the internet, but no party is allowed to consult additional humans, and we assume the MS is not an internet-accessible resource. The exam will be scored blindly by a disinterested third party." The experts may come up with new questions to ask while administering the test.
If such a test is passed before 2030, then this resolves Yes. If such a test is conducted on a state-of-the-art AI in 2029, and the AI fails, then this resolves No. If neither criterion is met, then this will resolve to my credence that such a test could be passed by an existing AI system. I will not participate in this market.
EDIT: no other AI systems should be consulted. Systems which use an AI instrumentally (e.g. as in Google search results) are OK, but the test administrator should do their best to redact direct AI content, e.g. the AI-generated Q&A panels at the top of certain Google queries.
Note that LLMs are already essentially expert-level at answering (at least some types of) exam questions.