The resolution criteria I'll use are:
Yes: I get evidence that I could give the best version of GPT-5 an image of a series of IMO questions, and get gold-medal level answers with minimal prompt engineering
No: I get evidence that the best version of GPT-5 cannot reasonably be expected to demonstrate gold-medal level performance
If there are credible reports of it scoring at a gold-medal level on the 2024, 2025, or 2026 IMO questions, I will consider this enough evidence to resolve this market as Yes. Given that this is not my area of expertise, I'll be open to suggestions for modifying these criteria, and will be clear about my intentions.
GPT-5 definition:
There is a chance that OpenAI's next flagship model will have a different name. If there are reasonable questions over whether a release is "GPT-5 with a different name", I will use a Manifold poll to resolve this.