Will there be a date X when the best public AI forecasting techniques (open-source model weights plus any scaffolding; training details are not necessary as long as the weights are open) score worse than 75th percentile Metaculus forecasters across almost all categories of 1-3 month resolution questions, and also a date before X+12 months when I believe that the best open-source AI forecasting techniques are on par with or better than 98th percentile Metaculus forecasters in a wide variety of 1-3 month resolution question categories?
I'll make my resolution decision at least a year after the first commenter credibly and in good faith proposes that date X+12 months has been reached, and at least a year after any system that ultimately resolves this question becomes public. I intend to decide whether X and X+12 meet the above requirements according to the long-term performance of the models over several rounds of real forecasting tournaments. I also intend to compare the models' performance on a slate of forecasting benchmarks like Autocast (https://arxiv.org/abs/2206.15474) against any comparable human data.
If I believe that, at some date Y, AI forecasting methods no longer perform worse than 75th percentile Metaculus forecasters, and by Y+24 months I determine that the best methods at Y+12 months were not broadly superhuman, I will resolve NO. Finally, I will resolve NO no matter what on January 1, 2034.
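To make the timing condition concrete, here is a rough sketch of the "below 75th percentile at X, at or above 98th percentile before X+12 months" check. It is only an illustration under loud assumptions: it collapses my judgment across question categories into a single hypothetical percentile number per date, it assumes the second date comes after X, and it ignores the waiting periods and the 2034 cutoff. The names `add_months`, `fast_transition`, and `assessments` are made up for this example.

```python
import calendar
from datetime import date

BELOW_BAR = 75.0  # at date X the AI must score below this percentile
ABOVE_BAR = 98.0  # before X + 12 months it must reach at least this percentile


def add_months(d, months):
    """Shift d by a number of calendar months, clamping the day if needed."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month = divmod(idx, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def fast_transition(assessments):
    """Find a qualifying pair of dates in hypothetical summary assessments.

    assessments: list of (date, percentile) pairs, one number per date
    (an assumption; the real judgment spans many question categories).
    Returns (X, later_date) for the first qualifying pair, or None.
    """
    ordered = sorted(assessments)
    for x_date, x_pct in ordered:
        if x_pct >= BELOW_BAR:
            continue  # not a valid X: AI is not below the 75th percentile
        window_end = add_months(x_date, 12)
        for later_date, later_pct in ordered:
            # Assumption: the second date must fall strictly after X.
            if x_date < later_date < window_end and later_pct >= ABOVE_BAR:
                return x_date, later_date
    return None


fast = [(date(2026, 1, 1), 60.0), (date(2026, 9, 1), 99.0)]
slow = [(date(2026, 1, 1), 60.0), (date(2026, 6, 1), 80.0), (date(2027, 9, 1), 99.0)]
print(fast_transition(fast))
print(fast_transition(slow))
```

In the `slow` example the AI leaves the sub-75th-percentile range in mid-2026 but only reaches the 98th percentile more than 12 months later, which is the kind of gradual path that would point toward NO.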
If AGI is considered the last invention, I fancy that this question is the last forecast where I wouldn't largely defer to an AI.