The Center for AI Safety announced FiveThirtyNine, a forecasting AI that allegedly outperforms Metaculus. However, Oliver Habryka and Gwern raised concerns about data contamination. If a reliable (according to me) test is performed, will FiveThirtyNine be found to outperform Metaculus?
If all concerns with the current test are convincingly debunked, the market will resolve YES. If a new test involves Metaculus forecasts whose aggregate includes similar AI, that alone will not be sufficient for a NO resolution. If no such test is performed by the closing date, the market will resolve MKT. Since what counts as a "reliable test" is a subjective judgment call, I will not trade on this market. I also welcome suggestions for better resolution criteria.
See also:
Metaculus is running a quarterly bot tournament: https://www.metaculus.com/project/aibq3/ If this thing is so great, let it enter that tournament and see whether it can beat, first, the other bots and, second, the team of human pros that they're benchmarking against.