This market resolves according to the score of the submission that receives the Top Score Prize ($75k) in 2025.
https://arcprize.org/blog/announcing-arc-agi-2-and-arc-prize-2025
@Bayesian Chollet says most humans can still solve them in under five minutes with at most two attempts. I've done several of the puzzles in the public set myself, and my score is above 90% by that standard.
But they kept only some of the puzzles from ARC-AGI-1 and eliminated the ones AI systems had already solved easily, so what's left is mostly much harder for AIs.
Some more details on what "human performance" on these problems looks like:
https://x.com/fchollet/status/1904273411897168198
I tried several problems from the harder public set, and I was able to solve all of them within five minutes. So I think they're mostly straightforward, though some are a bit tedious to do by hand.
@bens I didn't finish my thought:
I think the prize-winning entry has to be open source and spend only $50 in compute? Whereas ARC may report top models (o4 or whatever) doing well that score much higher.
@bens The Top Score Prize seems to go to the top submission that is open source and spends only $50 in compute, so I think they match? I could be wrong, let me check.
@Bayesian OK, so this market resolves on whether the top score meeting those criteria (open source and $50 in compute) reaches 50%, not on whether ANY AI lab can get 50% under ANY conditions?
@bens If I'm right that the Top Score Prize goes to the best open-source model under the $50 compute limit, then yes, that's correct. If I've misunderstood and the prize goes to a model under a different set of requirements, then that might not be correct.