I'll use the verified leaderboard at https://arcprize.org/leaderboard as the authoritative source of scores - claims that aren't officially verified don't count for this market.
Update 2026-04-14 (PST) (AI summary of creator comment): The ARC-AGI-3 scoring rules have changed since this market was created. The creator will resolve based on the new (more generous) scoring rules as reflected on the public leaderboard, since the original scoring data is not publicly available. Key changes to scoring:
Per-level baseline now uses median human player (previously 2nd-best player)
Per-level score cap increased from 100% to 115%
Net effect: scores increase by ~+0.5pp for both humans and AI
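To make the two changes concrete, here is a rough sketch of how they could affect a single level. The exact ARC-AGI-3 formula isn't given here, so this assumes a per-level score equal to the human baseline action count divided by the test taker's action count, then capped. The function name `level_score`, its parameters, and the action counts are all hypothetical and chosen only for illustration.

```python
from statistics import median

def level_score(agent_actions, human_actions, baseline="median", cap=1.15):
    """Toy per-level score: baseline actions / agent actions, capped.

    ASSUMPTION: this ratio is an illustrative stand-in, not the official
    ARC-AGI-3 formula. Fewer actions is treated as better.
    """
    ranked = sorted(human_actions)  # best (fewest actions) first
    if baseline == "second_best":
        # Old rule: normalize against the 2nd-best human on this level.
        base = ranked[1] if len(ranked) > 1 else ranked[0]
    else:
        # New rule: normalize against the median human on this level.
        base = median(ranked)
    return min(cap, base / agent_actions)

# Made-up human action counts for one level.
humans = [20, 30, 40, 50, 60]

old_score = level_score(40, humans, baseline="second_best", cap=1.0)
new_score = level_score(40, humans, baseline="median", cap=1.15)
print(old_score, new_score)
```

Under these made-up numbers, the same run scores higher under the new rules because the median human is a slower baseline than the 2nd-best human, and a run that beats the baseline can now earn up to 115% on that level instead of topping out at 100%.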
There have been some changes to the scoring:
Based on what we’ve observed, we’re announcing two updates to ARC-AGI-3 scoring:
The per-level baseline is now less sensitive to outlier performances, reducing the impact of luck on individual levels.
A single unusually efficient human run no longer defines the baseline for ARC-AGI-3 scoring. Rather, the baseline now reflects more typical human play. Technical change: the human baseline which normalizes scores moves from 2nd-best player to median player per level.
A single subpar level no longer disproportionately drags down an overall score.
A test taker who generalizes well across an entire environment is no longer penalized by a single sub-par level. Technical change: per-level score cap increases from 100% to 115%.
The net result of these changes is a marginal increase in scores for both humans and AI (+0.5pp) and better reflects our desire to fairly compare efficiency between test takers.
https://arcprize.org/blog/arc-agi-3-human-dataset
I'm pretty annoyed by this. The literal reading of this market's title and description says that it resolves based on the public leaderboard, but what the public leaderboard means has changed since the market was created, and people have made trades based on the original meaning. If it were straightforward, I think I'd resolve this based on the original scoring rules and add a note. But AFAICT there's no way to see what scores would be under the original rules (they don't publish the results on private environments), so I guess we're going with the new (more generous) scoring rules now.

