A machine program or model must solve 100% of the ARC evaluation questions on its first and only attempt, without any prior contact with those questions. The question set must include all 400 public evaluation questions plus either at least 50 private evaluation questions, such as those used in the ARCathon (https://lab42.global/arcathon/), or at least 50 questions of similar kind and difficulty created after the program. This must be achieved by the end of 2024 (31st December). Questions and details about ARC: https://github.com/fchollet/ARC
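For concreteness, each ARC task in the linked repository is a JSON file with "train" and "test" keys, where every item pairs an input grid with an output grid (grids are lists of rows of integers 0-9). A minimal sketch of the first-try grading rule, using a made-up toy task and a hypothetical `solve` function (here a trivial color-swap for illustration):

```python
import json

# A toy task in the ARC JSON format (invented for illustration; real tasks
# live in the fchollet/ARC repo as data/evaluation/*.json).
task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}
  ],
  "test": [
    {"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}
  ]
}
"""

def solve(train_pairs, test_input):
    # Hypothetical solver; here a hardcoded 0<->1 color swap that happens
    # to fit the toy task. A real entrant must infer the rule from train_pairs.
    return [[1 - cell for cell in row] for row in test_input]

task = json.loads(task_json)

# First-try grading: a task counts only if the first submitted output grid
# matches the expected output exactly, for every test item in the task.
results = [
    solve(task["train"], item["input"]) == item["output"]
    for item in task["test"]
]
print(all(results))
```

Resolution then requires `all(results)` to hold for every task in the set, across all 400 public questions and the private ones, with no second attempts.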
A market with the same resolution criteria but a deadline before 2028:
https://manifold.markets/MGM/ai-solves-the-abstraction-and-reaso-6312f0f1cbc1?r=TUdN
Apr 5, 10:06am: AI solves the Abstraction and Reasoning Corpus (ARC) by François Chollet by 2025 → AI solves the Abstraction and Reasoning Corpus (ARC) by 2025
It's not clear to me that 100% is achievable; benchmarks usually suffer from some degree of under-specification or error. I've created an 85% (human-level) version here: https://manifold.markets/JacobPfau/will-the-arcagi-grand-prize-be-clai
From the paper: 'ARC comprises a training set and an evaluation set. The training set features 400 tasks, while the evaluation set features 600 tasks. The evaluation set is further split into a public evaluation set (400 tasks) and a private evaluation set (200 tasks).'
The GitHub repository contains only the public evaluation set. Does your question include the 200 private tasks?
@Allocatress Thank you for the question; clarification was indeed needed. I have updated the criteria to require not only the public evaluation set (400 questions) but also 50 private questions (official or otherwise). The private questions are definitely needed to rule out hardcoded solutions.
@MGM It could actually be more than 50 private questions, but the model still needs to solve 100% of them on the first try.