AI solves the Abstraction and Reasoning Corpus (ARC) by 2025
Dec 31
11% chance

A machine program or model solves 100% of the ARC evaluation questions without having had any contact with those questions before its first and only attempt at answering them. The set of questions must include all 400 public evaluation questions and either at least 50 private evaluation questions, available in something like the ARCathon (https://lab42.global/arcathon/), or at least 50 questions of similar kind and difficulty created after the program. This must be achieved by the end of 2024 (December 31). Questions and details about ARC are here: https://github.com/fchollet/ARC
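The criteria above can be read as a simple checklist. The following is only a rough sketch of that reading, not an official resolution procedure: the `Attempt` record and its field names are hypothetical, and the deadline check is left out.

```python
from dataclasses import dataclass

PUBLIC_EVAL_TASKS = 400  # criteria: all 400 public evaluation questions
MIN_PRIVATE_TASKS = 50   # criteria: at least 50 private or newly created questions


@dataclass
class Attempt:
    """Hypothetical summary of one program's run (field names are illustrative)."""
    public_solved: int          # public evaluation questions answered correctly
    private_total: int          # private (or newly created) questions attempted
    private_solved: int         # of those, how many were answered correctly
    saw_questions_before: bool  # any prior contact with the questions
    trials_used: int            # number of trials made on the question set


def meets_criteria(a: Attempt) -> bool:
    """Return True if the attempt satisfies this reading of the criteria."""
    # No prior contact, and only a single trial is allowed.
    if a.saw_questions_before or a.trials_used != 1:
        return False
    # The private part must contain at least 50 questions.
    if a.private_total < MIN_PRIVATE_TASKS:
        return False
    # 100% means every public and every private question is solved.
    return (a.public_solved == PUBLIC_EVAL_TASKS
            and a.private_solved == a.private_total)
```

Under this reading, a run that solves 399 of the 400 public questions does not qualify, however well it does on the private ones.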

Market with the same resolution criteria but with a timeline of before 2028:

https://manifold.markets/MGM/ai-solves-the-abstraction-and-reaso-6312f0f1cbc1?r=TUdN

Apr 5, 10:06am: AI solves the Abstraction and Reasoning Corpus (ARC) by François Chollet by 2025 → AI solves the Abstraction and Reasoning Corpus (ARC) by 2025


Not clear to me that 100% is achievable. Usually benchmarks suffer from some degree of under-specification/error. I've created an 85% (human-level) version here https://manifold.markets/JacobPfau/will-the-arcagi-grand-prize-be-clai

Linking in some related markets:

I wonder whether GPT-4 has been tested on this, few-shot or otherwise?

From the paper: 'ARC comprises a training set and an evaluation set. The training set features 400 tasks, while the evaluation set features 600 tasks. The evaluation set is further split into a public evaluation set (400 tasks) and a private evaluation set (200 tasks).'
The GitHub repository contains only the public evaluation set. Does your question include the 200 private tasks?

@Allocatress Thank you for the question; there was indeed a need for clarification. I have updated the criteria to include not only the public evaluation set (400 questions) but also at least 50 private questions (either official or not). The private questions are definitely needed to prevent hardcoded solutions.

@MGM There could actually be more than 50 private questions, but the program still needs to solve 100 percent of them on the first try.