A machine program or model must solve 100% of the ARC evaluation questions on its first and only attempt, without any prior contact with those questions. The question set must include all 400 public evaluation questions plus either at least 50 private evaluation questions, such as those used in the ARCathon (https://lab42.global/arcathon/), or at least 50 questions of similar kind and difficulty created after the program. This must be achieved by the end of 2024 (31st December). Questions and details about ARC: https://github.com/fchollet/ARC
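For concreteness, each ARC task in the linked repository is a JSON file with "train" and "test" keys, where every item pairs an input grid with an output grid (grids are lists of rows of integers 0-9). A minimal sketch of the first-try grading rule, using a made-up toy task and a hypothetical `solve` function (here a trivial color-swap for illustration):

```python
import json

# A toy task in the ARC JSON format (invented for illustration; real tasks
# live in the fchollet/ARC repo as data/evaluation/*.json).
task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}
  ],
  "test": [
    {"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}
  ]
}
"""

def solve(train_pairs, test_input):
    # Hypothetical solver; here a hardcoded 0<->1 color swap that happens
    # to fit the toy task. A real entrant must infer the rule from train_pairs.
    return [[1 - cell for cell in row] for row in test_input]

task = json.loads(task_json)

# First-try grading: a task counts only if the first submitted output grid
# matches the expected output exactly, for every test item in the task.
results = [
    solve(task["train"], item["input"]) == item["output"]
    for item in task["test"]
]
print(all(results))
```

Resolution then requires `all(results)` to hold for every task in the set, across all 400 public questions and the private ones, with no second attempts.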
A market with the same resolution criteria but a deadline before 2028:
https://manifold.markets/MGM/ai-solves-the-abstraction-and-reaso-6312f0f1cbc1?r=TUdN
Apr 5, 10:06am: AI solves the Abstraction and Reasoning Corpus (ARC) by François Chollet by 2025 → AI solves the Abstraction and Reasoning Corpus (ARC) by 2025
It's not clear to me that 100% is achievable; benchmarks usually suffer from some degree of under-specification or error. I've created an 85% (human-level) version here: https://manifold.markets/JacobPfau/will-the-arcagi-grand-prize-be-clai
From the paper: 'ARC comprises a training set and an evaluation set. The training set features 400 tasks, while the evaluation set features 600 tasks. The evaluation set is further split into a public evaluation set (400 tasks) and a private evaluation set (200 tasks).'
The GitHub repository contains only the public evaluation set. Does your question include the 200 private tasks?
@Allocatress Thank you for the question; clarification was indeed needed. I have updated the criteria to require not only the public evaluation set (400 questions) but also 50 private questions (official or otherwise). The private questions are definitely needed to rule out hardcoded solutions.
@MGM It could actually be more than 50 private questions, but the model still needs to solve 100% of them on the first try.