By which years will AGI be released (Hendrycks definition >95%)?

A new, highly detailed operationalization of AGI has been proposed by Hendrycks et al., based on the standard that "AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult" (www.agidefinition.ai/). Their definition is inspired by Cattell-Horn-Carroll theory and calculates a model’s “AGI Score” as a composite of ten equally weighted cognitive categories. GPT-4 is judged to score 27% under the definition, and GPT-5 is scored at 58%.

This market resolves YES for all years that lie AFTER the first date when an AI system, with an AGI Score eventually judged to be above 95%, is deployed and accessible to customers external to the AI developer for commercial or consumer purposes. The score would either need to be judged by the authors, or by a third party in a manner credibly commensurate with the original evaluations of GPT-4 and GPT-5. The score would need to be announced within 6 months of model release. A year resolves NO otherwise.

The cognitive categories in the definition consist of:

General Knowledge (K): The breadth of factual understanding of the world, encompassing commonsense, culture, science, social science, and history.
Reading and Writing Ability (RW): Proficiency in consuming and producing written language, from basic decoding to complex comprehension, composition, and usage.
Mathematical Ability (M): The depth of mathematical knowledge and skills across arithmetic, algebra, geometry, probability, and calculus.
On-the-Spot Reasoning (R): The flexible control of attention to solve novel problems without relying exclusively on previously learned schemas, tested via deduction and induction.
Working Memory (WM): The ability to maintain and manipulate information in active attention across textual, auditory, and visual modalities.
Long-Term Memory Storage (MS): The capability to continually learn new information (associative, meaningful, and verbatim).
Long-Term Memory Retrieval (MR): The fluency and precision of accessing stored knowledge, including the critical ability to avoid confabulation (hallucinations).
Visual Processing (V): The ability to perceive, analyze, reason about, generate, and scan visual information.
Auditory Processing (A): The capacity to discriminate, recognize, and work creatively with auditory stimuli, including speech, rhythm, and music.
Speed (S): The ability to perform simple cognitive tasks quickly, encompassing perceptual speed, reaction times, and processing fluency.
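
As a rough illustration of the composite, the sketch below averages the ten category scores with equal weights. It assumes each category is reported as a percentage from 0 to 100; the function names and the example values are hypothetical and are not taken from the authors' evaluations.

```python
# Category codes as listed in the market description.
CATEGORIES = ["K", "RW", "M", "R", "WM", "MS", "MR", "V", "A", "S"]


def agi_score(category_scores):
    """Equal-weight composite: the mean of ten category scores in [0, 100].

    Assumption: each category is expressed as a percentage of its maximum.
    """
    missing = [c for c in CATEGORIES if c not in category_scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for code in CATEGORIES:
        if not 0 <= category_scores[code] <= 100:
            raise ValueError(f"score for {code} must be between 0 and 100")
    return sum(category_scores[c] for c in CATEGORIES) / len(CATEGORIES)


def clears_threshold(score, threshold=95.0):
    """The market requires a score strictly above 95%."""
    return score > threshold


# Hypothetical profile: perfect everywhere except long-term memory storage.
example = {code: 100.0 for code in CATEGORIES}
example["MS"] = 0.0
print(agi_score(example), clears_threshold(agi_score(example)))
```

Under equal weighting, a system that scores zero in any one category tops out at 90%, so it could not clear the 95% bar regardless of its other scores.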

More context on the definition: "Our definition is not an automatic evaluation nor a dataset, but rather it specifies a large collection of well-scoped tasks that test specific cognitive abilities. Whether AIs can solve these tasks can be manually assessed by anyone, and people could supplement their testing using the best evaluations available at the time. This makes our definition more broad and more robust than fixed automatic AI capabilities datasets. Secondly, our definition focuses on capabilities frequently possessed by well-educated individuals, not a superhuman aggregate of all well-educated individuals’ combined knowledge and skills. Therefore, our AGI definition is about human-level AI, not economy-level AI; we measure cognitive abilities rather than specialized economically valuable know-how, nor is our measurement a direct predictor of automation or economic diffusion. We leave economic measurements of advanced AI to other work. Last, we deliberately focus on core cognitive capabilities rather than physical abilities such as motor skills or tactile sensing, as we seek to measure the capabilities of the mind rather than the quality of its actuators or sensors."

Update 2025-10-17 (PST) (AI summary of creator comment): "By" means "before" in the market question. If AGI is released on July 1, 2027, then 2027 would resolve NO (since the release happened during 2027, not before it).
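
The resolution rule, read together with this clarification, can be sketched as follows. The helper names are hypothetical, and counting "6 months" as six calendar months is an assumption, since the market text does not say how the window is measured.

```python
import calendar
from datetime import date
from typing import Optional

THRESHOLD = 95.0  # percent; the score must be strictly above this


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_qualifies(release: date, score: float, announced: date) -> bool:
    """A release counts only if its score exceeds the threshold and the score
    is announced within the window (assumed: six calendar months)."""
    return score > THRESHOLD and announced <= add_months(release, 6)


def year_resolves_yes(year: int, first_qualifying_release: Optional[date]) -> bool:
    """'By' means 'before': a year resolves YES only if the release happened
    before that year began. The release year itself resolves NO."""
    if first_qualifying_release is None:
        return False
    return first_qualifying_release.year < year


# The example from the clarification: a release on July 1, 2027.
release = date(2027, 7, 1)
for y in (2026, 2027, 2028):
    print(y, "YES" if year_resolves_yes(y, release) else "NO")
```

With a July 1, 2027 release, 2026 and 2027 resolve NO, and 2028 and every later year resolve YES.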
