Top 5 most useful tests for AGI
14
Ṁ2651
2026
53%: Build, debug and test a complex piece of software, such as a mobile app with a backend service, until it is of sufficient quality
50%: ARC-AGI
48%: Wozniak Coffee Test (requires controlling a robot)
43%: Opinion poll of Manifold userbase
31%: Predict the output of an arbitrary set of NAND gates and inputs

Note: I'll reimburse anyone the cost of adding an answer (regardless of what it is), and also reimburse any further answers that refer to specific benchmarks (e.g. AgentBench).

I'd like to find out what Manifold thinks are the 5 best benchmarks of this kind.

Ideally they should have fairly objective protocols for evaluating the agent/system under test, but since I can't really define that rigorously myself, I will let people add and vote on whatever they like.

At the end of January 2026, I'll conduct a poll to select the 5 winners.

I won't bet.

Linked market:

/singer/an-algorithm-exists-that-can-run-on


@CraigDemel this seems like something a simple calculating computer program could do, with no general intelligence involved, right?

@TheAllMemeingEye I think he might have meant solving this in polynomial time? https://en.wikipedia.org/wiki/Circuit_satisfiability_problem
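To illustrate why predicting a NAND circuit's output is mechanical, here is a minimal sketch, assuming a hypothetical representation where each gate is an `(output_wire, input_a, input_b)` tuple listed in topological order. The function names, wire names and the XOR example are illustrative only, not part of any benchmark.

```python
def nand(a: bool, b: bool) -> bool:
    """A single NAND gate: false only when both inputs are true."""
    return not (a and b)


def evaluate_nand_circuit(inputs, gates):
    """Evaluate a NAND-only circuit (hypothetical representation).

    inputs: dict mapping input wire names to booleans.
    gates: list of (output_wire, input_wire_a, input_wire_b) tuples,
        assumed to be in topological order, so every wire a gate reads
        has already been assigned a value.
    Returns a dict with the value of every wire.
    """
    wires = dict(inputs)
    for out, a, b in gates:
        wires[out] = nand(wires[a], wires[b])
    return wires


# Hypothetical example: XOR built from four NAND gates.
xor_gates = [
    ("n1", "x", "y"),
    ("n2", "x", "n1"),
    ("n3", "y", "n1"),
    ("out", "n2", "n3"),
]

for x in (False, True):
    for y in (False, True):
        result = evaluate_nand_circuit({"x": x, "y": y}, xor_gates)["out"]
        print(int(x), int(y), "->", int(result))
# prints:
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 0
```

Each gate is visited once, so evaluation takes time linear in the number of gates. The hard problem behind the Circuit Satisfiability link is the reverse direction: finding inputs that make the output true, which is NP-complete.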

my test of true human intelligence is solving an NP-complete problem in polytime

every time.

@Bayesian Maybe just solving? But in this case @TheAllMemeingEye is correct: it is solved.

@ProjectVictory At the time I wrote this, the LLMs I tried were bad at predicting the output of a single NAND gate with various inputs, even after being corrected, which I found humorous given how many NANDs they incorporate.

@CraigDemel I guess it might be a necessary but certainly not sufficient condition for AGI, kinda like being able to draw specific shapes in ASCII art

https://youtu.be/e_HSA1lUd04?si=yIFjKluebequQUOC

Note: I'll reimburse anyone the cost of adding an answer.

Would I get reimbursed the 1000 mana cost for adding any of the following?

  • "Opinion poll of Manifold userbase"

  • "Opinion poll of general public"

  • "Equal success rate at user-controlled Turing test as median biological human"

  • "No remaining job roles in which its performance is evaluated by superiors as lower quality than the median biological human employee in said roles"

@TheAllMemeingEye Yes. I'll reimburse you for your favourite. I'll also reimburse any specific benchmark that anyone adds, even if they add multiple (I'll clarify the deal in the description).

@singer done :)

Open Phil has called for the development of agent benchmarks, which probably won't come out before this market's deadline. But they'll be more useful than anything in this list.

@Siebe I'll just move it a year forward then. @traders DM me if you want a refund.

@singer the fact that the top 5 answers resolve Yes and there are only four answers is quite funny to me. I think it shows the current state of things quite well.

@ProjectVictory yeah... that's another reason why I want to extend the due date

© Manifold Markets, Inc.