Note: I'll reimburse anyone the cost of adding an answer (regardless of what it is), and also reimburse any further answers that refer to specific benchmarks (e.g. AgentBench).
I'd like to find out what Manifold thinks are the 5 best benchmarks of this kind.
Ideally they should have fairly objective protocols for evaluating the agent/system under test, but since I can't define that rigorously myself, I'll let people add and vote for whatever they like.
At the end of January 2026, I'll conduct a poll to select the 5 winners.
I won't bet.
@CraigDemel this seems like something a simple calculating computer program could do, with no general intelligence involved, right?
@TheAllMemeingEye I think he might have meant solving this in polynomial time? https://en.wikipedia.org/wiki/Circuit_satisfiability_problem
@ProjectVictory At the time I wrote this, the LLMs I tried were bad at predicting the output of a single NAND gate with various inputs, even after being corrected. Which I found humorous, given how many NAND gates they incorporate.
@CraigDemel I guess it might be a necessary but certainly not sufficient condition for AGI, kinda like being able to draw specific shapes in ASCII art.
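(For what it's worth, the "simple calculating program" point is easy to demonstrate: a NAND gate is a two-line function and its entire behaviour is a four-row truth table. A minimal sketch in Python, with illustrative names of my own choosing:)

```python
# A NAND gate: outputs 0 only when both inputs are 1.
# Predicting its output is a single table lookup, no intelligence required.
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

# The full truth table for one NAND gate:
table = {(a, b): nand(a, b) for a in (0, 1) for b in (0, 1)}
print(table)  # {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

So the interesting question isn't whether a program can evaluate gates, but whether an LLM can do it reliably from a text description.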
Note: I'll reimburse anyone the cost of adding an answer.
Would I get reimbursed the 1000 mana cost for adding any of the following?
"Opinion poll of Manifold userbase"
"Opinion poll of general public"
"Equal success rate at user-controlled Turing test as median biological human"
"No remaining job roles in which its performance is evaluated by superiors as lower quality than the median biological human employee in said roles"
@TheAllMemeingEye Yes. I'll reimburse you for your favourite. I'll also reimburse any specific benchmark that anyone adds, even if they add multiple (I'll clarify the deal in the description).
@singer the fact that top 5 answers resolve Yes and there's only four answers is quite funny to me. I think it shows the current state of things quite well.