Top 5 most useful tests for AGI
Feb 1
88%
HCAST - METR
74%
Epoch Capabilities Index (ECI)
63%
CAIS Remote Labor Index
58%
ARC-AGI (any version)
45%
35%
28%
Opinion poll of Manifold userbase
20%
Wozniak Coffee Test (requires controlling a robot)
11%
Manually rearrange and overlap 100 random images in an image editor (with no other kinds of edits) to create a recognizable portrait
9%
Build, debug and test, until it's of sufficient quality, a complex piece of software like a mobile app including a backend service
8%
Predict the output of an arbitrary set of NAND gates and inputs
7%
Beating Pokemon games
6%
ArtificialAnalysis Intelligence Index

Note: I'll reimburse anyone the cost of adding an answer (regardless of what it is), and also reimburse any further answers that refer to specific benchmarks (e.g. AgentBench).

I'd like to find out what Manifold thinks are the 5 best benchmarks of this kind.

Ideally they should have fairly objective protocols for evaluating the agent/system under test, but since I can't really define that rigorously myself, I will let people add and vote for whatever they like.

At the end of January 2026, I'll conduct a poll to select the 5 winners.

I won't bet.

Linked market:

/singer/an-algorithm-exists-that-can-run-on

  • Update 2025-05-17 (PST) (AI summary of creator comment): The creator specified that the answer option ARC-AGI (any version) includes both ARC-AGI-1 and ARC-AGI-2. The creator has also noted that this option has been updated.


Will the poll be multi-select or ranked choice? Either seems preferable to single vote.

imo multi-select > ranked choice > single vote yeah

@traders resolve at the end of the month or extend again? I want to extend it to EOY.

@singer I'd be in favor of resolving this market at the end of the month, as promised. If you want a market that resolves at EOY, I would suggest you duplicate this market and extend the duplicate.

+1 to resolving at the end of the month, i bet based on the interest rate implied by that assumption

Credit to @4fa for bringing this one to my attention

bought Ṁ100 NO

@TheAllMemeingEye How is this a useful metric for AGI if it's almost saturated? I think AGI benchmarks should be a lot more long-horizon, no?

bought Ṁ150 YES

@singer I recommend the CAIS Remote Labor Index

for more context and the inspiration for this answer:

bought Ṁ350 NO

I also recommend adding the ECI (Epoch Capabilities Index), which keeps track of trends across lots of benchmarks and is intended to grow linearly over time across many orders of magnitude of capability. See here: https://epoch.ai/benchmarks/eci

I'm not willing to commit 1k mana to add it, but I have another idea:
> Manually rearrange and overlap 100 random images in an image editor (with no other kinds of edits) to create a recognizable portrait

@Haiku I've added it for you.

Pokemon

Can I get reimbursed for adding HCAST?

@InsertCustomName sent you 1000. Lmk if that wasn't the right amount.

bought Ṁ150 NO

I want to add "Human IQ test as PDF file" or something like that.

@Shai you may add it

@singer does this mean ARC-AGI-2? Or just the first one?

@Shai they look similar enough that I'll include both. I've updated the option.

@CraigDemel this seems like something a simple calculating computer program could do, with no general intelligence involved, right?
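To illustrate the point about a simple program: below is a minimal sketch of evaluating a fixed NAND circuit on given inputs. The circuit format (a list of `(output, input_a, input_b)` gates in topological order) and the names `evaluate` and `XOR_GATES` are assumptions made up for this example, not part of the proposed test. Evaluation like this takes time linear in the number of gates; it is a different problem from finding inputs that make a circuit output true.

```python
from typing import Dict, List, Tuple

# Hypothetical circuit format (an assumption for illustration):
# each gate is (output_name, input_a, input_b), listed in topological order,
# so every input of a gate is already known when the gate is reached.
Gate = Tuple[str, str, str]


def nand(a: bool, b: bool) -> bool:
    """Return the NAND of two boolean values."""
    return not (a and b)


def evaluate(gates: List[Gate], inputs: Dict[str, bool]) -> Dict[str, bool]:
    """Compute every wire value of the circuit for the given inputs."""
    values = dict(inputs)  # copy so the caller's dict is not modified
    for out, a, b in gates:
        values[out] = nand(values[a], values[b])
    return values


# XOR built from four NAND gates, used here as a small example circuit.
XOR_GATES: List[Gate] = [
    ("n1", "x", "y"),
    ("n2", "x", "n1"),
    ("n3", "y", "n1"),
    ("out", "n2", "n3"),
]

if __name__ == "__main__":
    for x in (False, True):
        for y in (False, True):
            result = evaluate(XOR_GATES, {"x": x, "y": y})["out"]
            print(f"x={x}, y={y} -> out={result}")
```

The single pass over the gate list is why a plain program handles this without anything resembling general intelligence.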

@TheAllMemeingEye I think he might have meant solving this in polynomial time? https://en.wikipedia.org/wiki/Circuit_satisfiability_problem

my test of true human intelligence is solving an np-complete problem in polytime

every time.

@Bayesian Maybe just solving? But in this case @TheAllMemeingEye is correct, it is solved.

@ProjectVictory At the time I wrote this, the LLMs I tried were bad at predicting output of one NAND with various inputs, even after being corrected. Which I found humorous, given how many NANDS they incorporated.

@CraigDemel I guess it might be a necessary but certainly not sufficient condition for agi, kinda like being able to draw specific shapes in ASCII art

https://youtu.be/e_HSA1lUd04?si=yIFjKluebequQUOC

Note: I'll reimburse anyone the cost of adding an answer.

Would I get reimbursed the 1000 mana cost for adding any of the following?

  • "Opinion poll of Manifold userbase"

  • "Opinion poll of general public"

  • "Equal success rate at user-controlled Turing test as median biological human"

  • "No remaining job roles in which its performance is evaluated by superiors as lower quality than the median biological human employee in said roles"

@TheAllMemeingEye Yes. I'll reimburse you for your favourite. I'll also reimburse any specific benchmark that anyone adds, even if they add multiple (I'll clarify the deal in the description).
