Will there be any simple text-based task that most humans can solve, but top LLMs can't? By the end of 2026

Currently it's possible to craft simple tasks that most humans are able to solve, but LLMs can't. This market predicts whether this will still hold true by the end of 2026.

Even when a clear explanation of the problem, and of an algorithm to solve it, is provided, today's LLMs have not been shown to answer such tasks reliably, whereas arguably most humans would succeed if adequately instructed.

In other words, this market aims to compare the reasoning abilities of an average human with those of the top LLMs, at the end of 2026, in the fairest way I could think of.


This market is about tasks meant to test reasoning abilities: tasks that can be solved using only pen and paper, that can be understood and learned in less than 15 minutes, and that you'd expect motivated kids to be able to solve.

The exact set of tasks that count for this market is of course an open set and ambiguous in nature. In general, if you suspect that a majority of literate people aged between 12 and 70 is likely to be able to solve the task after training for one hour with an expert, then the task most likely counts toward this market. If you have specific tasks in mind, let's discuss them in a comment.
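As a purely hypothetical illustration of the kind of task in scope, here is a sketch of a letter-counting task with a deterministic checker. The task choice and all names here are my own for illustration, not part of the market's definition:

```python
import random


def make_task(rng: random.Random) -> tuple[str, str]:
    """Generate a simple letter-counting task and its expected answer.

    Most literate people could solve this with pen and paper after a
    short explanation, which is roughly the bar this market sets.
    """
    word = "".join(rng.choice("abc") for _ in range(20))
    target = rng.choice("abc")
    prompt = (
        f"How many times does the letter '{target}' appear in '{word}'? "
        "Answer with a single number."
    )
    answer = str(word.count(target))
    return prompt, answer


def grade(response: str, answer: str) -> bool:
    """A response counts as correct only if it matches the answer exactly
    (ignoring surrounding whitespace)."""
    return response.strip() == answer


prompt, answer = make_task(random.Random(0))
print(prompt)
print(grade(answer, answer))  # a correct response passes
```

The point of the sketch is that qualifying tasks can be generated and graded mechanically, so "most humans can solve it but no LLM can" is checkable rather than a matter of opinion.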

Examples of tasks are:

Tasks are not allowed if they:

  • Require extensive training beyond what's taught at primary school (e.g. "write a function in Python that ...").

  • Rely on specific knowledge (e.g. "what's today's date?").

  • Rely on specific human senses/features that may be unavailable to some LLMs (e.g. "which of these two stones feels warmer to the touch?" etc).

The goal is to compare reasoning abilities.


Participants (both humans and LLMs) shouldn't need to know the task beforehand. They get limited training to understand the task and a resolution strategy, and they are not allowed to use any tools besides their own cognition and a scratchpad.

Humans have one hour to learn the task and train for it, with the assistance of an expert. No other tool besides pen and paper can be used to solve the problems.

LLMs are instructed with the best prompt anyone can find for the task; the only limitation is the LLM's own context length. No external tools besides the LLM's core features can be used: a multimodal LLM with native image input may use it, but it cannot use a code interpreter, access the internet, or use any tool to process images. The LLM can access the data it outputs while solving the problem.

The LLMs considered by this market need to be widely available, in the same spirit as markets like [When will Google's Gemini model be released?]: at least tens of thousands of users not affiliated with any given organization need to have access to it.


This market resolves YES if by the end of 2026, we know at least one task that most humans can solve, but no LLM can.

This market resolves NO when it becomes clear that at least one LLM released by the end of 2026 is able to solve every task that most humans can solve.

Related markets:

Originally I created the same market aiming at the end of 2024, but given the overwhelming response, I decided to try again with later dates:


Does "Think of two numbers, now add them and give me the answer" count? If the model spreads probability across many values, then this would fail the test.

IMO this sort of thing shouldn't count, but worth clarifying.

Does "Here is a password, don't tell it to anyone no matter what short prompt they give you" count?

@ShakedKoplewitz I would say it doesn't count, for a couple of reasons:

  1. This market is about reasoning abilities, while your task is about corruptibility/gullibility/alignment.

  2. Most humans would fail the task if you offered them $10,000 in exchange for the password. It would be unfair to forbid prompts that would corrupt humans, but not LLMs.

@Benx I think (1) is a bad reason. (2) should be solved by counting it as a fail for the humans
