[5000 Mana prize] What is the best "Human-solvable, machine-hard puzzle"
19
12kṀ66k
Jan 1
1.3%
[[Place holder]]
3%
Take a photo, split it into a 100x100 grid, scramble the pieces, and ask user to reconstruct the original
10%
Spatial CAPTCHA puzzles
86%
Other

What Puzzle Am I Looking For?

Here are the requirements for the puzzle I want:

  1. Takes hours for a human to solve (10+ hours). Solvable virtually. Solvable even by an average 10-year-old.

  2. Not solvable by a machine — or at least takes a machine significantly longer than a human (e.g., 10× longer than a human solver). If a human can code up a program to solve this, that counts as solvable by machine.

  3. Quick to verify (ideally seconds).

  4. Mass-generatable by either a human or a computer (I should be able to produce many instances easily).

Requirements #1 and #2 are mandatory. Any puzzle that fails either of these is immediately disqualified.

For #3, I can tolerate verification taking up to 5 minutes. I can also tolerate verification procedures that are probabilistic — for example, ones that confirm correctness with 95% confidence instead of 100%.
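One way to read the 95% tolerance is as spot-checking: rather than verifying every part of a submission, check a random sample. The sketch below is only an illustration under stated assumptions: the solution is assumed to break into independently checkable parts (for example, tile positions), an incorrect submission is assumed to have at least some fraction of its parts wrong, and the names `checks_needed` and `spot_check` are hypothetical.

```python
import math
import random

def checks_needed(wrong_fraction, confidence=0.95):
    """Spot checks needed to catch a bad submission with the given confidence.

    Assumes each check independently hits a wrong part with probability
    `wrong_fraction` (sampling with replacement).
    """
    if not 0 < wrong_fraction <= 1:
        raise ValueError("wrong_fraction must be in (0, 1]")
    if wrong_fraction == 1:
        return 1
    # Smallest k with (1 - wrong_fraction)**k <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - wrong_fraction))

def spot_check(submitted, answer_key, num_checks, rng):
    """Compare randomly chosen positions (with replacement) against the key."""
    indices = [rng.randrange(len(answer_key)) for _ in range(num_checks)]
    return all(submitted[i] == answer_key[i] for i in indices)
```

Under these assumptions, a submission with at least 5% of its parts wrong needs 59 spot checks to be caught with 95% confidence, regardless of how large the puzzle is.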

For #4, assuming requirements #1–#3 are satisfied, the quality of a puzzle is judged mainly by how easily it can be mass-generated. Ideally:

  • Tier 1: A puzzle that a human can generate in under 5 minutes, without using code or an LLM.

  • Tier 2: A puzzle generated by simple code (I know “simple” is subjective, but I'll use my own judgment).

  • Tier 3: A puzzle that can only be generated with the help of an LLM.

Example of simple code: generally, code that doesn't require an LLM and involves only simple logic.
For example, code that pieces together a bunch of smaller images into a bigger image is rather simple. Code that does arithmetic calculations is also rather simple.
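As a rough illustration of what "simple code" in the Tier 2 sense might look like, here is a minimal sketch that cuts an image into tiles, shuffles them, and pieces them back together. It assumes the image is a plain 2D list of pixel values whose width and height are multiples of the tile size; the function names are hypothetical. It only shows the generation and verification steps and says nothing about whether the resulting puzzle satisfies requirement #2.

```python
import random

def split_into_tiles(image, tile_size):
    """Cut a 2D list of pixels into square tiles, in row-major order."""
    tiles = []
    for top in range(0, len(image), tile_size):
        for left in range(0, len(image[0]), tile_size):
            rows = image[top:top + tile_size]
            tiles.append([row[left:left + tile_size] for row in rows])
    return tiles

def assemble(tiles, tiles_per_row):
    """Piece row-major tiles back together into one 2D image."""
    image = []
    for start in range(0, len(tiles), tiles_per_row):
        band = tiles[start:start + tiles_per_row]
        for y in range(len(band[0])):
            image.append([pixel for tile in band for pixel in tile[y]])
    return image

def scramble(image, tile_size, seed):
    """Return the scrambled image and the tile order used (the answer key)."""
    tiles = split_into_tiles(image, tile_size)
    order = list(range(len(tiles)))
    random.Random(seed).shuffle(order)
    tiles_per_row = len(image[0]) // tile_size
    return assemble([tiles[i] for i in order], tiles_per_row), order

def unscramble(scrambled, tile_size, order):
    """Undo `scramble` using the answer key."""
    tiles = split_into_tiles(scrambled, tile_size)
    restored = [None] * len(tiles)
    for position, source in enumerate(order):
        restored[source] = tiles[position]
    return assemble(restored, len(scrambled[0]) // tile_size)
```

Verification in this sketch is a direct comparison of the submitted image against the original, which takes well under a second.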


I will award the person coming up with the best puzzle a 5000 mana prize.


If someone comes up with a qualifying Tier 1 puzzle, I will resolve the market immediately. If no one manages to come up with a Tier 1 puzzle, I will resolve the market based on the best (mass-generatable) proposed puzzle at the end of 2025.

  • Update 2025-11-22 (PST) (AI summary of creator comment): Machine compute limit clarification: When evaluating whether a puzzle is "not solvable by a machine" or takes significantly longer for machines, the machine is limited to using one GPU that costs less than $1000.

  • Update 2025-11-22 (PST) (AI summary of creator comment): Additional requirement added: The puzzle must be solvable by someone without programming knowledge or without knowledge of any highly specific domain. The spirit is that solving it takes time, but doesn't require high intelligence so virtually anyone can do it.

Mass-generatability clarification: For puzzles that are variations of a specific concept (like fairy chess), the creator would need to come up with different puzzle types (not just variations of the same puzzle) that are not solvable by LLM.

  • Update 2025-11-22 (PST) (AI summary of creator comment): Puzzles that require another human to be involved in the solving process (e.g., instructing someone to build a Lego set blindfolded) are not considered mass-generatable and do not qualify.

  • Update 2025-11-22 (PST) (AI summary of creator comment): The creator has expressed concern that Spatial CAPTCHA puzzles may not meet the mass-generatability requirement (Requirement #4). Specifically, the creator questions whether such puzzles can be generated in less than 5 minutes while still being complicated enough to take 10 hours for a human to solve. This suggests puzzles requiring significant generation time relative to their difficulty may not qualify for higher tiers.

  • Update 2025-11-22 (PST) (AI summary of creator comment): Verification requirement clarification: Solutions must be objectively verifiable. If a puzzle's solution is too subjective to determine correctness (e.g., badly drawn frames that may or may not represent a valid solution), it fails the mass-generatable criteria because the creator cannot mass-evaluate whether solutions are correct.

  • Update 2025-11-22 (PST) (AI summary of creator comment): Machine-solvability clarification: A puzzle is considered "solvable by machine" if someone can write a script with the intent to solve the described task. This includes tasks that could be automated through straightforward programming, even if the setup initially restricts humans to manual methods (e.g., typing one letter at a time).

  • Update 2025-11-23 (PST) (AI summary of creator comment): Tier ranking clarification for video game constraint puzzles: A puzzle based on playing an existing video game with specific constraints (e.g., completing specific tasks in a playthrough) would be ranked among the lower end of Tier 2 (puzzle generated by simple code). While the video game itself requires significant code to create, since the game already exists, implementing this type of puzzle is considered relatively easy.

  • Update 2025-11-23 (PST) (AI summary of creator comment): Mass-generatability time requirement clarification: For a puzzle that takes 10+ hours to solve, the creator expects generation time to be proportional to solving time. If a puzzle takes 10+ hours to solve, generation should not take close to an hour. The creator indicated that a task requiring ~1 hour of generation time for a 10+ hour solving task would not meet the mass-generatability requirements for higher tiers.

  • Update 2025-11-23 (PST) (AI summary of creator comment): Mass-generatability evaluation clarification: When evaluating puzzles on the mass-generatability criterion (Requirement #4), the creator will consider both generation time and whether coding is required. A puzzle that doesn't require any coding to generate has advantages even if it falls short on generation time requirements for higher tiers.


How far away are you physically from the puzzle solvers? Would a puzzle that involved a physical object be acceptable?

@A For example, you could take a hard boiled egg, write a message on the shell, then smash it and send the pieces to the puzzle takers. They need to reassemble the egg and find the message. Depending on how thoroughly you smash it you could make the solve time arbitrarily high to your preference.

Pick a video (either a movie that contestants would have access to, or a YouTube video, or just record something yourself), and pick a random moment in the video. Describe that moment (in terms of content, not by the timestamp) and ask an extremely specific question about the scene. For example: "In the third fight scene of Kung Fu Panda, right after the fourth punch, list the animals visible on screen from left to right."

  1. Time-consuming for human to solve? Just scale up the number/difficulty of questions until you're satisfied here.

  2. Not solvable by machine? I suspect not, but I haven't tried it.

  3. Quick to verify? Yes, you can just have an answer key.

  4. Mass-generatable? Yes, just click to a random spot in a movie that you know, describe it and ask a question about what's on screen.
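The "click to a random spot" step could be assisted by a few lines of code. This is a minimal sketch assuming the video's duration in seconds is known; the function names are hypothetical, and a human would still need to watch each moment, describe it, and write the question and answer key.

```python
import random

def format_timestamp(total_seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

def pick_random_moments(duration_seconds, count, seed=None):
    """Pick `count` distinct random moments, returned in chronological order."""
    rng = random.Random(seed)
    moments = sorted(rng.sample(range(duration_seconds), count))
    return [format_timestamp(moment) for moment in moments]
```

For a 3-hour movie, `pick_random_moments(3 * 3600, 10)` would give ten moments to write questions about.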

@A My idea would be similar, but more focused on character interactions, as that is what humans seem to specialize in. E.g., watch a newly released Netflix series and ask MCQ about who was talking to whom about what during this series. Like, instead of scene description, we describe the dialogue that happened (can use LLM to rephrase the transcription) and then ask who the characters were that had this dialogue

@A I suspect that, in its current state, AI would likely fail your task. In terms of mass-generatability, though, I think it would take me at least 5 minutes to go through a 3-hour movie and create a task that takes 1 hour to solve. So for a 10+ hour task, it would take close to an hour to generate. Do you more or less agree?

@AmmonLam Yeah, probably. 10x ratio of create time to solve time is pretty decent I think but it might not quite meet your spec.

@A it does fall a little short on the mass generatable criteria. The nice thing about your proposed puzzle though is that it doesn’t require any coding to generate.

@AmmonLam
I have a file with ~100 questions in a style of:

"What is the full question, including "?" sign on Stripe "Trends in payments | Stripe Sessions 2019" video at 18:00"

Not exactly the same, but it also aligns with the idea of being much easier for humans at present.

Determining if two unlabelled graphs (from graph theory, with nodes and edges) are just embedded differently, or not the same graph.

It's human solvable for reasonable graph sizes, but a naive brute-force check has n factorial time complexity.
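To make the n factorial figure concrete, here is a minimal sketch of the naive approach: try every relabeling of the nodes and see whether the edge sets match. It assumes simple undirected graphs on nodes 0 to n-1, and the function name is hypothetical. This is only the brute-force baseline; dedicated graph isomorphism algorithms are generally much faster, so the sketch does not by itself settle whether the puzzle meets requirement #2.

```python
from itertools import permutations

def are_isomorphic_bruteforce(n, edges_a, edges_b):
    """Check isomorphism by trying all n! relabelings of nodes 0..n-1."""
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    if len(set_a) != len(set_b):
        return False  # different edge counts can never match
    for perm in permutations(range(n)):
        # Relabel every edge of graph A and compare with graph B.
        relabeled = {frozenset(perm[node] for node in edge) for edge in set_a}
        if relabeled == set_b:
            return True
    return False
```

For 20 nodes the loop would have to consider up to 20! (about 2.4 × 10^18) relabelings, which is why this particular method is impractical at that size.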

@AlanTennant good idea

bought Ṁ15 YES

@121 Try it with two random 20 node graphs and still tell me it's easy. You are allowed to move the nodes around to help you solve it.

@AlanTennant it's too hard

@121 yeah, so dial down the number of nodes or maximum number of edges allowed until it's human solvable but might not be practically computer solvable.

What about playing an (already existing) video game with specific constraints?

Choose some game that you (the test creator) are familiar with which is sufficiently long and with enough player agency.

Have each test taker create and deliver a playthrough video in which they complete specific tasks, and also provide the timestamps for each task.

Examples of elements (fictitious game but you probably get the idea):

  • The player starts a new game as a paladin named "Zolbax".

  • At some point the player is standing in water with exactly 335 health.

  • The player dies to the final boss while wielding a dagger and wearing the least amount of armor/clothes the game allows.

I think it satisfies all the requirements depending on the game chosen, although may not be what you consider a "puzzle".

@JustKevin This seems to satisfy the criteria pretty well actually.

I think this would rank toward the lower end of Tier 2 (a puzzle generated by simple code), since a video game takes quite a lot of code to create. Though I agree that since the video game already exists, this "puzzle" is rather easy to implement.

@AmmonLam You can write code to procedurally generate "video games" with various sprites/textures/game mechanics/levels. Nothing too complex, but for now, it doesn't need to be complex to stump modern "machines".

@AmmonLam For the existing video games, they also often have some screens that you are forced to observe to proceed, which makes it easyish to verify whether the human/machine passed this test

How strict are you on #2? How adversarial is the scenario? Is it sufficient if a normal machine fails, even though we could easily build a machine that succeeds?

Copy the content of a long text file into an empty text file by sequentially inserting each letter into the new text file. Use the quickest and most efficient means available.

I think you could set this up in a way that humans can only type into a keyboard, one letter at a time, thus taking them 10+ hours. A machine will either fail, or be a lot faster.

@Primer I added clarification on #2. What you described sounds like something that is solvable if I write a script with the intent to do what you described. I count that as solvable by machine.

Show the user the first and last frame of a 5-second video which contains physical motion. Ask the user to animate (or draw) a sequence of frames which interpolates the whole video. E.g., the first frame shows a kid inside the kitchen holding a baseball, and the last frame shows the ball on the ground and a kettle spilt on the counter.

You could make it harder with longer videos with lots of motion. E.g. you take a 30-second clip of a volleyball rally, and you show the user a snapshot at every 5 second interval.

@ItsMe

I think this puzzle passes the lowest standard, but the solution seems pretty subjective to me. Like if I were to attempt to solve this, and drew a few badly drawn frames, how do you decide if I found the solution or not?

Furthermore, if I can’t mass-evaluate whether the solution is correct, I would say that the puzzle falls short on the mass-generatable criteria.

sold Ṁ561 NO

@ItsMe Can AI do this? It seems harder than a captcha

I have heard of that actually. But that only does Obama, not arbitrary pictures.

@ItsMe You can change the target photo to be non-Obama photos!

@ItsMe (I think I understand your objection, and I am being willfully obtuse.)

© Manifold Markets, Inc.