MANIFOLD
What will be true about AI and MIT Mystery Hunt 2026?
47%: A team announces that an AI independently solved at least 10 hunt puzzles, regardless of order and standard puzzle unlock rules
43%: A team with a Google model has the “best” results
42%: A team using an OpenAI model has the “best” results
35%: A team announces that an AI solved at least 1 hunt metapuzzle (regardless of whether it solved any earlier puzzles or associated feeders)
35%: A team announces that an AI independently solved at least 5 hunt puzzles *following* standard hunt unlock rules (can only skip puzzles or do free unlocks if other teams are given the same option). A free unlock is not a solve
34%: At least three groups or teams publish writeups of testing/training/benchmarking/etc. AI on the hunt
31%: A team announces that an AI independently solved at least 1 entire hunt round (metapuzzle and any feeder puzzles whose answers were used to solve the meta). Does not have to be an opening round
29%: A team announces that an AI independently solved at least 20 hunt puzzles, regardless of order and standard puzzle unlock rules
Resolved YES: A team announces that AI solves any puzzle

I won’t bet.

All criteria must be met by original market close date to count.

I realize “best” is subjective. If it’s sufficiently unclear to me I may N/A.

I’m allowing humans to do some work with inputting puzzles, clicking on the website, taking screenshots, etc., but for a group’s writeup to count, the AI should be doing the vast majority of the work on any puzzles it is claimed to have solved.

Clarification questions welcome.

I’ll try to be generous in accepting what teams report.

I may N/A some markets if it is unclear whether they have been met or if I realize belatedly the criteria are too poorly defined.


Sorry for the delay on reopening this, I’ve had a lot going on. Here’s what I’m envisioning for the future of this market:

For “3 published writeups”: I’d count something about as detailed/involved as Eric’s comment below if it exists in a form such as a blog post, but a comment on a Manifold market doesn’t seem like a writeup to me. These can come from teams during the hunt or from post-solves.

However, for all other markets, I’ll count evidence in any format that I deem trustworthy: comments below, blog posts, tweets, Discord screenshots, etc. Negative results will stay open until original market close. YES results can resolve early.

Closing the question while I seek input:

I had envisioned this market as being based on results from official writeups by registered teams that explicitly set out to test this during the hunt. But the criteria were vague enough that, for example, a group at OpenAI that did a write-up on some post-solves within a month would also have counted, even if they didn't officially compete.

It looks possible that there were no such groups this year. In that case, if there aren't any official write-ups from "teams" but maybe just one-offs from individuals (e.g., I have "announced" below that an AI solved a puzzle, but I don't speak on behalf of a team; this was just something I did myself), do traders think these kinds of reported results should qualify?

I wanted this to be more about current AI puzzle-solving capabilities (without committing to testing each of these possibilities myself), but I see that that doesn't fully match the spirit of what I wrote. Anyone want to make arguments for whether this should or should not count?

@JimHays No strong feelings.

Sadly I feel like I had a very hard time gauging how to answer in this market.

@JimHays I ran GPT 5.2 Pro on 22 ~text-only puzzles in Hunt (and a few multimodal ones, but I didn't have great luck there; the vast majority of puzzles have some image component, so a better harness would make it much more powerful).

It solved 9 of them in one shot, getting the answer that I submitted for my team (Hunches in Bunches). It also solved another one independently but after we had submitted our answer.

Details:

Gaussian integer puzzle https://chatgpt.com/share/696aebb1-2a48-8000-a2e0-3575d9998bc8

Alphabet text puzzle with just 5/26 letters https://chatgpt.com/share/696afc2f-d1f4-8000-b520-3e2195ea032c

Number function puzzle https://chatgpt.com/share/696d0be9-9b68-8000-88ea-aba3b091538b

Personality quiz puzzle https://chatgpt.com/share/696dccce-0104-8000-b319-55a9a25b67ce

It also easily solved all five Dichotomies puzzles. They shouldn't "really" count as separate individual puzzles -- like the Trends puzzles you report, they are much smaller than most puzzles, more like subparts of a single puzzle -- but they officially are individual puzzles.

Drop * from teams (it was perfectly capable of solving this one, but we didn't use its solution): https://chatgpt.com/share/696f2f76-175c-8000-bdc5-ce0a5ec746d1

I only tried 22 text-only puzzles, so it had nearly a 50% success rate on them (6/18 if we count the Dichotomies as a single puzzle). It solved all the math puzzles.
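To make the arithmetic behind those rates explicit, here is a quick check. The counts are the ones reported above; the variable names are purely illustrative and not from the writeup.

```python
# Success-rate arithmetic from the counts reported above (illustrative names only).
one_shot_solves = 9    # solved in one shot; answers submitted for the team
late_solve = 1         # solved independently, but after the team had already submitted
attempted = 22         # ~text-only puzzles tried

rate = (one_shot_solves + late_solve) / attempted
print(f"{one_shot_solves + late_solve}/{attempted} = {rate:.0%}")  # 10/22 = 45%, i.e. "nearly 50%"

# Counting the five Dichotomies (all solved) as a single puzzle:
collapsed_solves = one_shot_solves + late_solve - 5 + 1    # 6
collapsed_attempted = attempted - 5 + 1                    # 18
print(f"{collapsed_solves}/{collapsed_attempted}")         # 6/18
```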

@EricPrice Does this count as one of the "at least 3" teams to publish something?

Also you should push it to 10 solves if you're not at 10 yet -- that could resolve an answer yes!

@JimHays I did a more systematic survey of AI performance, focusing on all-text puzzles for simplicity. I picked 25 such puzzles, which is almost all of the ones solvable from the text given by "copy puzzle" (either having no images, or only images that are completely explained by alt text). I only included one of the Dichotomies, since they are so similar.

GPT 5.2 Pro gets 10/25.

Gemini 3 Pro and Opus 4.5 get 2/25 (just "Wanderer's Colorlog" and "Haunted Mansion?").

GPT 5.1 Pro, the legacy model, got 6/25.

I think it's fair to resolve "at least one puzzle" and "at least 10 puzzles" to YES at this point. And I guess we should wait for other teams with contradictory information -- multimodal puzzles may well be different -- but the OpenAI model was very clearly best in my testing.
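For anyone who wants to reproduce this kind of survey, the mechanics are simple: send each "copy puzzle" text to a model once, extract its final answer, and compare it against the accepted answer after normalization. Below is a minimal sketch of such a harness; the file layout, the ask_model() placeholder, and the model IDs are my own assumptions, not anything from the writeup above.

```python
# Minimal one-shot puzzle-benchmark sketch (hypothetical harness, not the actual code used above).
# Assumes puzzles/<name>.txt holds the text from "copy puzzle" and answers.json maps
# puzzle name -> accepted answer. ask_model() is a placeholder for whatever chat API you use.
import json
from pathlib import Path


def ask_model(model: str, puzzle_text: str) -> str:
    """Send the puzzle text to `model` once and return its final extracted answer."""
    raise NotImplementedError("wire this up to your chat API of choice")


def normalize(answer: str) -> str:
    # Hunt answers are conventionally compared ignoring case, spaces, and punctuation.
    return "".join(c for c in answer.upper() if c.isalnum())


def run_benchmark(model: str, puzzle_dir: str = "puzzles", answer_file: str = "answers.json") -> None:
    answers = json.loads(Path(answer_file).read_text())
    solved = attempted = 0
    for path in sorted(Path(puzzle_dir).glob("*.txt")):
        attempted += 1
        guess = ask_model(model, path.read_text())
        if normalize(guess) == normalize(answers[path.stem]):
            solved += 1
    print(f"{model}: {solved}/{attempted} solved in one shot")


# Example usage (placeholder model IDs):
# run_benchmark("gpt-5.2-pro")
# run_benchmark("gemini-3-pro")
```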

Spoilers for Trends: Names

Gemini Pro is able to solve this puzzle quickly, easily, and without error. Conversation linked below.
https://gemini.google.com/share/77e0327837e5

@JimHays In my testing, it also one-shots Trends: Finance, but fails on Trends: Numbers (probably because the trend for one of the lines is too recent to be in its training data).

Planning to resolve this NO unless someone presents evidence to the contrary. I didn’t see every puzzle, but none of the ones I saw did, nor did I hear about this.

@JimHays I didn't see any, either

@JimHays I don’t have any special knowledge, but I’m surprised this is so low. But maybe some groups won’t publish writeups or they will take too long to come out?

@JimHays Maybe I misunderstood what testing/training/benchmarking meant.

@Eliza I should have used the exact language from the registration form, but this is what I was intending to refer to:

It would likely be a lot more work to do a thorough attempt at this than at the Putnam or IMO, and maybe the results won’t be impressive enough for official teams to announce, so whatever we get might just come from hobbyists?

@JimHays This is back-channel conversation that maybe I'm not supposed to repeat, but as of mid-December none of the registered teams had said they were registering in order to do AI training or benchmarking. I'm somewhat bullish on the possibility of AI puzzle solving (though I doubt models right now are up to the task), but that's why I have so many NO bets in this market.

Some of these options probably track with "Will there be a round of fish puzzles this year?" but I don't really have a good sense for the odds of that.

@JimHays I only bought No shares! I can't risk everything. The entire site already thinks I'm the anti-AI crusader.
