Before which date at midnight EDT in 2025 will any model conquer Pokémon Red or Blue?
Jul 15
May 8: 1%
May 23: 1%
May 15: 1.3%
May 31: 3%
June 8: 5%
June 15: 6%
June 22: 7%
June 30: 7%
July 8: 8%
July 15: 8%
July 23: 8%
July 31: 8%
August 8: 9%
August 15: 22%
August 23: 26%
August 31: 29%
September 8: 35%
September 15: 43%
September 22: 48%
September 30: 52%

Multiple LLMs are now playing and advancing through the Pokémon Red and Blue games (which are mostly identical in gameplay).

Every answer whose deadline falls after the moment an AI model satisfies the criteria of an Any% speedrun for either game, landing the final blow on the final boss, will resolve to YES. Answers whose deadline passes first will resolve to NO. It is also possible that, if LLM progress slows from the rate it has kept so far, all answers will resolve to NO.

The same model version must have played through the entire game from start to finish, although the developer is permitted to make changes to its settings. The model that beats the game can be run in an official capacity by the company or by a third-party developer.

  • Update 2025-04-11 (PST) (AI summary of creator comment): Human intervention disqualification: If a human ever presses the game buttons or directly instructs the model to take a specific action at any point during the run, that run is disqualified.

    • Allowed prompt adjustments: Changing prompts during the run (e.g., modifying settings like using 8 memory files instead of 4) remains valid, provided no direct human input controls game actions.

  • Update 2025-05-03 (PST) (AI summary of creator comment): Following discussion regarding a specific run, the creator has stated that for a run to be counted as a win, the model must complete it without a single change to its prompt during the entire run.
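The resolution rule and the two updates above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not anything the market creator runs: the `Run` fields, the function names, and the reading of "midnight EDT" as the start of the named date are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# EDT is UTC-4; a fixed offset avoids needing a time zone database.
EDT = timezone(timedelta(hours=-4), "EDT")


def deadline(month: int, day: int) -> datetime:
    """Midnight EDT at the start of a 2025 date (assumed reading of the title)."""
    return datetime(2025, month, day, tzinfo=EDT)


@dataclass
class Run:
    # Hypothetical record of one playthrough; field names are illustrative.
    completed_at: Optional[datetime]            # final blow landed, None if unfinished
    same_model_throughout: bool = True          # one model version, start to finish
    human_pressed_buttons: bool = False         # disqualifying (2025-04-11 update)
    human_gave_direct_instruction: bool = False # disqualifying (2025-04-11 update)
    prompt_changed: bool = False                # disqualifying (2025-05-03 update)


def qualifies(run: Run) -> bool:
    """True if the run finished and none of the disqualifiers apply."""
    return (
        run.completed_at is not None
        and run.same_model_throughout
        and not run.human_pressed_buttons
        and not run.human_gave_direct_instruction
        and not run.prompt_changed
    )


def resolve(deadlines: list, runs: list) -> dict:
    """Map each deadline to "YES" if a qualifying run finished before it, else "NO"."""
    finish_times = [r.completed_at for r in runs if qualifies(r)]
    first = min(finish_times, default=None)
    return {
        d: "YES" if first is not None and first < d else "NO"
        for d in deadlines
    }
```

For example, a purely hypothetical qualifying run that finished at noon EDT on August 10 would leave the August 8 answer at NO and resolve August 15 and every later answer to YES, while the same run with `prompt_changed=True` would leave every answer at NO.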


Let's open up the floor to discussion.

From what I see below, Gemini did not solve the game by itself because the developer let Gemini know that it needed to obtain a key twice. While it was stated that correcting errors in prompts was acceptable, that was a direct intervention to tell the model something about the game, and it may be disqualifying.

Is that the assessment of others?

@SteveSokolowski there were many other similar things: e.g. Gemini was prevented from using escape ropes https://old.reddit.com/r/ClaudePlaysPokemon/comments/1kdjysi/gemini_beats_pokemon/mqblfeh/

@SteveSokolowski I haven't been following Gemini Plays Pokémon, but the way I understand it, the model got a lot of help along the way: not just the key, but things like separate Gemini instances for path-finding and puzzle-solving, which I think goes beyond what the description intended with "changing prompts during the run remains valid". Also, Gemini could not use the escape rope, even if it pressed the button, except under certain circumstances. I think it's fair to say Gemini did not solve the game by itself even ignoring the key prompt, though that's already enough to disqualify it in my opinion. Keep in mind I have a bunch of NO shares, so also check what others say.

@TenShino OK, then we won't count this run. If the model makes it through another run without a single change to its prompt, then we will count that as a win.

@SteveSokolowski I'd argue preventing Gemini from using escape ropes even when it presses the button is human intervention in the same way a human pressing buttons would be, so another run with the same restriction shouldn't count. If this restriction is lifted I think it should be fine to count it as a win.

Oops! I thought today was tomorrow.

In the unlikely event there is a miracle, I'll ask the @mods to reverse that resolution.

@SteveSokolowski Franky died, Steve. No more miracles. 😢

Is Gemini Plays Pokémon being considered here? It's contestable whether the model is the only one playing.

@JoeandSeth Yes. If a human ever actually presses the buttons in the game, or instructs the model to take a specific action, that disqualifies the run.

Otherwise, changing the prompts during the run to say things like "use 8 memory files instead of 4" is valid.

@SteveSokolowski iirc, early in the run Gemini was given an exact English instruction for a position to navigate to.

Low confidence: I only heard it from the chat and didn't see it myself. But would this dq that instance?

@JoeandSeth I've heard, but can't be confident, that Gemini's developer has been modifying the prompts as the run goes on. If it completes the game, then I'm sure that someone will investigate the actual prompts and the truth will come out.

If it does complete the game though, then it's likely he will just run it again from the start and it should be able to do it without further changes, which would delay the resolution date.

@SteveSokolowski

"Q: I've heard you frequently help Gemini (dev interventions, etc.). Isn't this cheating?
A: No, it's not cheating. Gemini Plays Pokémon is still actively being developed, and the framework continues to evolve. My interventions improve Gemini’s overall decision-making and reasoning abilities. I don't give specific hints—there are no walkthroughs or direct instructions for particular challenges like Mt. Moon.
The only thing that comes even close is letting Gemini know that it needs to talk to a Rocket Grunt twice to obtain the Lift Key, which was a bug that was later fixed in Pokemon Yellow.

Claude Plays Pokémon underwent similar behind-the-scenes refinements before streaming began. Learn more about that here. With Gemini, you're seeing the entire development process live!"

That's from the Twitch page, and it says that the developer let Gemini know it needed to talk to someone twice to obtain the Lift Key. That seems like it might invalidate the run?

@lemon10 That's a really fine line because, since this is a bug in the game, it's possible that an unaware human would also never have progressed past that point without being told about it.

I'm interested in hearing what others have to say here.

@SteveSokolowski I think this might count as a disqualification: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1jyhx1r/gemini_refuses_a_direct_instruction_out_of/ However, they claim it was just a test, that it didn't help, and that it was removed after one day.

@Lorenzo But was any of this series of events stored in Gemini's memory for later use?

@SteveSokolowski I don't know
