Before which date at midnight EDT in 2025 will any model conquer Pokémon Red or Blue?

550Ṁ7452

Jul 15

1.0%

June 8

1.0%

July 8

1.0%

June 30

1.0%

July 15

1.0%

June 15

1.0%

June 22

1.6%

July 23

1.8%

July 31

August 8

August 15

August 23

August 31

12%

September 8

14%

September 15

24%

September 22

31%

September 30

Resolved

April 30

Resolved

May 8

Resolved

May 23

Resolved

May 31

Multiple LLMs are now playing and advancing through the Pokémon Red and Blue games (which are mostly identical in gameplay).

Every answer whose date falls after the moment an AI model satisfies the criteria of an Any% speedrun for either game, landing the final blow on the final boss, will resolve to YES. Answers whose time runs out first will resolve to NO. If LLM progress does not continue at its recent rate, all answers could resolve to NO.
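
As a rough illustration of the date rule above, here is a minimal Python sketch. It assumes each answer's deadline is 00:00 EDT at the start of the listed date (the description does not say whether "midnight" means the start or the end of the day). The function name `resolve_answers` and the sample completion time are hypothetical and are not taken from any actual run.

```python
from datetime import date, datetime, time, timedelta, timezone

# EDT is a fixed UTC-4 offset; the market title specifies EDT.
EDT = timezone(timedelta(hours=-4), "EDT")


def resolve_answers(answer_dates, completed_at):
    """Map each answer date to "YES" or "NO".

    completed_at is the timezone-aware moment of the final blow,
    or None if no qualifying run has finished.
    Assumption: the deadline is 00:00 EDT at the start of each date.
    """
    results = {}
    for d in answer_dates:
        deadline = datetime.combine(d, time.min, tzinfo=EDT)
        finished_in_time = completed_at is not None and completed_at < deadline
        results[d] = "YES" if finished_in_time else "NO"
    return results


# Hypothetical example: a qualifying run finishes on September 10 at 15:00 EDT.
answers = [date(2025, 9, 8), date(2025, 9, 15), date(2025, 9, 22)]
finish = datetime(2025, 9, 10, 15, 0, tzinfo=EDT)
example = resolve_answers(answers, finish)
```

Under these assumptions, the September 8 answer resolves NO and the two later answers resolve YES. Passing `None` for `completed_at` models the case where every answer resolves NO.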

The same model version must have played through the entire game from start to finish, although the developer is permitted to make changes to its settings. The model that beats the game can be run in an official capacity by the company or by a third-party developer.

Update 2025-04-11 (PST) (AI summary of creator comment): - Human intervention disqualification: If a human ever presses the game buttons or directly instructs the model to take a specific action at any point during the run, that run is disqualified.
- Allowed prompt adjustments: Changing prompts during the run (e.g., modifying settings like using 8 memory files instead of 4) remains valid, provided no direct human input controls game actions.

Update 2025-05-03 (PST) (AI summary of creator comment): * Following discussion regarding a specific run, the creator has stated that for a run to be counted as a win, the model must complete it without a single change to its prompt during the entire run.

Let's open up the floor to discussion.

From what I see below, Gemini did not solve the game by itself because the developer let Gemini know that it needed to talk to a Rocket Grunt twice to obtain a key. While it was stated that correcting errors in prompts was acceptable, that was a direct intervention to tell the model something about the game, and it may be disqualifying.

Is that the assessment of others?

@SteveSokolowski there were many other similar things: e.g. Gemini was prevented from using escape ropes https://old.reddit.com/r/ClaudePlaysPokemon/comments/1kdjysi/gemini_beats_pokemon/mqblfeh/

@SteveSokolowski I haven't been following Gemini Plays Pokémon, but the way I understand it, the model got a lot of help along the way: not just the key, but things like separate Gemini instances for path-finding and puzzle-solving, which I think goes beyond what the description intended with "changing prompts during the run remains valid". Also, Gemini could not use the escape rope, even if it pressed the button, except under certain circumstances. I think it's fair to say Gemini did not solve the game by itself even ignoring the key prompt, though that's already enough to disqualify it in my opinion. Keep in mind I have a bunch of NO shares, so also check what others say.

@TenShino OK, then we won't count this run. If the model makes it through another run without a single change to its prompt, then we will count that as a win.

@SteveSokolowski I'd argue preventing Gemini from using escape ropes even when it presses the button is human intervention in the same way a human pressing buttons would be, so another run with the same restriction shouldn't count. If this restriction is lifted I think it should be fine to count it as a win.

Oops! I thought today was tomorrow.

In the unlikely event there is a miracle, I'll ask the @mods to reverse that resolution.

@SteveSokolowski Franky died, Steve. No more miracles. 😢

Is Gemini Plays Pokémon being considered here? It's contestable whether the model is the only one playing.

@JoeandSeth Yes. If a human ever actually presses the buttons in the game, or instructs the model to take a specific action, that disqualifies the run.

Otherwise, changing the prompts during the run to say things like "use 8 memory files instead of 4" is valid.

@SteveSokolowski iirc early in the run Gemini did have exact English instruction given for a position to navigate to

Low confidence, I only heard it from the chat, didn't see it myself. But this would dq that instance?

@JoeandSeth I've heard, but can't be confident, that Gemini's developer has been modifying the prompts as the run goes on. If it completes the game, then I'm sure that someone will investigate the actual prompts and the truth will come out.

If it does complete the game though, then it's likely he will just run it again from the start and it should be able to do it without further changes, which would delay the resolution date.

@SteveSokolowski

"Q: I've heard you frequently help Gemini (dev interventions, etc.). Isn't this cheating?
A: No, it's not cheating. Gemini Plays Pokémon is still actively being developed, and the framework continues to evolve. My interventions improve Gemini’s overall decision-making and reasoning abilities. I don't give specific hints—there are no walkthroughs or direct instructions for particular challenges like Mt. Moon.
The only thing that comes even close is letting Gemini know that it needs to talk to a Rocket Grunt twice to obtain the Lift Key, which was a bug that was later fixed in Pokemon Yellow.

Claude Plays Pokémon underwent similar behind-the-scenes refinements before streaming began. Learn more about that here. With Gemini, you're seeing the entire development process live!"

That's from the Twitch page, and it says the developer let Gemini know it needed to talk to someone twice to obtain the Lift Key. That seems like it might invalidate the run?

@lemon10 That's a really fine line: because it's a bug in the game, it's possible that an unaware human would also never have progressed past that point without being told about the bug.

I'm interested in hearing what others have to say here.