
Background
Claude 3.7 Sonnet is currently the only LLM reported to have made significant progress in playing Pokémon Red. Using its "extended thinking" mode, it has defeated several Gym Leaders and progressed through multiple areas of the game. Other LLMs such as GPT-4V, Gemini, and LLaVA have been tested but struggled with the spatial reasoning and navigation required to play effectively.
The technical challenges of playing Pokémon Red include maintaining game state awareness, planning multi-step sequences, and navigating the game world effectively.
Resolution Criteria
This market resolves to YES if:
Any LLM other than Claude completes Pokémon Red by defeating the Elite Four and the Champion before any Claude model does so.
This market resolves to NO if either of the following occurs:
Any Claude model (including future versions) completes Pokémon Red by defeating the Elite Four and the Champion first.
No LLM completes Pokémon Red by the market close date.
For resolution purposes:
"Beating Pokémon Red" means completing the main storyline by defeating the Elite Four and the Champion.
The LLM must play autonomously, without human assistance beyond initial prompting, setup, and scaffolding (tool use is allowed).
The achievement must be verifiable through credible documentation (video evidence, technical paper, or announcement from a reputable organization).
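The resolution rules above can be read as a small decision procedure. The following is only an illustrative sketch, not an official resolution mechanism: the `Completion` record, its field names, the `resolve` function, and the `close_date` parameter are all hypothetical names introduced here. It also assumes a literal reading of "before", so a same-day tie between a Claude and a non-Claude completion does not satisfy the YES condition; the criteria themselves do not address ties.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Completion:
    """Hypothetical record of one claimed Pokémon Red completion."""
    model_family: str    # e.g. "Claude", "GPT", "Gemini"
    completed_on: date   # date the Champion was defeated
    autonomous: bool     # no human help beyond prompting, setup, scaffolding
    verified: bool       # backed by credible documentation


def resolve(completions: List[Completion], close_date: date) -> str:
    """Return "YES" or "NO" under the assumptions stated in the lead-in."""
    # Only autonomous, verified completions on or before the close date count.
    valid = [
        c for c in completions
        if c.autonomous and c.verified and c.completed_on <= close_date
    ]
    claude = [c.completed_on for c in valid if c.model_family.lower() == "claude"]
    others = [c.completed_on for c in valid if c.model_family.lower() != "claude"]

    if not others:
        # Either Claude finished alone or no LLM finished at all.
        return "NO"
    if not claude:
        # A non-Claude LLM finished and no Claude model did.
        return "YES"
    # Assumption: "before" is strict, so a same-day tie resolves NO.
    return "YES" if min(others) < min(claude) else "NO"
```

For example, a verified autonomous completion by a non-Claude model in June followed by a Claude completion in July would return "YES" under this sketch, while an empty list of completions would return "NO".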
Considerations
The race to beat Pokémon Red represents a significant AI capability benchmark, as it requires complex reasoning, memory, and planning abilities. While Claude currently has a head start, the field of AI is advancing rapidly, and competitors may develop specialized capabilities to tackle this challenge. Future LLM releases from organizations such as OpenAI, DeepSeek, xAI, Google DeepMind, or others could surpass Claude's current capabilities in game-playing tasks.