Will Claude become a Pokémon Master by the end of 2025?
27% chance

https://www.twitch.tv/claudeplayspokemon

Claude is off playing Pokemon Red! This market resolves YES if Claude beats the game (Elite 4 + Rival) by the end of 2025, and NO otherwise.

Any approach Anthropic uses is fine, as long as they consider themselves to have beaten the game. It does not necessarily have to be the stream itself, or Pokémon Red (but it should be a “regular” Pokémon game).

See also: /Sketchy/in-progress-will-an-llm-become-a-po


Safari Zone is the crux. You can't brute force it because it limits your step count, and you only get a finite number of chances before you run out of money and soft lock.

The Safari Zone kicks you out after 500 steps. It takes, at minimum, 399 steps to reach the Secret House. So Claude needs to be pathfinding at about 80% of optimal (399/500) in order to defeat the Safari Zone.
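A quick check of that step-budget arithmetic, as a minimal Python sketch. The 500-step limit and the 399-step minimum route are the figures claimed in the comment, not independently verified; the function names are hypothetical, and "efficiency" here just means optimal steps divided by steps actually taken.

```python
# Figures claimed in the comment above (not independently verified).
SAFARI_STEP_LIMIT = 500          # steps before the Safari Zone kicks you out
MIN_STEPS_TO_SECRET_HOUSE = 399  # claimed shortest route to the Secret House


def required_efficiency(min_steps: int, step_budget: int) -> float:
    """Efficiency = optimal steps / steps taken; the budget caps steps taken."""
    return min_steps / step_budget


def slack_steps(min_steps: int, step_budget: int) -> int:
    """How many wasted steps a run can afford before being kicked out."""
    return step_budget - min_steps


if __name__ == "__main__":
    eff = required_efficiency(MIN_STEPS_TO_SECRET_HOUSE, SAFARI_STEP_LIMIT)
    spare = slack_steps(MIN_STEPS_TO_SECRET_HOUSE, SAFARI_STEP_LIMIT)
    print(f"needed efficiency: {eff:.1%}")
    print(f"wasted steps allowed: {spare}")
```

With those numbers it reports 79.8% needed efficiency and only 101 spare steps for detours, dead ends, or backtracking.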

I'll go so far as to say that the first run to defeat the Safari Zone will also defeat the Elite Four. I can't think of any other task later in the game that requires more than 80% precision.

Claude Sonnet took ~10,000 actions to defeat Mt. Moon, according to their white paper. "Actions" here includes all button presses, not just navigation. I'll guess that 25-50% of actions were steps taken in Mt. Moon, leaving the rest for battles, conversations with NPCs, the start menu, etc. That would still mean Claude 3.7 used 10-20 times as many steps as necessary to get through Mt. Moon. So we're at 5-10% pathfinding efficiency, when we need to be at 80%.
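The same estimate as a rough Python sketch. The 10,000-action total and the 25-50% navigation share come from the comment; the 250-step optimal route is a hypothetical value back-solved from the comment's own "10-20 times" figure (2,500-5,000 steps taken), not a measured number.

```python
TOTAL_ACTIONS = 10_000       # Mt. Moon action count cited from the white paper
IMPLIED_OPTIMAL_STEPS = 250  # hypothetical: back-solved from "10-20x" over 2,500-5,000 steps
TARGET_EFFICIENCY = 0.80     # Safari Zone threshold (399 / 500)


def pathfinding_efficiency(total_actions: int, nav_fraction: float,
                           optimal_steps: int) -> float:
    """Optimal steps divided by the steps we guess were spent walking."""
    steps_taken = total_actions * nav_fraction
    return optimal_steps / steps_taken


if __name__ == "__main__":
    for frac in (0.25, 0.50):  # guessed share of actions that were steps
        eff = pathfinding_efficiency(TOTAL_ACTIONS, frac, IMPLIED_OPTIMAL_STEPS)
        gap = TARGET_EFFICIENCY / eff
        print(f"nav share {frac:.0%}: efficiency {eff:.0%}, needs ~{gap:.0f}x improvement")
```

Under these assumptions the gap to the Safari Zone threshold is roughly 8x (at 10%) to 16x (at 5%).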

Now the question is, how fast is Claude improving? We don't have much background for this, since 3.7 is the first model to beat Mt. Moon. But if we look at the "Reach Viridian Forest" metric, 3.5 Sonnet New performs at least four times more efficiently than 3.5 Sonnet Old, and 3.7 is four times as efficient as 3.5 Sonnet New. On the "Get Brock's badge" metric, 3.7 Sonnet is about 3.5 times as efficient as 3.5 Sonnet New. So I think we're a model or two away from being efficient enough to defeat the Safari Zone.
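One way to turn those per-model gains into a "how many more models" estimate is to assume each new model multiplies pathfinding efficiency by a constant factor. That is a simplifying assumption for illustration, not something the white paper claims. The 3.5x and 4x gains and the 5-10% starting point come from the comments above; the 80% target is the Safari Zone threshold, and `models_needed` is a hypothetical helper.

```python
from typing import Optional


def models_needed(current_eff: float, target_eff: float, gain_per_model: float,
                  max_models: int = 20) -> Optional[int]:
    """Smallest n such that current_eff * gain_per_model**n reaches target_eff.

    Assumes (hypothetically) a constant multiplicative gain per model generation.
    Returns None if the target isn't reached within max_models generations.
    """
    eff = current_eff
    for n in range(max_models + 1):
        # Small relative tolerance so products landing on the target
        # (or a hair below it from float rounding) still count as reaching it.
        if eff >= target_eff * (1 - 1e-9):
            return n
        eff *= gain_per_model
    return None


if __name__ == "__main__":
    for start in (0.10, 0.05):   # 5-10% Mt. Moon efficiency estimate
        for gain in (3.5, 4.0):  # per-model gains from the Viridian/Brock metrics
            print(f"start {start:.0%}, gain {gain}x -> {models_needed(start, 0.80, gain)} models")
```

With these inputs the sketch returns 2 in three of the four cases and 3 for the 5%/3.5x case, i.e. two or three generations after 3.7 under this crude constant-gain model.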

There were about four months between 3.5 old and 3.5 new, and another four months between 3.5 new and 3.7. With 3.7 released in February, we have 10 more months for Anthropic to come up with two more models. That's plenty of time.

Bullish.

@GG There's also the possibility that Anthropic will juice their odds by switching to Pokemon Yellow version, which lets you enter the Safari Zone without money, preventing the soft lock. This would piss me off, but it would still clear the terms specified by this market's creator.

@GG yea I think that definitely could happen too. Fwiw, I anticipate that a decent chunk of the worlds in which this market resolves YES have people mad at me that Anthropic "cheated". The market criteria are specifically written so I don't have to litigate how much help is too much.

Unlike the more general /Sketchy/in-progress-will-an-llm-become-a-po market, in which I've signed myself up for inevitable arguments about how much help is too much lol.

@GG I think Victory Road would be very difficult for Claude, as would Seafoam Islands if it goes that route. There isn't a hard limit like the Safari Zone, but that doesn't mean Claude will get through it any time soon. Just look at how hard Mt. Moon was.

This market is weird. A year of randomly wandering around may indeed result in Claude becoming a Pokémon Champion.

Not even a little spike for getting out of Mt. Moon?? Tough crowd.

bought Ṁ50 NO

If you watch the stream it is very clear it is nowhere close. Any battle that requires grinding or strategy is out of reach for it, and it is gonna struggle insane amounts in Rock Tunnel.

@JaundicedBaboon It already intentionally grinded to beat Brock, and with the amount of random wandering it does I doubt battles are going to be a major blocker. Probably the Safari Zone is what trips it up imo.

Or maybe Surge's gym with the switch puzzle? Dunno, haven't played Red in a while.

@JaundicedBaboon there have apparently been pretty successful LLM-based bots, though with much more scaffolding than what CPP (ClaudePlaysPokemon) currently has

https://arxiv.org/abs/2402.01118

https://arxiv.org/abs/2503.04094

@hecko A quick look at the abstracts suggests these are purely for battling and not related to navigating the world, which is what Claude really struggles with.

@JaundicedBaboon battling was one of your complaints, but for navigation yeah best of luck to it

@JaundicedBaboon Gen 1 is easy enough that there is absolutely no strategy required for any of the battles. The hard part will be stuff like Safari Zone, Seafoam Islands, Mansion etc.
