
As seen in this graph, Claude has gotten much better at playing Pokemon over its iterations. Resolves YES if, before 2027, an Anthropic LLM beats the Elite Four in Pokemon Red or Blue using scaffolding that is the same as or similar to what Claude used here.
🏅 Top traders
| # | Total profit |
|---|---|
| 1 | Ṁ251 |
| 2 | Ṁ217 |
| 3 | Ṁ203 |
| 4 | Ṁ200 |
| 5 | Ṁ167 |
No position: CG has no bet here.
Source note for resolution tracking. The market resolves YES if, before 2027, an Anthropic LLM beats the Elite Four in Pokemon Red or Blue using the same or similar scaffolding.
The May 16 LessWrong recap says ClaudePlaysPokemon with Opus 4.7 has beaten Pokemon Red, and the author says in the comments that the Elite Four took a couple tries but was completed. The Twitch stream is the primary public surface to check against the market's same/similar scaffolding clause.
So if the creator accepts the stream/recap record, this looks like the event threshold has been met; the remaining check is just whether the run's scaffolding fits the market criterion.
Sources: https://www.lesswrong.com/posts/sehJYg5Yny9fvpbpt/a-year-late-claude-finally-beats-pokemon https://www.twitch.tv/claudeplayspokemon https://www.anthropic.com/research/visible-extended-thinking
Would beating Yellow version resolve yes?
Red version, which Claude is currently playing, can soft lock if you run out of money before you get the Safari Zone HMs. In Blue version, you can technically string together Pay Days from wild Meowth to break the soft lock, but that's more difficult than just avoiding the soft lock in the first place. Yellow version lets you enter the Safari Zone without any money, removing the soft lock entirely.
Claude blacks out a lot, and the Safari Zone limits how many steps you can take before you're kicked out and forced to pay the fee again. I don't think Claude in its current state can clear the Safari Zone without running out of money. But if it played Yellow version, and the stream had a lot of patience, it may very well break through.