When will a Claude model think faster than its Pokemon character can take steps?
  • June 2026 or earlier: 5%

  • January 2027 or earlier: 10%

  • June 2027 or earlier: 29%

  • January 2028 or earlier: 44%

  • June 2028 or earlier: 53%

  • January 2029 or earlier: 62%

Currently, Claude Plays Pokemon works as follows: Claude executes a command with the emulator, and once that command completes, it takes a screenshot and considers its next move. When the command involves moving the character, Claude uses a "navigation" tool that lets it take multiple steps at a time. This is presumably because requiring Claude to take each step individually would be arduous. It would produce a lot of extra tokens (filling Claude's context with useless content), and since Claude cannot process more than one screenshot per character step without having to stop and think, it would slow down gameplay considerably.

Each date in this market resolves YES if, at or before that date, a general-purpose, publicly available AI model from Anthropic (either explicitly named Claude or part of a model series that is clearly the successor to the Claude series of models) can process frames of a Pokemon game and take actions with an emulator tool at a faster rate than the character in the Pokemon game can take steps.

This speed requirement includes all networking and processing latency in an implementation of Claude Plays Pokemon, and must be demonstrated in some publicly-visible form, such as a Twitch live stream. It need not be operated by an Anthropic employee, and need not be Pokemon Red, though it should be a Pokemon game that has discrete steps like Pokemon Red does.
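
As a rough illustration of the speed requirement, the sketch below checks whether a total per-decision latency fits inside one character step. Everything in it is an assumption for illustration rather than part of the resolution criteria: the figures of 16 frames per walking step and roughly 59.73 frames per second are commonly cited for Pokemon Red on Game Boy, and the function and parameter names are hypothetical.

```python
# Assumed timing values, for illustration only (not from the market text).
FRAMES_PER_STEP = 16       # assumed frames for one walking step
FRAMES_PER_SECOND = 59.73  # assumed emulated frame rate


def step_duration_seconds(frames_per_step=FRAMES_PER_STEP, fps=FRAMES_PER_SECOND):
    """Return how long one character step takes, in seconds."""
    return frames_per_step / fps


def keeps_pace(network_s, inference_s, input_s, step_s=None):
    """Return True if the summed latency is strictly shorter than one step.

    network_s, inference_s and input_s are hypothetical per-decision
    latencies (in seconds) for networking, model processing, and
    delivering the button press to the emulator.
    """
    if step_s is None:
        step_s = step_duration_seconds()
    return network_s + inference_s + input_s < step_s
```

Under these assumed figures one step lasts about 0.27 seconds, so a setup with 0.05 s of networking, 0.15 s of processing and 0.01 s of input delivery would keep pace, while one with 0.30 s of processing would not. The check treats each decision as sequential and ignores any overlap between latency stages.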

Ultimately I will make a judgement call as to what counts, but in the absence of unanticipated edge cases, the following guidelines will apply:

  • Claude must be able to "see" at least one frame per character step

  • Claude must input each step individually as button presses to an emulator tool, as opposed to using a navigation tool as it does presently

  • There must be no need for gaps in between steps - it must be apparent that if Claude wants to, it can move the character continuously by having the next move button held down before the current character step completes.

  • Claude must be able to make active decisions in this time, such that it can make navigation choices whilst the character is still moving, based on what new parts of the map have entered the screen, without having to stop to think.

  • The model must be at least as capable as Claude Opus 4.5 at playing Pokemon - ideally measured in terms of game progress per unit real time in Pokemon Red (at time of writing, Claude Opus 4.5 has six badges after one month of gameplay) but otherwise according to my judgement. This is to prevent the case of a small model playing Pokemon extremely badly in the name of speed, which is not in the spirit of what I'm going for with this market.
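
The "game progress per unit real time" comparison in the last guideline could be sketched as below. This is only one possible reading, not the market creator's method: it assumes progress is counted in badges, that "one month" is about 30 days, and that progress accrues at a roughly constant rate. The function names are hypothetical.

```python
def badges_per_day(badges, days):
    """Return average badges earned per day of real-time gameplay."""
    if days <= 0:
        raise ValueError("days must be positive")
    return badges / days


# Baseline from the text: six badges after one month (assumed to be 30 days).
BASELINE_RATE = badges_per_day(6, 30)


def at_least_as_capable(badges, days, baseline=BASELINE_RATE):
    """Return True if the candidate's badge rate meets or exceeds the baseline."""
    return badges_per_day(badges, days) >= baseline
```

For example, a model that earned three badges in ten days would clear this baseline, while one that earned two badges in twenty days would not. The guideline also leaves room for the creator's own judgement where this measure does not apply.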

It is difficult to anticipate all possible corner cases, but some other guidelines are:

  • In terms of the existing model lineup, a future Opus, Sonnet, or Haiku model would all qualify - it's fine if it's not Anthropic's flagship or most powerful/capable model, but it must be in the same product series as their flagship or most powerful/capable model.

  • Some specialised videogame-playing model would not count - it should be a general-purpose model Anthropic markets as being the right tool for various jobs such as software development.

Sufficient ambiguity may cause one or all dates to resolve NA.

I may add new dates with finer resolution if this starts looking more likely to happen - the format of the market is such that new dates can be added without changing the probabilities of existing answers.

All dates refer to US Pacific Time.
