Will an AI be able to beat me in Pokemon?

Aug 5

24% chance


Challenge me to battle an AI of your choice in one of the following formats:

Gen 6 Balanced Hackmons
Gen 7 Pure Hackmons

Gen 8 National Dex Anything Goes
Gen 9 OU
Gen 9 National Dex OU
Gen 9 National Dex Ubers
Gen 9 Balanced Hackmons

Teams must be submitted blind. We can discuss how to do this, but one easy way is to agree to correspond (via Discord, for example) at a certain time, confirm the plan at that time, and then exchange teams within a few seconds or minutes of each other.

All battles will be conducted on Pokemon Showdown.

I'm open to different kinds of AI:

If an LLM, either I can correspond with the LLM (in which case it must be freely accessible, or you must let me temporarily sign into your paid account) or you can (in which case we will need to agree on a time for the battle, during which we both must remain online and exchange messages). In either case, the conversation with the LLM must be shared afterward. Messages may only provide information about the state of the battle, and each message must include a complete list of all available options (e.g. 4 moves and 1-5 available switches). Messages must not steer the LLM toward any particular option.
If an RL system, it is up to you whether you want to train on my specific team (in which case you can take as much time as you want; then afterward the battle will be conducted at my earliest convenience) or without training on my specific team (in which case the battle can be conducted immediately upon exchanging teams, so I do not have time to prepare). In either case, I need to be able to run the agent so that I can verify the outputs.
If a GOFAI system, you can run the AI and provide the outputs, but in the event that you win, you will need to immediately send the code in order for me to verify that the behaviour was predetermined.
I'm willing to discuss other AI solutions to determine how best to proceed.
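
For the LLM case, the message rules above (state information only, a complete list of options, no steering) can be illustrated with a minimal sketch. The names TurnState and build_message are hypothetical and not part of Pokemon Showdown or any library, and the team in the example is only illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Hypothetical container for what the relayer tells the LLM each turn."""
    turn: int
    summary: str                      # factual description of the battle state only
    moves: list[str]                  # every move currently selectable
    switches: list[str] = field(default_factory=list)  # every legal switch-in


def build_message(state: TurnState) -> str:
    """Format one neutral message: state first, then every option, with no advice."""
    options = [f"move: {m}" for m in state.moves]
    options += [f"switch: {s}" for s in state.switches]
    lines = [f"Turn {state.turn}.", state.summary, "Available options:"]
    # Options are numbered in a fixed order (moves, then switches) so that the
    # ordering itself carries no hint about which choice is preferred.
    lines += [f"{i}. {opt}" for i, opt in enumerate(options, start=1)]
    lines.append("Reply with the number of exactly one option.")
    return "\n".join(lines)


if __name__ == "__main__":
    state = TurnState(
        turn=1,
        summary="Your Great Tusk (100% HP) faces the opposing Dragapult (100% HP).",
        moves=["Headlong Rush", "Ice Spinner", "Rapid Spin", "Knock Off"],
        switches=["Kingambit", "Gholdengo"],
    )
    print(build_message(state))
```

Building every message from one template like this makes the shared conversation log easy to audit: any message that deviates from the template stands out.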

Market format and rules shamelessly stolen from here:

"Any Manifold user may challenge me to a match. Any given user may make one attempt if they hold 2000 YES shares in this market. The requirement doubles for every attempt after. So they need to hold 2000 shares before making their first attempt, 6000 total shares if they want to make a second attempt, 14000 shares if they want to make a third [attempt], etc. When you are placing a bet, your 'Max payout' is how many shares you are buying.

"[I am] obligated to accept any challenge from someone with enough shares. Also, [I] must be willing to buy NO shares at [80]% if [I] have the mana to spare to allow challengers to take their shot."
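
The quoted thresholds (2000, 6000, 14000) are cumulative: each new attempt adds twice the previous increment. A minimal sketch of that arithmetic follows, assuming the doubling pattern continues beyond the three attempts listed; the function name shares_required is hypothetical:

```python
def shares_required(attempt: int, base: int = 2000) -> int:
    """Total YES shares a user must hold before making the given attempt (1-indexed).

    Assumes attempt k adds base * 2**(k - 1) shares to the requirement, so the
    running total is base * (2**attempt - 1).
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return base * (2 ** attempt - 1)


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, shares_required(n))
```

The first three values reproduce the figures in the quoted rules; anything past the third attempt is an extrapolation of the same pattern.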

Resolves YES if a challenger beats me in a Bo1 match.

Otherwise, resolves NO at market close time.

4 Comments

Going to see if GPT-5 is capable of beating me later today.

@NBAP GPT-5 was not bad. Generally picked pretty sensible moves / switches, which impressed me. Probably not better than a 1000 Elo player (which is pretty bad by human standards), but probably good enough that it would win a match eventually due to luck or carelessness.

For anyone who would like to trigger a YES resolution, I suspect that it might be possible for any decent programmer to use poke-env to train an RL agent to play a specific matchup (i.e. one specific team versus another specific team) at a superhuman level. I tried to do this myself but it was beyond my ability, and current LLMs couldn't get me there. Maybe GPT-5, we'll see.

If simple RL doesn't work, I suspect a relatively barebones implementation of AlphaStar-style league training (basically a league of historical agents and exploiter agents) would work rather well with Pokemon.
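
To make the league idea concrete, here is a toy sketch of just the opponent-selection step: the learning agent is matched more often against league members it currently struggles to beat. The weighting (1 - win rate)^power is an assumption loosely modelled on prioritized self-play, the league names and win rates are invented for illustration, and no training loop or poke-env integration is shown:

```python
import random


def opponent_probabilities(win_rates: list[float], power: float = 2.0) -> list[float]:
    """Turn the learner's win rate against each opponent into sampling probabilities.

    Opponents the learner rarely beats get more weight. Falls back to uniform
    sampling if the learner already beats everyone every time.
    """
    weights = [(1.0 - w) ** power for w in win_rates]
    total = sum(weights)
    if total == 0:
        return [1.0 / len(win_rates)] * len(win_rates)
    return [w / total for w in weights]


def sample_opponent(league: dict[str, float], rng: random.Random) -> str:
    """Pick the next training opponent from a league of {name: learner win rate}."""
    names = list(league)
    probs = opponent_probabilities([league[n] for n in names])
    return rng.choices(names, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical league: two frozen historical snapshots and one exploiter.
    league = {"snapshot_0010": 0.9, "snapshot_0050": 0.6, "exploiter_a": 0.3}
    rng = random.Random(0)
    print([sample_opponent(league, rng) for _ in range(5)])
```

In a fuller setup, new snapshots of the learner would be added to the league periodically and the win rates refreshed from recent games.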

I strongly believe that the problem is tractable, but regrettably, I don't think I'll be the one to solve it. If anyone thinks they've done it, feel free to comment here and I'll pump the market down before your attempt so you can make some good profit if successful.

Made a small change to the description. I copied the description from another market which referenced a Bo5 match, but I realized how tedious this would be with an LLM, so I've changed this market to Bo1. If this influences your position, please update now, and I'm happy to reimburse any resulting losses.