Will AI beat a top human Magic: The Gathering player before the end of 2026?
13% chance

Resolves to Yes if an AI has the strength to beat world champions in Magic: The Gathering before the end of 2026.

If an AI were able to beat the world champion in any one of the major formats (Draft, Constructed, or Sealed), it would count as a Yes resolution. The AI doesn't have to actually be participating in the world championship (it likely would not), but as long as the AI demonstrated consistent strength in beating current world champion in repeated games (70% win rate with 10+ games played) that would count as a Yes.

The AI doesn't necessarily have to beat the current world champion. Beating any of the top 20 in the world within the past 3 years is sufficient. The goal is to demonstrate that the AI has the strength to likely win against the world champion.

  • Update 2025-02-12 (PST) (AI summary of creator comment): Format Clarification:

    • The match must be played in an official format of Magic: The Gathering.

    • This includes the originally mentioned formats (Draft, Constructed, or Sealed) as well as any other format officially recognized.


I don't think that you can win 70 percent of games in MTG against the world elite even if you play perfectly. The game is too random.

@HannesLynchburg with Chess and Go, the best AI can beat the world elite 99% of the time.

Randomness is not a concern; it's just a skill issue.

@AmmonLam Hmm, that's not always the case. You can't win in Powerball consistently no matter how superhuman you are at it. It would also be extremely hard to win at a coin toss where both parties made sure that coin behavior was as close to random as possible.

@MikhailDoroshenko but MTG is not Powerball, and top players do consistently get top finishes. https://magic.gg/events/lifetime-top-finishes-by-player

filled a Ṁ20 YES at 23% order

Ross Ulbricht got freed; I take that as a strong sign that this too will happen.

filled a Ṁ50 YES at 30% order

Why is this so low? Computers have beaten humans at every other game, why not this one?

filled a Ṁ2,500 NO at 10% order

@DavidOman what matters is not just "how hard would it be in theory to teach an AI to play this game", but in practice, how much effort people actually put into doing so. AIs have beaten humans at every other game after people have put a ton of effort into making that happen. I don't see any reason to expect that will happen in MTG, just like it hasn't happened in countless other games. And it helps that MTG is, as games go, particularly illegible. It's not like that would be hard to get past with enough effort (it wasn't a barrier for RTS games, but again, that was with an enormous amount of effort involved, and the fact that an AI could learn how to play StarCraft does not mean I could personally take an AI and get it to play WC3).

I think the case for YES is simply something where AI capabilities grow so rapidly that it's trivial to have them develop & teach novel capabilities for themselves without so much effort. That's totally possible, but a low % on this market seems consistent with other markets on that front.

@Ziddletwix the other case for YES is simply that "70% win rate with 10+ games played" is (a) in some sense, a crazy high bar, if you take that to mean an actual sustained 70% true win rate (in a game with such high variance), but (b) OTOH, if you take it very literally to just mean "can win 7 out of 10 games", that's a comparatively low bar, if you get an AI playing at an elite level and give it enough shots. But I think Ammon's interpretation of the question would be closer to "sustained 70% win rate, not just spiking a random 7 out of 10 games", and that's back to being a very high bar (again, zero reason to doubt AI could do it, but like... why would they?)
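The gap between the two readings above can be made concrete with a binomial calculation. This is a minimal sketch, assuming each game is an independent trial with a fixed per-game win rate (an idealization; real MTG games vary by matchup, deck, and format). The helper name `prob_at_least` is illustrative only.

```python
from math import comb

def prob_at_least(wins: int, games: int, p: float) -> float:
    """Probability of at least `wins` wins in `games` independent games,
    given a fixed per-game win rate p (assumption: games are i.i.d.)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# How often does a player with a given TRUE win rate go 7-3 or better?
for p in (0.5, 0.6, 0.7):
    chance = prob_at_least(7, 10, p)
    print(f"true win rate {p:.0%}: P(7+ wins in 10 games) = {chance:.3f}")
```

Under these assumptions, a player who is merely even with the champion still goes 7-3 or better about 17% of the time, while a player with a true 70% win rate does so only about 65% of the time, so a single 10-game set separates the two cases poorly.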

@Ziddletwix @AmmonLam I believe any series of 10 games against a top MTG player in any of the allowed formats that results in a 7-3 win should count as a yes resolution. Please confirm if this is true.

@MikhailDoroshenko yes, any official format

@AmmonLam It specifies "as long as the AI demonstrated consistent strength in beating current world champion in repeated games (70% win rate with 10+ games played)..." and consistent strength in beating in repeated games is absolutely not the same as spiking a single individual series, right?

"The goal is to demonstrate that the AI has the strength to likely win against the world champion."

No world champion level Magic player would say that a player winning a single individual series 7-3 demonstrates that the player has the strength to likely win against that opponent.

@PatrickChapin my reading of the question was basically "demonstrated consistent strength in beating current world champion" as the general qualitative threshold, and "70% win rate with 10+ games played" as a narrower absolute minimum criterion (e.g. they had to play it out & get this 7/10 result at a minimum, we can't just believe in theory that's the case). Because yes, otherwise spiking 7 out of 10 games tells you ~nothing about beating the top MTG human player; it's entirely consistent with "knows how to play the game competently and got a little bit lucky".

(ultimately up to Ammon here, although personally I would listen to patrick chapin on matters relating to mtg!)

@Ziddletwix We will not have an infinite number of samples to spike from before the end of 2026. How many AI-MTG pro games have been played so far? ~0? Do you expect this number to reach millions in two years?

@MikhailDoroshenko My rough guess is that, in total, we have around 100k games of this level per year if you combine all formats. If we narrow it down to a specific format, this number would be even smaller. A single top-level player playing Vintage against an AI and losing 5-14 before the end of 2026 should definitely be enough to resolve true, no matter how badly the AI got stomped in Modern, for example (because those might be all the games the AI had in this format before 2026).

@MikhailDoroshenko If the overall total win rate was demonstrated to be 30% that would be quite different than a single 10 game set. So if the AI played ten 10-game sets, cherry picking the one with a 7-3 record would not qualify right? And along these lines, what if we made 100 versions of this AI that all play low sample size and only keep the winning record?

I think the fundamental question comes down to whether the intent is speaking to whether we'd expect the AI to likely win (ie having an expected win percentage of 70% against the best human players in the world, implying super human performance), or whether we'd expect it to be possible that any individual AI could ever have a winning run.

Suppose we were interested in whether an AI would demonstrate the strength to likely win against the best players in the world in any type of poker. Let's say the AI's best event turns out to be Texas Hold 'em. If the goal was to demonstrate that the AI has the strength to likely win against the world champion at Texas Hold 'em, we would obviously be talking about whether the AI is a stronger player, that is, whether we would expect it "likely" to win against world champion level play.

If someone kept building different models and having them enter events without the human opponents knowing, and then discarded the results for each loss, "tweaked the model" and then entered in secret again until they have a run where they are ahead, and then stop, we would surely not describe that model as likely to win against world champion level play.

Winning at a sustained level of 70% against world champion play would indeed be super human and would imply an expectation of likely winning. If there were a publicized event with real incentive for the humans to win where a single model was playing 10 game sets against 5 world champions and it finished with a 70% win rate, there's at least an argument to be made that the model is likely to win against world champion level play.

Both of these scenarios lead to a forward looking expectation that a prediction market would favor the AI in that event. Any scenario where the prediction market would still bet on the world champion level humans in that event moving forward would seem to not meet the requirements for demonstrating that the AI is "likely to win", right?
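The "rerolling" concern raised in this comment (play many 10-game sets, keep only the winning one) can be quantified with a short calculation. This is a rough sketch, assuming independent games at a fixed win rate and independent sets; the function names are hypothetical and chosen for illustration.

```python
from math import comb

def prob_at_least(wins: int, games: int, p: float) -> float:
    """P(at least `wins` wins in `games` i.i.d. games at win rate p)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

def prob_any_set_hits(attempts: int, wins: int = 7, games: int = 10,
                      p: float = 0.5) -> float:
    """P(at least one of `attempts` independent sets reaches `wins` wins)."""
    single = prob_at_least(wins, games, p)
    return 1 - (1 - single) ** attempts

# A merely even (50%) player who is allowed to keep the best of N sets.
for attempts in (1, 5, 10, 20):
    chance = prob_any_set_hits(attempts)
    print(f"{attempts:>2} sets: P(some set is 7-3 or better) = {chance:.3f}")
```

With ten attempts, an even player is more likely than not to produce at least one 7-3 set, which is why a cherry-picked set says little about the underlying win rate.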

@PatrickChapin I agree that a long-term 70% win rate across many games would be a stronger claim, but that’s not the resolution criteria. AI just needs to hit a 7-3 result in a single set, not prove it’s the best player across all formats. Even in a high-variance game like MTG, that’s already a strong signal of top play. Since AI is unlikely to play thousands of matches before 2026, a single event where it wins 7-3 should be enough to resolve YES.

@AmmonLam Suppose an AI plays 200 matches against top players in various formats, combining to a record of 100-100, but that sample contains a run (in one of the formats) where it spikes a 7-3 result and then refuses to participate in more games. How would this resolve? To me this reads as something that doesn't qualify as consistently beating the pros with a 70% win rate.
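A quick simulation illustrates how common such a run is. This is a simplified sketch, not a model of the exact scenario: it assumes a single stream of 200 independent games at a 50% win rate (it does not split games by format or force an exact 100-100 record), and it checks every window of 10 consecutive games. All names and parameters are illustrative.

```python
import random

def has_hot_window(results, window=10, wins_needed=7):
    """True if any `window` consecutive games contain >= wins_needed wins."""
    wins = sum(results[:window])
    if wins >= wins_needed:
        return True
    # Slide the window one game at a time, updating the running win count.
    for i in range(window, len(results)):
        wins += results[i] - results[i - window]
        if wins >= wins_needed:
            return True
    return False

def estimate(total_games=200, p=0.5, trials=2000, seed=0):
    """Monte Carlo estimate of P(some 10-game window is 7-3 or better)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        results = [rng.random() < p for _ in range(total_games)]
        hits += has_hot_window(results)
    return hits / trials

print(f"P(a 7-3 stretch somewhere in 200 even games) ~ {estimate():.3f}")
```

Under these assumptions, a 7-3 stretch somewhere in a long even record is close to certain, so the answer to "how is the length of the series determined?" matters a great deal for resolution.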

@ChameLeon I object to the fact that you sum games across different formats. The question clearly states that each format has its own resolution trigger.

@MikhailDoroshenko as in draft constructed or sealed?

E: however the latter question still persists. How is it determined how long a series of matches will go on?

@MikhailDoroshenko "The goal is to demonstrate that the AI has the strength to likely win against the world champion." For sure the market specifies it doesn't have to be the best across all formats; but to be favored enough to likely win against world champion opposition, it has to be the best (or at least super human) in one, right?

@PatrickChapin Sure, if an AI played 200 games in one format and lost every game except for 7 wins in a row, I’d agree with you. But that’s not a realistic scenario.

My main issue is that your proposed resolution criteria (70% over a long time) do not match what’s actually written. The question explicitly states that a 7/10 result is sufficient for YES resolution.

Are you arguing that the threshold should be raised to 70/100 or higher? Because that’s not what’s written in the question. If the intent was to require long-term consistency, the question should have been worded differently.

@MikhailDoroshenko My interpretation of the question was that the AI needs to demonstrate "consistent strength in beating world champions" and the parenthetical was clarifying how much consistent strength is needed to be demonstrated. An interpretation where a model plays 10 games, and if it doesn't win 7-3, you "tweak the weights" and play 10 more games, repeating until it wins a single set 7-3, would not meet any expert's criteria for "demonstrating consistent strength in beating world champions" nor "demonstrating that the AI has the strength to likely win against the world champion," whereas demonstrating a consistent expected win rate of 70% in 10+ game sets would.

@PatrickChapin Be specific. Which exact criteria do you have in mind that will trigger the resolution of the question? Give me an example of the world and the precise number of games played where you think this question should resolve as True.

@MikhailDoroshenko An AI could demonstrate consistent strength at beating world champion level players at a 70% win rate in a variety of ways, of which I listed two above. For an AI to have an expected win rate of 70%+ against world champion play, we need a sufficient sample size of world champion play (and of course, evidence of bad faith would disqualify).


One example was a high enough win rate over a large sample size. If an AI won 70/100 games against world champion players (not known to be playing in bad faith), most experts would agree that the AI is expected to win (in that format, under those conditions). If it was easier to come by 100 game sets against world champion level players, this threshold might need to be higher to account for "rerolling", however, the bottleneck is tight enough here, I'd think 100 games reasonably sufficient.

When you remove the ability to "reroll" bad results, the threshold lowers. The example given above was a single publicized event for stakes where an AI faces 5 world champions in 10 game sets. If the AI were to win 35+ games, I would think it reasonably meets the criteria of demonstrating enough consistent strength to be expected to win 70% against world champion play.
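The two thresholds proposed above (70 of 100, and 35 of 50 across five 10-game sets) can be compared with the bare 7-of-10 reading by asking how often a weaker player would clear each bar by luck. This is a hedged sketch assuming independent games at a fixed win rate; the 50% and 60% "true" win rates are illustrative choices, not figures from the thread.

```python
from math import comb

def prob_at_least(wins: int, games: int, p: float) -> float:
    """P(at least `wins` wins in `games` i.i.d. games at win rate p)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# Each bar is a 70% record; only the sample size differs.
thresholds = (("7 of 10", 7, 10), ("35 of 50", 35, 50), ("70 of 100", 70, 100))

for label, wins, games in thresholds:
    for true_rate in (0.5, 0.6):  # hypothetical players below the 70% bar
        chance = prob_at_least(wins, games, true_rate)
        print(f"{label:>9} | true win rate {true_rate:.0%}: "
              f"P(clears bar by luck) = {chance:.5f}")
```

The larger samples make a lucky pass by an even player rare, which is consistent with the argument that 50 to 100 games without rerolls carries far more evidential weight than one 10-game set.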


An example of why these thresholds: When preparing for Pro Tour Kyoto 2009 with Gabriel Nassif and Mark Herberholz, among others, my position was that 5cControl has a positive win rate against Faeries; while their assumption was the opposite. We agreed on a 50-game set as a large enough sample size for the parties involved to have at least 50% confidence (with a threshold at which loser would flip and play winner's deck). I beat Nassif with 5cControl beyond the agreed upon threshold, so he switched off Faeries and won the Pro Tour with 5cControl.

This isn't the only example, but for decades, I've been a part of a lot of world class teams preparing for events, and 50 matches was a lower-bound floor used many times for having more confidence than not in a situation with no rerolls. Whereas I've never encountered a world class player that would say a single 7-3 result demonstrates consistent strength, let alone a scenario by which someone tries numerous 10 game sets and succeeds once.

@PatrickChapin With current models, you can probably play a million games and never lose, so a single 7-3 set against a world‐class player would already be a big milestone—even if it doesn’t prove unstoppable superhuman consistency. It’s still a major leap from “loses to anyone” to “can reliably beat a pro at least once.” That’s why the market explicitly says a 7-3 result in any single set is enough, without requiring huge sample sizes or indefinite consistency. If we wanted a stricter standard for “elite” play, we’d need a totally different resolution criterion. (IMO)

@MikhailDoroshenko I will probably withdraw from the conversation at this point. I think I have already said everything I had in mind. I will accept any decision by the market creator.

@Ziddletwix I agree with your point about the level of effort involved. But it's also important to note that AI actually didn't quite succeed at beating the best human players in the world at StarCraft.

When it was able to control units at superhuman speed, the AI won by using strategies that humans are physically incapable of replicating. When it was limited to a human rate of speed, it played at grandmaster level, but it still lost against the top human players.

That's why their paper simply claims to perform better than 99.8% of human players, not 100%: "Grandmaster level in StarCraft II using multi-agent reinforcement learning" (Nature)

I don't watch StarCraft anymore these days, but I don't think there has been any significant improvement since then.

@MikhailDoroshenko I'd just add that already AIs can play and win at MtG Arena. There is a bot mode where you play against an AI and the AI certainly wins a reasonable percentage against most humans, and can easily beat a pro at least once. They are not likely to consistently win against world champion level play, but a single 7-3 set against a world-class player would not be a big milestone at all imho.

Only needing to succeed with a single strategy in a single time period in a single format simplifies the problem into a very narrow task, and Magic is a game that, from time to time, has decks and strategies capable of reducing the game to more coin-flippy dynamics. We would never say a player winning a single 7-3 set with a burn deck has demonstrated enough consistent strength to be likely to win against world champion level play. Third parties have developed bots that can play certain decks on autopilot, like the aforementioned burn strategy.

And as an additional note, I helped design and develop the AI for the digital card game Eternal, which was strong enough at release 7-8 years ago that we had to tone it down for the average player. I say this to point out that the notion of an AI losing a million games in a row hasn't been true for a decade.

If the intent is truly whether an AI is capable of pulling off a 7-3 set, that was true years ago and wouldn't make for an interesting market. An AI demonstrating enough consistent strength to make it likely to win against world champion level play at a 70% expected win rate, even with a single strategy in a single time period of a single format, would represent an actual advancement.

@PatrickChapin Can you give me a replay of a single game where you lost against a bot?

@MikhailDoroshenko Or not necessarily you, but a player of the caliber described by this question?

@MikhailDoroshenko apologies if I’m missing the joke, but FWIW, Patrick Chapin is very personally familiar with what it’s like to be a player of the caliber described by this question!

@Ziddletwix Maybe my wording was unclear. I just meant that any replay from a top-level player would be fine, not necessarily from him specifically. I obviously included him as part of top level.

@MikhailDoroshenko If such a replay exists, I will agree that 7/10 is not interesting and support an increase to 14/20.
