Will AI be passable at answering Magic: The Gathering rules questions before 2030?


88% chance


Asking GPT-3 MTG rules questions returns some rather nonsensical answers. For example:

This answer makes no sense, and those cited rules don't even exist.

This was from a prompt where I supplied it with a list of other rules questions and correct answers to them, so it does "know" that it's supposed to be answering coherently and correctly. I can also tell from other experimentation that card text and the Magic Comprehensive Rules document were a part of GPT-3's training data. GPT-3 is clearly not powerful enough to properly understand such a complicated technical system.

This market resolves to YES if, by the beginning of 2030, I have access to a system that can give me correct answers and explanations to Magic rules questions in natural English text. Specifically:

I will supply it with 20 randomly selected unreleased questions from RulesGuru. (Plus card text if necessary.) Over those 20 questions, it must have at least a 90% success rate on giving the right answer, and at least a 50% success rate on providing an explanation that clearly and correctly explains why it works that way. A correct explanation can leave out a small detail here or there, but it must be good enough to help a human understand the material, and avoid anything blatantly wrong like referencing parts of the rules that are irrelevant or don't exist.
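As a rough illustration of the arithmetic, the thresholds above work out to at least 18 of 20 correct answers and at least 10 of 20 correct explanations. A minimal sketch of that check follows. The function name and the `(answer_ok, explanation_ok)` input format are made up for illustration, and it assumes an explanation only counts when the answer it accompanies is also correct, which the criteria do not spell out.

```python
from typing import List, Tuple


def passes_original_criteria(
    results: List[Tuple[bool, bool]],
    answer_pct: int = 90,
    explanation_pct: int = 50,
) -> bool:
    """Check graded results against the 90% / 50% thresholds.

    Each entry is (answer_ok, explanation_ok) for one question.
    Assumption: an explanation is only counted when its answer is correct.
    """
    if not results:
        raise ValueError("need at least one graded question")
    total = len(results)
    answers = sum(1 for answer_ok, _ in results if answer_ok)
    explanations = sum(1 for answer_ok, expl_ok in results if answer_ok and expl_ok)
    # Integer comparison avoids floating-point edge cases at exactly 90% or 50%.
    return (
        answers * 100 >= answer_pct * total
        and explanations * 100 >= explanation_pct * total
    )


# 18 right answers, 10 of them well explained: exactly at both thresholds.
at_threshold = [(True, True)] * 10 + [(True, False)] * 8 + [(False, False)] * 2
print(passes_original_criteria(at_threshold))
```

Dropping to 17 correct answers, or to 9 correct explanations, makes the same check fail.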

For a harder version of this question, see /IsaacKing/will-ai-be-superhuman-at-mtg-rules

Update 2025-02-21 (PST) (AI summary of creator comment): New Resolution Criteria:
- The resolution criteria have been updated to be stricter than those originally described.
- The detailed, updated criteria can be found at the linked page and replace the previous criteria.

The resolution criteria for this market are pretty lax, so I've made a stricter one here:

Doesn't seem to be getting better...

@IsaacKing https://chatgpt.com/share/67b8ead7-33c0-8012-a542-ddc300e3233c
I tried 5 questions from RulesGuru.
Can you confirm the evaluation?
1. -
2. Answer +, Explanation -
3. Answer +, Explanation -
4. -
5. Answer +, Explanation +

Completely wrong answer, doesn't even understand the question. It claims that the Emrakul trigger is controlled by Noelle, when the question states it was Aiden who discarded it, thus Aiden controls the trigger. Opening the reasoning trace shows multiple other nonsensical statements, like that Noelle cast Emrakul during Aiden's cleanup step, when the question clearly states that happened last turn. The reasoning trace also directly contradicts the final answer in multiple places, such as stating in the last sentence that Noelle's trigger resolves second, when the final answer states it resolves first.

Seems like it would be a waste of my time to go through the other 4; ChatGPT is obviously not remotely capable of handling even intro-level questions. (Which this one is; the average experienced Magic player, even with no judge training at all, would be able to answer it correctly.)

(Or am I misunderstanding your question?)

@MikhailDoroshenko Ok hold on, I think I get what you're asking now. The "-"s in your list indicate a wrong answer overall, and you want me to confirm that it got the other three correct, with a wrong explanation for 2 & 3, and a correct one for 5.

Sorry, you had said below that you think this is already solved, so I thought you were presenting this conversation as ChatGPT succeeding at the desired level.

Let me see...

1. Answer wrong.
2. Answer correct, explanation correct.
3. Answer correct, explanation a little iffy but correct.
4. Answer wrong, but this is likely because you provided incorrect card text.
5. Answer correct, explanation correct.

That's not bad, but all of these questions are on the easier end, so I think it got lucky.

@IsaacKing Yes, sorry, I should have specified that better. I had hoped that it would succeed, but after I ran some tests, I realized that it is not at the necessary level yet, and also that it is hard for me to tell when an explanation is correct. I just wanted your confirmation to see how far the models are from the specified bar.

bought Ṁ10 YES

2030 is a long way away, this should be higher

@DavidOman I think it is already solved, but I don't know when it will be resolved.

The current state of the art: https://nissa.planeswalkercompanion.com/

Just do the fun ones rather than the site. Humility + Opal, Season + Arbiter, Volrath's Shapeshifter being - well - the card that it is, Panglacial Wurm being the card that it is, whether the Gitrog interaction counts as slowplay (since it's technically not a loop per Toby Elliott's fantastic horsemyths post), etc etc.

I'm not that into predictions that look so far into the future, but I've studied both AI and MTG somewhat, and IMO, this problem is too complex, and too unnecessary, for anyone to want to massage a model into being an accurate rules query engine. (We already have an excellent rules engine, of course, in MTG Arena, but doing both queries and responses in plain text is quite a feat.)

predicted NO

@TylerColeman Arena's rules engine is much simpler than the full Magic rules engine, since it's restricted to only recent cards that have been designed to work in Arena. And even then it's not perfect. For example, finding a legal declaration of blockers is NP-hard, so my understanding is that Arena uses a heuristic algorithm that may not always return perfect results. And there are other bugs here and there.

I plan to spend a few days trying to fine-tune GPT-X to answer rules questions, or train a much smaller dedicated model to do so.

I just realized a problem with this resolution process, which is that the AI system may have access to the internet and simply be able to read the answers off RulesGuru.

If there are no objections, I will change the process to use 20 unreleased questions on RulesGuru instead. (With their wording fixed up so it's clear what's being asked.)

predicted YES

@IsaacKing You could also change the names of the players I think, although that wouldn't slow down a reasonably general intelligence.

predicted YES

@IsaacKing A bigger problem might be that RulesGuru might not exist in 2030.

predicted NO

You could also change the names of the players I think, although that wouldn't slow down a reasonably general intelligence.

Yeah, even GPT-3 can already do that.

A bigger problem might be that RulesGuru might not exist in 2030.

It's my site, so as long as I'm still around and I haven't lost all the files and their multiple backups in some catastrophic accident, it'll be available. (May not still be online, but I'll have the files somewhere.)

Did the above example include any prompt engineering to let the engine know that it's supposed to be impersonating someone who knows something about MTG rather than the most likely idiot on the internet?

predicted NO

Yes, I included several examples of questions answered correctly. In the past I've also tried different prompts, none of which worked significantly better.

Feel free to try it out yourself. Even if you don't know anything about Magic, you can grab questions from RulesGuru and check any rule citations that GPT-3 provides against the rules document here. Even ignoring whether the rest of the answer makes sense, if you can get GPT-3 to cite only rules that exist, that would be a marked improvement. :)
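For anyone who wants to automate the citation check, here is a minimal sketch. It assumes you have saved the rules document as plain text, and that rule numbers look like three digits, a dot, more digits, and an optional lowercase letter (for example `100.1a`); the function names are made up for illustration. It only catches rules that don't exist, not rules that exist but are irrelevant to the question.

```python
import re
from typing import List, Set

# Assumed rule-number shape: "100.1", "100.1a", "702.19b", and so on.
RULE_PATTERN = re.compile(r"\b\d{3}\.\d+[a-z]?\b")


def extract_rule_numbers(text: str) -> Set[str]:
    """Return every rule-number-shaped token found in the text."""
    return set(RULE_PATTERN.findall(text))


def nonexistent_citations(answer: str, rules_text: str) -> List[str]:
    """List rule numbers cited in the answer that never appear in the rules text."""
    known = extract_rule_numbers(rules_text)
    return sorted(extract_rule_numbers(answer) - known)


# Tiny stand-in for the real rules document (placeholder text, not actual rules).
sample_rules = "100.1. (rule text here)\n100.1a (subrule text here)\n"
sample_answer = "This works because of rule 100.1a and rule 999.9z."
print(nonexistent_citations(sample_answer, sample_rules))
```

In practice you would read the full rules document from a file and pass its contents as `rules_text` instead of the stand-in string.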