Will GPT-4 be able to answer rules questions about Magic: The Gathering well enough to be useful?


resolved Apr 5


It does not have to be as good as I described here. It just needs to be good enough that I judge it at least somewhat useful for some task. This could include:

Answering simple questions to help people learn how to play.
Providing a mostly correct answer to a hard question that a human can then edit to be completely correct.
Rephrasing a rule in different terms that are easier to understand.
Helping people find a rule they're looking for: they describe the rule and GPT-4 gives them its number.
Doing any of the above correctly often enough that it's better for the user to try GPT-4 and then double-check any answer that seems likely to be wrong, rather than not using it at all.

ChatGPT abysmally fails all of these.

GPT-4 speculation

Magic: The Gathering

Barcalona

New Year's Resolutions 2024




Sorry for the delay. Some other people made https://nissa.planeswalkercompanion.com/, and despite having a bunch of supporting code to identify the relevant cards and rules, it's still complete garbage. I've tested it on 10+ queries and it's yet to get a single one right. Not sure why my testing with Mira seemed more promising, but I think that was probably just luck.

I realize this question was a little awkward, since it's always possible there's something it is useful for that I just haven't considered. But I can never rule that out, so NO seems correct given that I have yet to find one. I might try again with my own support framework for a different task, and I'll re-resolve this if necessary. But I doubt it will work.

I've had success with asking it for specific CR citations

predicted NO

I played around with it a bit a few days ago (thanks to @Mira) and it looks like it's probably helpful for applications that can deal with a high error rate. I'll need to test it some more to be sure.

@IsaacKing under bullet 4 (describing a rule and giving the rule number), would a full recitation of the entire MTG rulebook qualify as giving you the answer? I love where you're going with every other bullet but this one feels like the most "potentially cop-out-y."

Put another way - would you be open to "pre-registering your hypotheses" so to speak by writing down 10 questions now that you will ask GPT-4, and going with >70% as success, or some such?

"would a full recitation of the entire MTG rulebook qualify as giving you the answer?"

Not any more than asking a researcher to tell me what laws apply to jaywalking and them saying "the law school library is over there" qualifies as them being helpful.

"would you be open to "pre-registering your hypotheses" so to speak by writing down 10 questions now that you will ask GPT-4, and going with >70% as success, or some such?"

Hmm, I think that would be reasonable. But that would likely favor a NO resolution, because it means if there's some quirk of GPT-4 that makes it only respond well to a certain type of prompt phrasing, I wouldn't be able to take that into account.

(Also, if GPT-4 has access to the internet, I wouldn't want it finding the answers from this market.)
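For illustration, the pre-registered test suggested above could be scored roughly like this. It is only a sketch: the names (GradedAnswer, passes_preregistered_test) are hypothetical, the strict ">70%" cutoff is just the example figure from the comment, and each answer is assumed to have already been graded correct or incorrect by a human judge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedAnswer:
    """One pre-registered question and a human's verdict on GPT-4's answer."""
    question: str
    correct: bool


def passes_preregistered_test(graded, threshold=0.70):
    """Return True if the fraction of correct answers is strictly above threshold.

    The 0.70 default mirrors the ">70%" figure floated in the thread; it is an
    assumption, not an agreed resolution criterion.
    """
    if not graded:
        raise ValueError("at least one graded answer is required")
    accuracy = sum(a.correct for a in graded) / len(graded)
    return accuracy > threshold


# Hypothetical example: 10 questions written down in advance, 8 judged correct.
results = [GradedAnswer(f"question {i}", i < 8) for i in range(10)]
print(passes_preregistered_test(results))
```

Fixing the question list in advance is what makes this a pre-registration; as noted above, it also means the phrasing cannot be adjusted afterwards to suit the model.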

@IsaacKing Good points all. That’s cool - I think I’m good enough based on this answer to not be worried about any technicalities.

@MattCWilson As always, I'll discuss my intended resolution and the questions I asked with traders before I resolve the market.

@IsaacKing That's a 'quirk' of every large language model, and for that matter, most humans. People understand things less when they are written in a more confusing way.