Will GPT-5 be able to GM a session of my custom TTRPG to my satisfaction?


resolved Dec 4


I have been working on a "for fun" side project TTRPG to play with my friends. The RPG is rules-light, narrative-driven, and explicitly non-crunchy. Currently, there are only about 15-20 pages of text.

When GPT-5 releases, I will attempt to have it GM a session for my playgroup after feeding it the most up-to-date PDF of the rules. I will do my best to give GPT-5 every chance to succeed with prompting and provide it with a detailed and thorough setup.

Market Resolution Criteria:

This market resolves YES if:
- The session runs well with no obvious errors or mistakes.
- GPT-5 actually understands and executes the rules as written.
- The session does not feel lame or derivative. Scenes and encounters should not be clearly pulled directly from the rules document. Creativity and uniqueness in the world are essential, and I will prompt for these aspects initially.
- GPT-5 makes use of relevant rules when appropriate. For example, failing to use the correct combat rules during combat would count as a failure.

This market resolves NO if:
- The above conditions are not met.
- GPT-5 does not release by the market close date.

Details:
- My players will handle all dice rolling, so GPT-5 does not need to handle dice.
- The ruleset is not quite ready for release; I will post it here if it reaches a stage I am happy with.
- The ruleset is a fairly heavily modified variant of a PBTA system, using a standard 2d6 system.
- Currently GPT-4o fails incredibly hard at this task.
- This market resolves by the intent of the question over any specific language in the description. I will update the language and increase clarity and specificity if needed.


bought Ṁ250 NO

GPT-5 is worse than o3 so far.

I think the right scaffolding could create a reasonably good DM out of current LLMs, but it would take a looooot of work to create and tune, and that probably won't happen.

Yesterday I briefly experimented with the new Claude model on a subset of some of these problems. We fed the current rules version into Claude and had it generate some scenes / scenarios in the style of the game. I will say that Claude did noticeably better at this task than GPT-4o!

We then had it dive into one of the generated scenarios as the GM, and unfortunately it failed very quickly, but it was still an improvement overall.

As a side note, the new artifacts feature was very pleasant to work with. Whenever it created a scenario, it also created an artifact for it, which was actually very useful.

NOTE: This market is only about GPT-5 of course, but as new relevant models come out I will try them regardless and give updates.

What's your current format of interaction with it? You give it rules in a prompt and tell it to start GMing a session, then interact with text or voice, rolling dice when told to? Seems like the format of interaction could be awkward enough to be judged lame based on that alone.

GPT-4 can play a GM for a very simple TTRPG, but it tends towards the generic side, and makes occasional mistakes like forgetting simple rules.

I tried giving it a PDF of the rules document, and then giving it a detailed starting prompt.

Everything was done with text, not voice, and we rolled dice when instructed to.

> "Seems like the format of interaction could be awkward enough to be judged lame based on that alone."
No, this is not the point of that line. I am referring to lameness in content or GMing ability. For example, in one attempt GPT-4o literally reused an example scene from the rules document. When I prompted it not to repeat content from the rules, it modified some of the text so it was no longer literally identical, but kept the exact same framing as the example. This was despite my prompting initially for creativity and uniqueness.

I won't be judging GPT-5 poorly because of the text interface limitations!

> "GPT-4 can play a GM for a very simple TTRPG"
Not really, and certainly not well.

Three more clear failures I have experienced with attempted puzzle rooms:
- At one point GPT-4o created a scene with what was just a placeholder puzzle. There was no actual solution; it was more of a scene stub than a real, fleshed-out room.
- I have seen it use well-known riddles from The Lord of the Rings and other famous sources. Again, this was despite me specifically prompting for uniqueness.
- I have seen it repeat the exact same puzzle in consecutive rooms. For example, there was an elemental puzzle where the player had to match water with a water pillar, fire with fire, etc. (Obviously not novel, but sure, it worked for the scene.) Immediately in the next dungeon room there was... another elemental puzzle where the solution was to match water with water, fire with fire, etc.