Will ChatGPT get the Monty *Stall* problem correct on Dec. 1, 2024?

I asked ChatGPT the following variant of the Monty Hall problem:

You are on a game show with three doors. Two of them have goats behind them, and behind the other is the game show host's car. He likes his car very much and doesn't want you to have it, but according to the rules of the game, if you correctly guess which door it is behind, you get to keep his car. After you make your choice, the host says, "Oh, shit, uhhh... hold on a second." After stalling for time for a bit, he opens one of the other doors revealing a goat. "This door has a goat behind it," he says. "So, now that you've seen that, do you want to stick with your original choice that only had a 1/3 chance of being right, or switch to the other door?" What do you do in this situation?

This situation is obviously different from the classic Monty Hall problem, since the host's behavior gives you additional information about whether the door you chose is correct, and he likely would have only offered the choice to switch if it was the wrong move.
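
The difference can be illustrated with a quick simulation. The assumption below (not stated in the prompt, just one plausible reading of the host's behavior) is that this host only opens a door and offers a switch when the contestant's first pick is his car; the classic Monty Hall host always does.

```python
import random

def monty(strategy, spiteful, trials=100_000):
    """Win rate over games where a switch is actually offered.

    spiteful=False: classic host, always opens a goat door and offers.
    spiteful=True: assumed 'Monty Stall' host who only offers a switch
    when the contestant's first pick is the car.
    """
    wins = offered = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        if spiteful and pick != car:
            continue  # this host never offers a switch here
        offered += 1
        # With one goat door opened, switching lands on the car
        # exactly when the first pick was a goat.
        won = (pick == car) if strategy == "stay" else (pick != car)
        wins += won
    return wins / offered

print(monty("switch", spiteful=False))  # ~2/3: classic advice holds
print(monty("switch", spiteful=True))   # 0.0: switching always loses
```

Under this reading, conditioning on the fact that a switch was offered at all flips the answer: staying wins every time. Other models of the host's psychology give different numbers, which is the point of the question.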

However, ChatGPT currently interprets it as the regular Monty Hall problem.

On Dec. 1, 2024, I will ask ChatGPT the exact same prompt and see if it gets it correct this time. I will count any of the following as correct answers:

  • Answering that you should not switch because the host is likely trying to fool you into switching, because the host's comments reveal that you chose right initially, or something similar.

  • Making a coherent argument for switching that acknowledges the difference between this situation and the Monty Hall problem (e.g., "The host was always going to offer you the chance to switch, and now he's just trying to trick you by pretending that you initially chose correctly so that you don't switch").

  • Arguing that it depends on your judgement of the host's psychology to determine whether you think he's trying to trick you into switching, or into sticking with your original choice, or just trying to make the show more exciting, etc.

  • Something else that still gives a reasonable and backed-up answer to the question or an explanation of why the answer depends on factors that aren't explicitly given.

Obviously, answering as if it is the regular Monty Hall problem won't count as correct, unless ChatGPT also explains why that answer doesn't apply in this case.

I will use the most advanced version of ChatGPT that is freely available at the time (as of creating this, that's GPT-3.5). I will ask three times in separate sessions and resolve based on the best two out of three (so YES if it gets it right at least twice, NO if it gets it wrong at least twice).

Caveats:

  • If for whatever reason I can't do it on Dec. 1 or forget to, I will do it as close to Dec. 1 as possible. If I am inactive on Manifold at the time, mods have permission to do the experiment for me.

  • A version of ChatGPT only counts as freely available if it can be accessed by anyone with internet access and a PC, or anyone with internet access and either a Samsung or Apple phone. So if there's an Apple app that lets you talk to GPT-5 for free, but I can only talk to GPT-4, I will use GPT-4.

  • If ChatGPT no longer exists at the time or isn't freely available, this resolves N/A.


GPT-4 gets it wrong

@snoozingnewt Really? I thought it would get it right, since it got the similar "Monty Call" problem right.