Will GPT-4 get the Monty *Fall* problem correct?
41%
chance

I will ask GPT-4 this question when I get the chance, either personally or by getting a friend to try it for me.

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. The host is ignorant about what is behind each door. You pick a door, say No. 1, and the host walks across the stage and falls on accident, revealing a goat behind door No. 3. He then picks himself up, and says "Whoops. Sorry about that. But now that we know that a goat is behind door No. 3, do you want to change your selection and pick door No. 2?" Is it to your advantage to switch your choice?

This question resolves to YES if GPT-4 says that there is no advantage to switching your choice, and resolves to NO otherwise.

I will only consider the actual first answer that I get from GPT-4, without trying different prompts. I will not use screenshots that people send me to resolve the question.

Collin Ferry
bought Ṁ100 of YES

Betting "YES" because I just asked GPT-3 and got a correct answer, so it seems incredibly likely that GPT-4 will get it too.



KCS
bought Ṁ55 of NO

@CollinFerry "This question resolves to YES if GPT-4 says that there is no advantage to switching your choice, and resolves to NO otherwise." (emphasis mine)

Jon Simon
is predicting NO at 45%

@KCS Also, ChatGPT is not GPT-3, it's GPT-3 that's been RLHF-finetuned to answer questions. Vanilla GPT-3 would do this if you gave it a question out of nowhere:

Jon Simon
is predicting NO at 44%

@jonsimon For context, the point of this problem is that it's superficially very similar to the much more well-known Monty Hall problem, but with the opposite answer. So it's likely that GPT-X will mistakenly answer it as if it were the standard Monty Hall problem, which is what you've shown ChatGPT doing here.

N.C. Young
is predicting NO at 39%

@CollinFerry ChatGPT's reasoning is entirely correct for a bit here ("since the host doesn't know where the car is, it is equally likely that the car is behind either door") but it starts with the wrong conclusion and finishes by justifying it. I expect GPT-4 will be vulnerable to the same mistake, but less so, especially if OpenAI train it to reason before it answers.

Jon Simon
bought Ṁ20 of NO

If GPT-4 is a pure pretrained LLM without RLHF, there's a decent chance it won't try to answer the question at all, and will just go on monologuing about the details of this hypothetical scenario. Given that that's what all of the prior GPT releases were, there's a good chance that'll happen.

nickburlett
is predicting NO at 47%

Related market(?)

Jotto999

One thing that helped me understand this market was this person's simulation of the Monty Fall problem (and also Monty Hall). I suspected that switching doors wins the car more often in Fall, but that's incorrect; see the simulation results below. In Fall, the probability of winning stays at 1/2 whether or not you switch, and doesn't improve to 2/3.

Running 10000 simulations where the host makes a random guess...
If they pick the car, the universe explodes and we discard the trial
Exploded 3390 times
Switched door 3317 times with 1689 wins and 1628 losses
Kept our choice 3293 times with 1613 wins and 1680 losses
Estimated chance to explode (should be 0.333): 0.339
Estimated chance to win if we switch (should be 0.5): 0.509
Estimated chance to win if we don't (should be 0.5): 0.490
----
Running 10000 simulations where the host precisely avoids the car...
Switched door 4944 times with 3286 wins and 1658 losses
Kept our choice 5056 times with 1682 wins and 3374 losses
Estimated chance to win if we switch (should be 0.666): 0.665
Estimated chance to win if we don't (should be 0.333): 0.333
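
For reference, here's a minimal sketch of that kind of simulation (my own reconstruction, not the original poster's code; the function name, variable names, and trial count are arbitrary choices of mine):

```python
import random

def simulate(n_trials, host_knows):
    """Estimate win rates for switching vs. staying.

    host_knows=True  -> Monty Hall: host deliberately opens a goat door.
    host_knows=False -> Monty Fall: host opens a random unpicked door;
                        trials where he reveals the car are discarded.
    """
    exploded = switch_wins = stay_wins = valid = 0
    for _ in range(n_trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            opened = random.choice([d for d in others if d != car])
        else:
            opened = random.choice(others)
            if opened == car:
                exploded += 1  # "the universe explodes": discard this trial
                continue
        valid += 1
        stay_wins += (pick == car)
        switch_wins += (pick != car)  # the remaining closed door wins iff our pick loses
    return exploded, switch_wins / valid, stay_wins / valid

for label, knows in [("Monty Fall", False), ("Monty Hall", True)]:
    exploded, p_switch, p_stay = simulate(100_000, knows)
    print(f"{label}: exploded={exploded}, P(win | switch)={p_switch:.3f}, "
          f"P(win | stay)={p_stay:.3f}")
```

Run as-is, the Fall condition should print roughly 0.5/0.5 and the Hall condition roughly 0.667/0.333, matching the numbers above.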
Adam

@Adam Motivated by a desire to avoid the intentionality argument seen below: the answer to this question is pretty clear-cut, whether or not you apply advanced reasoning to the problem. Either it's not to your advantage because the host accidentally told you what's behind the door, or it's not to your advantage because the host revealed no actual information about what's behind the doors (and just said some words).

ZZZ ZZZ
is predicting NO at 33%

@Adam What if it's a bluff and the goat isn't actually behind that door? Like a social experiment or something. That seems like a fairly plausible scenario in that case.

Adam
bought Ṁ50 of NO

@ZZZZZZ if it presents a coherent argument predicated on the statement being a bluff, I will resolve YES.

ManifoldDream

Manifold in the wild: A Tweet by Matthew Barnett

@JacquesThibs I see what you mean now. Is this the Manifold Market you're thinking of? If so, it's unrelated to this bet. But yeah I agree that Bryan is very likely to lose the bet. https://manifold.markets/MatthewBarnett/will-gpt4-get-the-monty-fall-proble

Michael Dickens
bought Ṁ100 of YES
Two ways this could resolve YES:

  1. Some chance GPT-4 is confused about the Monty Hall problem in the same way that people in its training set are confused: it thinks this is the Monty Hall problem, but it answers the Monty Hall problem incorrectly, and so gets the answer right by chance.

  2. Some chance it actually figures out the answer.

#1 is unlikely because ChatGPT already knows the correct answer to Monty Hall; it just incorrectly interprets this problem as the Monty Hall problem.

I'd put my credence at around 50%.

Eiim

I think this is a really good question/market, and I think the current probability, which has been quite stable, is very reasonable.

NeelNanda

It is to your advantage to switch your choice. The probability of the car being behind door No. 1 is 1/3, and the probability of the car being behind door No. 2 is also 1/3, since the host does not know where the car is. Since the host revealed that a goat is behind door No. 3, it is now more likely that the car is behind door No. 2 than door No. 1, so switching your choice increases your chances of winning the car.

ChatGPT's response ^

Matthew Barnett
is predicting YES at 32%

Meta note: I don't think I've ever seen this many people be confused about a question that I wrote on either Manifold Markets or Metaculus. So, here's a list of clarifications:

  1. If GPT-4 is not released by the end of the year, this resolves to N/A, not NO.

  2. GPT-4 is the system that OpenAI staff refer to as "GPT-4". If OpenAI releases another system this year that's not GPT-4, it will have no bearing on the resolution of this question, either for YES or NO.

  3. If GPT-4 says that there is no advantage to switching, then this question resolves YES. I will disregard anything else GPT-4 says to justify that conclusion, as it will play no role in resolution.

  4. My question is not worded identically to the original "Monty Fall" problem. Thus, I disagree that some of the standard objections to the original Monty Fall problem apply to this problem.

  5. I said that the host is ignorant of what was behind the doors. Thus, we can treat his accident as opening a door randomly. I don't think there's any plausible reading of the question under which there's a force that causes a goat to be revealed no matter what.

SamuelRichardson

I've been losing on a number of markets due to silly technicalities around the wording of the question. So, given that:

What are the resolution criteria here if GPT-4 is not released? E.g., suppose it's called GTP-4 because they discovered when training it that swapping the order of the PT to TP made it perform a lot better.

In other words, what do you consider GPT-4?

Matthew Barnett
is predicting YES at 33%

@SamuelRichardson I will resolve based on whether OpenAI staff consistently call it "GPT-4", and will resolve as N/A if it's not released by the end of the year, though I might extend that deadline.

Samuel Richardson
bought Ṁ100 of NO

@MatthewBarnett Voting NO. Seems like there are two criteria for this to pass, then:

  1. It's called GPT-4

  2. It can solve the Monty Fall problem.

Matthew Barnett
is predicting YES at 33%

@SamuelRichardson If some system other than GPT-4 is released by OpenAI, it won't resolve NO automatically, so I don't see why my clarification would push you into the NO camp. If GPT-4 is not released by the end of the year, then this question will just resolve N/A, which favors neither YES nor NO.

josh

Will you count it as a win if GPT-4 says to switch, but doesn't say anything implying it's better to switch, only that it's not worse to switch? (For instance, if it works out the probabilities and tells you there's a 50% chance of it being behind your door and a 50% chance of it not, and then tells you to switch?)

Matthew Barnett
is predicting YES at 40%

@josh No because the question asked was "Is it to your advantage to switch your choice?" If I asked someone whether it was advantageous to take an action with zero expected value, and they replied "I personally would switch" I would not consider that a good answer.

ZZZ ZZZ
is predicting NO at 42%

42% probability and 69 users hold shares

ManifoldDream

Manifold in the wild: A Tweet by Andrew Conner

@alangrow @Meaningness Somewhat related, directly testing my view of what GPT can't do well, for GPT-4 (when it comes out). https://manifold.markets/MatthewBarnett/will-gpt4-get-the-monty-fall-proble

BionicD0LPH1N

A general policy of changing doors, so long as you think it's more likely that changing is beneficial than that it's detrimental (ignoring the cases where it doesn't matter), is itself going to get higher expected utility than just saying it doesn't matter. So you should switch, if only because the worlds in which the question doesn't matter (presumably, our world) themselves don't matter, and I would expect a majority of the remaining possible worlds to be ones in which you should switch.

Therefore, there is a (totally negligible, outside of nitpicks) advantage to changing your choice.

Matthew Barnett
is predicting YES at 41%

Fine. If GPT-4 uses this exact argument, I'll resolve to N/A.

ZZZ ZZZ
bought Ṁ10 of NO

@BionicD0LPH1N Why would the majority of possible worlds outside of the ones where it doesn't matter be ones where you should switch?

BionicD0LPH1N

@ZZZZZZ I don't have a very principled argument other than that it feels intuitive to me. In the vast majority of Monty Hall-like problems, switching is beneficial. It is easy to generate a justification for switching. When reading this problem, many smart humans mistakenly(?) believe that switching is beneficial, whereas no one I've seen purports to believe that switching is worse than not switching. I'm not even sure what a justification for not switching would look like, and I at least know what a flawed justification for switching would look like.

Do you have different intuitions?

ZZZ ZZZ
is predicting NO at 41%

@BionicD0LPH1N It's impossible to know, but perhaps it's some kind of intelligence test of which we are part: we know about the Monty Hall problem, but can we figure out that the Monty Fall problem is different?

Ludwig Bald
bought Ṁ50 of NO

I'm confused by the problem formulation: I do not think it's clear that the revealed door was chosen randomly. For example, the host would probably walk towards the door he originally planned to open, with the accident only changing the timing. In real life, it's pretty implausible that you would fall in such a way as to open one of three doors uniformly at random.

For this reason, I think it's not clear what the correct answer should be, and GPT-4 might be confused too.

I would change my prediction if the formulation were something like: "You pick a door, say No. 1, and the host walks across the stage and falls on accident, which opens a random door, say door No. 3. There is a goat behind door No. 3."

Zardoru
bought Ṁ10 of NO

@LudwigBald Since the problem says the host is ignorant about which door is the winning one, even if the fall is a trick it can't be the usual Monty Hall problem, and you have no advantage from switching.

That said, the question would confuse most humans who just have a superficial knowledge of the problem.

GPT-4 will be bigger than GPT-3, but I don't think that will change much for this kind of thing, so NO at 40%.

Matthew Barnett
is predicting YES at 41%

@LudwigBald I disagree because I said the host is ignorant about what is behind each door. I don't think your logic follows.

But I do think it's interesting that people keep arguing with me about this problem.

ZZZZZZ

@LudwigBald Presumably, if it was real life, there would be a button to open door 3 which the host accidentally picked.

N.C. Young
is predicting NO at 40%

How well does ChatGPT do on this problem?

Peter Wildeford
is predicting NO at 40%

@NcyRocks It misunderstands the problem as Monty Hall and gives the standard Monty Hall solution

BoltonBailey

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. The host is ignorant about what is behind each door. You pick a door, say No. 1, and the host walks across the stage and falls on accident, revealing a goat behind door No. 3. He then picks himself up, and says "Whoops. Sorry about that. But now that we know that a goat is behind door No. 3, do you want to change your selection and pick door No. 2?" Is it to your advantage to switch your choice?

No, it doesn't matter if you switch your choice or not. Because the host's fall was accidental, the probability that the car is behind door 1 is the same as the probability that the car is behind door 2.

BoltonBailey

@BoltonBailey Hopefully this comment will be included in GPT-4's training data now.

calima
bought Ṁ30 of YES

I think this logic is wrong, and Monty Fall as specified here is logically equivalent to the original Monty Hall. The original version holds because Monty will always open a door with a goat – if he tripped, there's a 1/2 chance he'd reveal the car instead. But it's part of the problem spec here that Monty's fall will always reveal a goat. In this case, the decision matrix is exactly the same as in the original problem: initially, there's a 2/3 chance the car is behind either door 2 or 3. We learn there's a goat behind door 3, therefore there's a 2/3 chance the car is behind door 2 and the player should switch. Monty's intent has been screened off. (All that said, I'm holding NO based on the listed resolution criteria.)

Matthew Barnett
is predicting YES at 41%

@ClaraCollier

it's part of the problem spec here that Monty's fall will always reveal a goat

I disagree. I didn't write that, and I don't think I wrote anything that implied that. All I said was that he fell on accident and revealed a goat behind door No. 3. A natural interpretation is that he could have fallen and revealed the car, but didn't, because it wasn't behind door No. 3.

calima
is predicting NO at 41%

@MatthewBarnett

Sorry, "always" was a bad way to phrase it. In this particular problem, as written, a goat is revealed. The AI isn't being asked to evaluate other potential versions of the problem where other things occur. If Monty reveals a goat behind door 3 – whether or not he intended to do so! – the correct move is always for the player to switch, so that's what she should do here.

Matthew Barnett
is predicting YES at 41%

@ClaraCollier Why would it be necessary to ask it potential other versions of the problem where other things occur? In real life, assuming it was genuinely an accident, and the host did not know what was behind each door, as I specified, I wouldn't see a benefit to switching. Therefore, I don't see why we shouldn't read the question the way you are reading it.

Matthew Barnett
is predicting YES at 41%

@MatthewBarnett I meant to say "Therefore, I don't see why we should read the question the way you are reading it."

calima
is predicting NO at 46%

Hmm, here's another way of stating my intuition. Initially, there's a 1/3 chance the car is behind 1, and a 2/3 chance it's behind either 2 or 3. Then Monty opens door 3 and reveals a goat. Now there's still a 2/3 chance there's a car behind 2 or 3, and a 0/3 chance the car is behind 3, which means a 2/3 chance the car is behind 2. The relevant information is that door 3 contains the goat, not what decision procedure Monty used to decide to open that particular door. The reason Monty Fall and Monty Hall are different is because it's possible for Monty to accidentally open the door with the car. But this problem as written specifies a particular instance of the Monty Hall game where Monty reveals a goat – and regardless of why he did that, if the player finds herself in that particular situation she should switch.

Matthew Barnett
is predicting YES at 44%

@ClaraCollier I didn't say it was not possible for Monty to have opened door No. 3 and revealed a car. I only said that he in fact didn't reveal the car when he fell on accident. That's the crux of why I still don't buy your interpretation.

N.C. Young
bought Ṁ100 of NO

@ClaraCollier To slightly rework your argument:
Initially, there's a 1/3 chance the car is behind 2, and a 2/3 chance it's behind either 1 or 3. Then Monty opens door 3 and reveals a goat. Now there's still a 2/3 chance there's a car behind 1 or 3, and a 0/3 chance the car is behind 3, which means a 2/3 chance the car is behind 1.
Obviously that argument and yours can't coexist, but they're isomorphic.

Here's how I think of it: There are 3 worlds you could be in, one where the car is behind each door. Monty's fall proved that we're not in world 3, leaving 1 and 2 and implying a 50/50 chance either way. The part you're concerned about - that Monty Hall always reveals a goat behind door 3 - just means that we're not in world 3. It doesn't favour door 2 over door 1 (or vice versa).
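
Spelled out as an exact Bayes computation (a sketch of my own; the door numbering and the `posterior` helper are my choices): condition on the event "the host opened door 3 and a goat was there", and the two host behaviours differ only in the likelihood they assign when the car is behind door 2.

```python
from fractions import Fraction

def posterior(host_knows):
    """P(car behind each door | player picked door 1, door 3 opened showing a goat)."""
    prior = {car: Fraction(1, 3) for car in (1, 2, 3)}
    likelihood = {}
    for car in (1, 2, 3):
        if car == 3:
            likelihood[car] = Fraction(0)      # a goat was revealed, so the car isn't there
        elif host_knows and car == 2:
            likelihood[car] = Fraction(1)      # Monty Hall: host is forced to open door 3
        else:
            likelihood[car] = Fraction(1, 2)   # host opens door 2 or 3 with equal chance
    joint = {car: prior[car] * likelihood[car] for car in prior}
    total = sum(joint.values())
    return {car: p / total for car, p in joint.items()}

print(posterior(host_knows=False))  # Fall: car=1 -> 1/2, car=2 -> 1/2 (no advantage to switching)
print(posterior(host_knows=True))   # Hall: car=1 -> 1/3, car=2 -> 2/3 (switching helps)
```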

calima
is predicting NO at 38%

@NcyRocks that's helpful! I am confused but I think for basically semantic reasons. I was reading the problem statement as equivalent to Rosenthal's articulation of Monty Fall – "In this variant, once you have selected one of the three doors, the host slips on a banana peel and accidentally pushes open another door, which just happens not to contain the car." In that case the relevant thing is that the problem statement itself screens off Monty tripping and revealing the actual car. But I see how specifying the door instead of framing in this way makes it relevantly distinct. I will draw some charts and settle this in my brain.

calima
is predicting NO at 38%

Okay, my statement that the player should always switch if Monty reveals a goat was confused. What I should have said is that the player should always switch if the problem is specified such that there is a 0% chance of Monty revealing the car, regardless of the gloss put on his actions. For me this hinges on whether the statement "Monty happened to trip on door 3, which contained a goat" is meaningfully different from "Monty happened to trip on a door which wasn't the door you initially chose and which also didn't contain the car." I've convinced myself that they are (and that Matthew is right), but I still think that GPT-4 won't get it, for other reasons.

ManifoldDream

Manifold in the wild: A Tweet by Matthew Barnett

I opened a Manifold Market about whether GPT-4 will get the Monty *Fall* problem correct. https://manifold.markets/MatthewBarnett/will-gpt4-get-the-monty-fall-proble?referrer=MatthewBarnett https://t.co/SOjwvmMinw

footgun

Does the AI understand that car is more desirable than car?

footgun

@footgun than goat*

Matthew Barnett
bought Ṁ10 of NO

@footgun If they don't, then I'm counting that against them. Sorry, goats.

Matthew Barnett
sold Ṁ13 of NO

@footgun Also, it doesn't change the correct answer if goats are more desirable.