The Monty Hall problem is an interesting puzzle in probability which has a counter-intuitive answer. In fact, it is documented that many people got the answer wrong and many people continue to get the answer wrong.
The problem
The problem goes somewhat like this:
Imagine a game show. There are 3 doors and behind one door there is a car and behind the other 2, goats. The player picks a door hoping to win a car that’s behind the door. The game show host opens one of the other doors to reveal a goat (always). Does the player need to switch the choice of his door or stick with his first guess?
When I first saw the problem, I went headlong into solving it and got the answer wrong.
The common mistake
The usual first solution that is offered is to not switch the chosen door after one of the doors is opened. After this step, most people think there is equal probability of the car being in either of the remaining closed doors. Lot of people got the answer wrong when it was first posed and continue to get it wrong. Wikipedia has documented many reasons why people make this choice of not wanting to switch which is an interesting read.
The actual solution (Spoiler alert!)
The correct answer is to switch the choice of the door after the game host has revealed a goat by opening one of the two closed doors.
An intuitive explanation
There is a purely probability based solution described in many places on the internet, which is somewhat of a hard read. Yes, I looked up the answer, but it was kind of unsatisfying, mainly because it was hard to intuitively grasp the answer based on probability. While thinking over this on and off, I chanced to come up with an alternative solution which is also presented elsewhere on the internet.
To begin with, there is one car and two goats behind 3 closed doors. So this essentially means there is a higher probability of picking a door with a goat behind it. Let’s recall from the definition of probability, in our case:
The probability of picking a goat is:
Number of goats / total number of all things (goats and cars) = 2 / 3
So, statiscally, there is a higher chance of picking a door with a goat behind it in the beginning. Therefore, it makes sense to switch the door after the game show host has opened to reveal a goat behind one of the closed doors.
In fact, even if the problem were to be modified as I explain below, the correct answer is to always switch the door.
Let’s say we have 100 closed doors, with 99 goats and 1 car behind them. Now you can immediately see there is more chance (99/100 to be exact) of somebody randomly picking a door with a goat behind it. Now if the game show host opens 98 of the 99 closed doors to reveal 98 goats behind them, leaving only one door unopened, then doesn’t it make sense to switch the door that you initially picked?
Simulation
Let’s create a simulation to drive home this intuition and to prove that this is indeed the correct answer.
First, let’s write a function to create a game. The game consists of a number of doors with 1 car and the rest of them goats. The function returns an array whose elements represent doors and what’s behind them, i.e., 1 for car and 0 for goats.
Now let’s run it to see that indeed the car is behind a different door in each game instance (statistically speaking).
|
|
Games with 3 doors:
[0 1 0]
[1 0 0]
[0 1 0]
[0 0 1]
[0 0 1]
Games with 10 doors:
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 0 0 1]
We can see that the car (1) is set in the arrays above at random positions within each array.
Let’s create a function to select a door in an instance of the game. This function will return the player’s choice of the door by sampling a random number from the number of doors.
|
|
We then simulate the part where the game show host reveals all the doors that have goats behind them except for one door and the player ends up with two doors to figure out what to do next.
We can do this by creating a new array that represents the doors that includes the value of the index (door) that was chosen by the player and the value of the remaining item in the array after revealing all the goats.
If the player had selected a door with a car behind it (value 1), then the remaining doors are all goats (0s). The new array will contain elements [1, 0].
If the player had selected a door with a goat behind it (value 0), the remaining doors have one car (value 1) and the rest goats (0s). So the new array will have the elements [0, 1] because the game show host has to reveal only the doors with goats behind them which leaves one unopened door with a car behind it.
|
|
Now, the following two cases are possible:
Player switches the door
When the player switches the door, the result is given by the second element of the two element array.
Player does not switch
In this case, the result is given by the first element of the two element array.
|
|
Results
Now that we have all the pieces, we can put them together and run the simulation to study the overall winnings when the player switches and when they do not.
The entire code is shown below and is also available in my github repo.
|
|
As we can see in the plot, the overall winnings is higher in the long run if the
player chooses to switch the door. This is of course a statistical result, which
means that there is about 1/3 probability of the player winning even if he
chooses not to switch. The plot is very similar to this wikipedia version. Due
to the probabilistic nature of the game/simulation, the plots will look
slightly different each time the simulation is run.
And now! Monty Hall problem with 100 doors ;)
If you run the simulation for a 100 door game (1 car, 99 goats), it’s quite
evident from the plot below that it makes perfect sense to switch! The
chances of winning in this case if you don’t switch is miniscule.
And now! Monty Hall problem with a million doors
Ok, now I know I am pushing it. Until later, bye and hope this was interesting and fun to read.