A coin toss problem
Introduction
In [1], Professor Takis Konstantopoulos put this question: Toss a fair coin 10000 times, what’s the probability of observing heads showing up exactly 5000 times? To make things more interesting, let’s generalize this coin toss problem as this one: Toss a fair coin $n=10^i$ (for $i=1,\ 2,\ \ldots,\ 10$) times, what’s the probability of observing heads showing up exactly $n/2$ times? In this note, we will solve the general problem, aiming to gain insights from the problem solving process.
Two methods
Method 1: We can use R function
dbinom(x = n/2, size = n, prob = 1/2)
Method 2: We can try out the Central Limit Theorem (CLT). Let random variable $X$ follow Binomial$(n, 1/2)$ distribution. We want to find $$ \hbox{P}_n(X = n/2), $$ where we use subscript $n$ to emphasize that the probability depends on $n$. We have $$\begin{array}{cl} &\hbox{P}_n(X = n/2)\\ =&\hbox{P}_n(n/2 \le X < n/2 +1)\\ =&\hbox{P}_n(0 \le Y < 1/\sqrt{n/4}), \end{array}$$ where $$ Y=\frac{X-n/2}{\sqrt{n/4}}. $$ According to the CLT, if $n$ is large enough then $Y$ approximately follows the standard normal distribution. Thus, $$ \hbox{P}_n(X=n/2)\approx \hbox{pnorm}(b) - 0.5, $$ where $b=1/\sqrt{n/4}$.
R code and results
# helper functions --------------------------------------------------------
## method 1
find_prob_builtin <- function(n = 10)
{dbinom(x = n/2, size = n, prob = 1/2)
}
## method 2
find_prob_clt <- function(n = 10)
{b <- 1 / sqrt(n/4)
pnorm(b) - 0.5
}
# some results ------------------------------------------------------------
y1 <- sapply(10^(1:10), find_prob_builtin)
y2 <- sapply(10^(1:10), find_prob_clt)
df <- data.frame(i = 1:10,
Pn_method1 = y1,
Pn_method2 = y2)
knitr::kable(df, format = 'html')
i | Pn_method1 | Pn_method2 |
---|---|---|
1 | 0.2460938 | 0.2364554 |
2 | 0.0795892 | 0.0792597 |
3 | 0.0252250 | 0.0252145 |
4 | 0.0079786 | 0.0079783 |
5 | 0.0025231 | 0.0025231 |
6 | 0.0007979 | 0.0007979 |
7 | 0.0002523 | 0.0002523 |
8 | 0.0000798 | 0.0000798 |
9 | 0.0000252 | 0.0000252 |
10 | 0.0000080 | 0.0000080 |
From the above table, we can have the following insights:
-
The CLT works well even when $n=10$.
-
For $n=10^i$ ($i=1,\ \ldots, 10$), an empirical formula is $$\begin{array}{ccl} \hbox{P}_n(X=10^i/2)& \approx & 2.5\times 10^{-(i+1)/2},\ \hbox{if}\ n \ \hbox{is odd}\\ & \approx & 8.0\times 10^{-(i+2)/2},\ \hbox{if}\ n \ \hbox{is even}. \end{array} $$
Further discussions
Using Stirling’s formula, we can show that $$ \lim_{n\rightarrow +\infty} \hbox{P}_n(X=n/2)=0. $$
To fully understand the mathematics/algorithm behind dbinom()
, we can study [2].
References
[1] Konstantopoulos, T. (2020). Takis Tackles: Mathematical Higher Education. URL: https://imstat.org/2020/07/16/takis-tackles-mathematical-higher-education/
[2] Loader, C. (2002). Fast and Accurate Computation of Binomial Probabilities. URL: https://www.r-project.org/doc/reports/CLoader-dbinom-2002.pdf