A coin toss problem

Last updated on Dec 30, 2021 2 min read

Introduction

In [1], Professor Takis Konstantopoulos put this question: Toss a fair coin 10000 times, what’s the probability of observing heads showing up exactly 5000 times? To make things more interesting, let’s generalize this coin toss problem as this one: Toss a fair coin $n=10^i$ (for $i=1,\ 2,\ \ldots,\ 10$) times, what’s the probability of observing heads showing up exactly $n/2$ times? In this note, we will solve the general problem, aiming to gain insights from the problem solving process.

Two methods

Method 1: We can use R function

dbinom(x = n/2, size = n, prob = 1/2)

Method 2: We can try out the Central Limit Theorem (CLT). Let random variable $X$ follow Binomial$(n, 1/2)$ distribution. We want to find $$ \hbox{P}_n(X = n/2), $$ where we use subscript $n$ to emphasize that the probability depends on $n$. We have $$\begin{array}{cl} &\hbox{P}_n(X = n/2)\\ =&\hbox{P}_n(n/2 \le X < n/2 +1)\\ =&\hbox{P}_n(0 \le Y < 1/\sqrt{n/4}), \end{array}$$ where $$ Y=\frac{X-n/2}{\sqrt{n/4}}. $$ According to the CLT, if $n$ is large enough then $Y$ approximately follows the standard normal distribution. Thus, $$ \hbox{P}_n(X=n/2)\approx \hbox{pnorm}(b) - 0.5, $$ where $b=1/\sqrt{n/4}$.

R code and results

# helper functions --------------------------------------------------------
## method 1
find_prob_builtin <- function(n = 10)
{dbinom(x = n/2, size = n, prob = 1/2)
}

## method 2
find_prob_clt <- function(n = 10)
{b <- 1 / sqrt(n/4)
 pnorm(b) - 0.5
}

# some results ------------------------------------------------------------
y1 <- sapply(10^(1:10), find_prob_builtin)
y2 <- sapply(10^(1:10), find_prob_clt)

df <- data.frame(i = 1:10,
                 Pn_method1 = y1,
                 Pn_method2 = y2)

knitr::kable(df, format = 'html')

i	Pn_method1	Pn_method2
1	0.2460938	0.2364554
2	0.0795892	0.0792597
3	0.0252250	0.0252145
4	0.0079786	0.0079783
5	0.0025231	0.0025231
6	0.0007979	0.0007979
7	0.0002523	0.0002523
8	0.0000798	0.0000798
9	0.0000252	0.0000252
10	0.0000080	0.0000080

From the above table, we can have the following insights:

The CLT works well even when $n=10$.
For $n=10^i$ ($i=1,\ \ldots, 10$), an empirical formula is $$\begin{array}{ccl} \hbox{P}_n(X=10^i/2)& \approx & 2.5\times 10^{-(i+1)/2},\ \hbox{if}\ n \ \hbox{is odd}\\ & \approx & 8.0\times 10^{-(i+2)/2},\ \hbox{if}\ n \ \hbox{is even}. \end{array} $$

Further discussions

Using Stirling’s formula, we can show that $$ \lim_{n\rightarrow +\infty} \hbox{P}_n(X=n/2)=0. $$

To fully understand the mathematics/algorithm behind dbinom(), we can study [2].

References

[1] Konstantopoulos, T. (2020). Takis Tackles: Mathematical Higher Education. URL: https://imstat.org/2020/07/16/takis-tackles-mathematical-higher-education/

[2] Loader, C. (2002). Fast and Accurate Computation of Binomial Probabilities. URL: https://www.r-project.org/doc/reports/CLoader-dbinom-2002.pdf