On Estimating the Parameter θ of a Uniform Distribution

Introduction

Problem Statement: $(X_1,\ X_2, \ldots,\ X_n)$ are independent and identically distributed (i.i.d.) and $X_1\sim Uniform(0,\ \theta)$, where $\theta$ is unknown and to be estimated. How to estimate $\theta$ when observations of $(X_1,\ X_2, \ldots,\ X_n)$ are available?

Obviously, the simplest estimator of $\theta$ is the moment estimator:

$$ \hat{\theta}_M = 2\frac{\sum X_i}{n}=2\bar{X}. $$

It is easy to derive the Maximum Likelihood Estimator (MLE) of $ \theta $:

$$ \hat{\theta}_{MLE} = \hbox{max}\{X_1,\ \ldots,\ X_n\}=X_{(n)}. $$

Checking the literature, e.g. this StackExchange post:

https://math.stackexchange.com/questions/60497/unbiased-estimator-for-a-uniform-variable-support

we can know that the Uniformly Minimum Variance Unbiased Estimator (UMVUE) of $ \theta $ is:

$$ \hat{\theta}_{UMVUE} = \frac{n+1}{n} X_{(n)}. $$

We should notice that the probability density function (PDF) of $X_{(n)}$ is given by

$$ f(x) = \frac{n}{\theta^n}x^{n-1},\ \hbox{for}\ 0< x < \theta. $$

For the above three estimators, we can list some theoretical results:

  • (un-)biasedness

$$ E(\hat{\theta}_M) = \theta; $$

$$ E(\hat{\theta}_{MLE}) = \frac{n}{n+1}\theta; $$

$$ E(\hat{\theta}_{UMVUE}) = \theta. $$

  • Mean Squared Error

$$ E(\hat{\theta}_{M}-\theta)^2 = \frac{\theta^2}{3n}; $$

$$ E(\hat{\theta}_{MLE}-\theta)^2 = \frac{2\theta^2}{(n+1)(n+2)}; $$

$$ E(\hat{\theta}_{UMVUE}-\theta)^2 = \frac{\theta^2}{n(n+2)}. $$

To compare the two estimators,

$$ \hat{\theta}_{MLE}\ \hbox{and}\ \hat{\theta}_{UMVUE}, $$

we want to find

$$ p := \hbox{Prob} \left( |\hat{\theta}_{MLE}-\theta| \le |\hat{\theta}_{UMVUE}-\theta| \right). $$

After some algebraic operations (see the Appendix), we can derive the interesting result:

$$ p = 1 - \left( \frac{2n}{2n + 1} \right)^n \rightarrow 1 - e^{-1/2}\approx 0.3935,\ \hbox{as}\ n\rightarrow +\infty $$

In the next section, we do a small simulation study to “show” the correctness of the above results.

Simulation study

library(dplyr)
library(purrr)

my_simu <- function(N = 10000, n = 5, theta = 1, the_seed = 123456)
{set.seed(the_seed)
 
 data <- matrix(runif(N * n, min = 0, max = theta), ncol = N)
 
 # moment estimate
 theta_M <- apply(data, 2, function(x) 2 * mean(x))
 
 # MLE
 theta_MLE <- apply(data, 2, max)
 
 # UMVUE (MLE adjusted)
 theta_MLE_a <- (1 + 1 / n) * theta_MLE
 
 df <- data.frame(theta_m = theta_M,
                  theta_mle = theta_MLE,
                  theta_mle_a = theta_MLE_a)
 return(df)
}

my_analysis <- function(df, theta)
{MSE <- vapply(df, function(x) mean((x - theta)^2), FUN.VALUE = numeric(1))

 d <- dim(df)[1]
 abs_dev_df <- map_dfc(df, function(x) abs(x - theta))
 est_1_vs_est_2 <- sum(abs_dev_df[[1]] <= abs_dev_df[[2]]) / d
 est_1_vs_est_3 <- sum(abs_dev_df[[1]] <= abs_dev_df[[3]]) / d
 est_2_vs_est_3 <- sum(abs_dev_df[[2]] <= abs_dev_df[[3]]) / d
 
 output_df <- data.frame(theta_m_comp = c(MSE[1], NA, NA),
                      theta_mle_comp = c(est_1_vs_est_2, MSE[2], NA),
                      theta_mle_a_comp = c(est_1_vs_est_3, est_2_vs_est_3, MSE[3]))
 return(output_df)
}

# test 1
test_1 <- my_simu(N = 10000, n = 10, theta = 1)
(my_re_1 <- my_analysis(test_1, theta = 1))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1   0.03269422     0.34670000      0.264500000
## 2           NA     0.01519611      0.382500000
## 3           NA             NA      0.008284822
test_1 <- my_simu(N = 10000, n = 100, theta = 1)
(my_re_1_ <- my_analysis(test_1, theta = 1))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1  0.003331064   0.1364000000      1.04700e-01
## 2           NA   0.0001954537      4.03400e-01
## 3           NA             NA      9.95005e-05
test_1 <- my_simu(N = 10000, n = 1000, theta = 1)
(my_re_1__ <- my_analysis(test_1, theta = 1))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 0.0003256795   4.460000e-02     3.370000e-02
## 2           NA   1.947045e-06     3.996000e-01
## 3           NA             NA     9.742283e-07
# test 2
test_2 <- my_simu(N = 10000, n = 10, theta = 10)
(my_re_2 <- my_analysis(test_2, theta = 10))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1     3.269422       0.346700        0.2645000
## 2           NA       1.519611        0.3825000
## 3           NA             NA        0.8284822
test_2 <- my_simu(N = 10000, n = 100, theta = 10)
(my_re_2_ <- my_analysis(test_2, theta = 10))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1    0.3331064     0.13640000       0.10470000
## 2           NA     0.01954537       0.40340000
## 3           NA             NA       0.00995005
test_2 <- my_simu(N = 10000, n = 1000, theta = 10)
(my_re_2__ <- my_analysis(test_2, theta = 10))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1   0.03256795   0.0446000000     3.370000e-02
## 2           NA   0.0001947045     3.996000e-01
## 3           NA             NA     9.742283e-05
# test 3
test_3 <- my_simu(N = 10000, n = 10, theta = 100)
(my_re_3 <- my_analysis(test_3, theta = 100))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1     326.9422         0.3467          0.26450
## 2           NA       151.9611          0.38250
## 3           NA             NA         82.84822
test_3 <- my_simu(N = 10000, n = 100, theta = 100)
(my_re_3_ <- my_analysis(test_3, theta = 100))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1     33.31064       0.136400         0.104700
## 2           NA       1.954537         0.403400
## 3           NA             NA         0.995005
test_3 <- my_simu(N = 10000, n = 1000, theta = 100)
(my_re_3__ <- my_analysis(test_3, theta = 100))
##   theta_m_comp theta_mle_comp theta_mle_a_comp
## 1     3.256795     0.04460000      0.033700000
## 2           NA     0.01947045      0.399600000
## 3           NA             NA      0.009742283

Appendix

Lingyun Zhang (张凌云)
Lingyun Zhang (张凌云)
Design Analyst

I have research interests in Statistics, applied probability and computation.