On Estimating the Parameter θ of a Uniform Distribution
Introduction
Problem Statement: $(X_1,\ X_2, \ldots,\ X_n)$ are independent and identically distributed (i.i.d.) and $X_1\sim Uniform(0,\ \theta)$, where $\theta$ is unknown and to be estimated. How to estimate $\theta$ when observations of $(X_1,\ X_2, \ldots,\ X_n)$ are available?
Obviously, the simplest estimator of $\theta$ is the moment estimator:
$$ \hat{\theta}_M = 2\frac{\sum X_i}{n}=2\bar{X}. $$
It is easy to derive the Maximum Likelihood Estimator (MLE) of $ \theta $:$$ \hat{\theta}_{MLE} = \hbox{max}\{X_1,\ \ldots,\ X_n\}=X_{(n)}. $$
Checking the literature, e.g. this StackExchange post:https://math.stackexchange.com/questions/60497/unbiased-estimator-for-a-uniform-variable-support
we can know that the Uniformly Minimum Variance Unbiased Estimator (UMVUE) of $ \theta $ is: $$ \hat{\theta}_{UMVUE} = \frac{n+1}{n} X_{(n)}. $$We should notice that the probability density function (PDF) of $X_{(n)}$ is given by
$$ f(x) = \frac{n}{\theta^n}x^{n-1},\ \hbox{for}\ 0< x < \theta. $$
For the above three estimators, we can list some theoretical results:
- (un-)biasedness
$$ E(\hat{\theta}_M) = \theta; $$
$$ E(\hat{\theta}_{MLE}) = \frac{n}{n+1}\theta; $$
$$ E(\hat{\theta}_{UMVUE}) = \theta. $$
- Mean Squared Error
$$ E(\hat{\theta}_{M}-\theta)^2 = \frac{\theta^2}{3n}; $$
$$ E(\hat{\theta}_{MLE}-\theta)^2 = \frac{2\theta^2}{(n+1)(n+2)}; $$
$$ E(\hat{\theta}_{UMVUE}-\theta)^2 = \frac{\theta^2}{n(n+2)}. $$
To compare the two estimators,
$$ \hat{\theta}_{MLE}\ \hbox{and}\ \hat{\theta}_{UMVUE}, $$
we want to find$$ p := \hbox{Prob} \left( |\hat{\theta}_{MLE}-\theta| \le |\hat{\theta}_{UMVUE}-\theta| \right). $$
After some algebraic operations (see the Appendix), we can derive the interesting result:$$ p = 1 - \left( \frac{2n}{2n + 1} \right)^n \rightarrow 1 - e^{-1/2}\approx 0.3935,\ \hbox{as}\ n\rightarrow +\infty $$
In the next section, we do a small simulation study to “show” the correctness of the above results.
Simulation study
library(dplyr)
library(purrr)
my_simu <- function(N = 10000, n = 5, theta = 1, the_seed = 123456)
{set.seed(the_seed)
data <- matrix(runif(N * n, min = 0, max = theta), ncol = N)
# moment estimate
theta_M <- apply(data, 2, function(x) 2 * mean(x))
# MLE
theta_MLE <- apply(data, 2, max)
# UMVUE (MLE adjusted)
theta_MLE_a <- (1 + 1 / n) * theta_MLE
df <- data.frame(theta_m = theta_M,
theta_mle = theta_MLE,
theta_mle_a = theta_MLE_a)
return(df)
}
my_analysis <- function(df, theta)
{MSE <- vapply(df, function(x) mean((x - theta)^2), FUN.VALUE = numeric(1))
d <- dim(df)[1]
abs_dev_df <- map_dfc(df, function(x) abs(x - theta))
est_1_vs_est_2 <- sum(abs_dev_df[[1]] <= abs_dev_df[[2]]) / d
est_1_vs_est_3 <- sum(abs_dev_df[[1]] <= abs_dev_df[[3]]) / d
est_2_vs_est_3 <- sum(abs_dev_df[[2]] <= abs_dev_df[[3]]) / d
output_df <- data.frame(theta_m_comp = c(MSE[1], NA, NA),
theta_mle_comp = c(est_1_vs_est_2, MSE[2], NA),
theta_mle_a_comp = c(est_1_vs_est_3, est_2_vs_est_3, MSE[3]))
return(output_df)
}
# test 1
test_1 <- my_simu(N = 10000, n = 10, theta = 1)
(my_re_1 <- my_analysis(test_1, theta = 1))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 0.03269422 0.34670000 0.264500000
## 2 NA 0.01519611 0.382500000
## 3 NA NA 0.008284822
test_1 <- my_simu(N = 10000, n = 100, theta = 1)
(my_re_1_ <- my_analysis(test_1, theta = 1))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 0.003331064 0.1364000000 1.04700e-01
## 2 NA 0.0001954537 4.03400e-01
## 3 NA NA 9.95005e-05
test_1 <- my_simu(N = 10000, n = 1000, theta = 1)
(my_re_1__ <- my_analysis(test_1, theta = 1))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 0.0003256795 4.460000e-02 3.370000e-02
## 2 NA 1.947045e-06 3.996000e-01
## 3 NA NA 9.742283e-07
# test 2
test_2 <- my_simu(N = 10000, n = 10, theta = 10)
(my_re_2 <- my_analysis(test_2, theta = 10))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 3.269422 0.346700 0.2645000
## 2 NA 1.519611 0.3825000
## 3 NA NA 0.8284822
test_2 <- my_simu(N = 10000, n = 100, theta = 10)
(my_re_2_ <- my_analysis(test_2, theta = 10))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 0.3331064 0.13640000 0.10470000
## 2 NA 0.01954537 0.40340000
## 3 NA NA 0.00995005
test_2 <- my_simu(N = 10000, n = 1000, theta = 10)
(my_re_2__ <- my_analysis(test_2, theta = 10))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 0.03256795 0.0446000000 3.370000e-02
## 2 NA 0.0001947045 3.996000e-01
## 3 NA NA 9.742283e-05
# test 3
test_3 <- my_simu(N = 10000, n = 10, theta = 100)
(my_re_3 <- my_analysis(test_3, theta = 100))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 326.9422 0.3467 0.26450
## 2 NA 151.9611 0.38250
## 3 NA NA 82.84822
test_3 <- my_simu(N = 10000, n = 100, theta = 100)
(my_re_3_ <- my_analysis(test_3, theta = 100))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 33.31064 0.136400 0.104700
## 2 NA 1.954537 0.403400
## 3 NA NA 0.995005
test_3 <- my_simu(N = 10000, n = 1000, theta = 100)
(my_re_3__ <- my_analysis(test_3, theta = 100))
## theta_m_comp theta_mle_comp theta_mle_a_comp
## 1 3.256795 0.04460000 0.033700000
## 2 NA 0.01947045 0.399600000
## 3 NA NA 0.009742283