Covariance of Sample Mean and Sample Standard Deviation
Introduction
Let $\{X_1, X_2, \ldots, X_n\}$ be a random sample (i.e., $X_1, X_2, \ldots, X_n$ are independent and identically distributed). The sample mean and sample standard deviation are defined, respectively, as:
$$\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i, \quad S=\sqrt{S^2}=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2}.$$
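For concreteness, a minimal numerical illustration of these two definitions (the data values are arbitrary; NumPy's `ddof=1` gives the $n-1$ denominator used for $S$):

```python
import numpy as np

# Sample mean and sample standard deviation with the n-1 denominator.
x = np.array([1.0, 2.0, 4.0, 7.0])
xbar = x.mean()                     # sample mean
s = x.std(ddof=1)                   # sample standard deviation S
# The same S computed directly from the defining formula:
s_manual = np.sqrt(((x - xbar) ** 2).sum() / (len(x) - 1))
print(xbar, s, s_manual)            # 3.5  2.6457...  2.6457...
```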
If the sample is taken from a normal population, then $\bar{X}$ and $S^2$ are independent. Without the assumption of normality, Zhang (2007) shows that the covariance of $\bar{X}$ and $S^2$ is $\mu_3/n$, where $\mu_3$ is the third central moment of $X_1$ and $n$ is the sample size, and Sen (2012) provides the correlation between $\bar{X}$ and $S^2$. It is then natural to ask what the covariance of $\bar{X}$ and $S$ is without the assumption of normality. In the next section we present our main results; we give the proofs in Section 3 and an example in Section 4.
Main Results
In what follows, denote the mean, variance, third and fourth central moments of $X_1$ by $\mu$, $\sigma^2$, $\mu_3$ and $\mu_4$, respectively.
Theorem 1. The asymptotic covariance of $\bar{X}$ and $S$ is
$$\frac{\mu_3}{2n\sigma}. $$
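Theorem 1 can be checked by simulation. The sketch below (an illustration, not from the original note) uses a standard exponential population, for which $\sigma=1$ and $\mu_3=2$, so the asymptotic covariance is $1/n$; the empirical covariance across many replicated samples should be close to this value.

```python
import numpy as np

# Monte Carlo check of Theorem 1 for an Exp(1) population,
# where sigma = 1 and mu_3 = 2, so cov(Xbar, S) ~ mu_3/(2 n sigma) = 1/n.
rng = np.random.default_rng(0)
n, reps = 50, 200_000
x = rng.exponential(scale=1.0, size=(reps, n))
xbar = x.mean(axis=1)               # sample mean of each replicate
s = x.std(axis=1, ddof=1)           # sample standard deviation of each
emp_cov = np.cov(xbar, s)[0, 1]     # empirical covariance over replicates
asy_cov = 2.0 / (2 * n * 1.0)       # mu_3 / (2 n sigma) = 0.02
print(emp_cov, asy_cov)
```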
Corollary. The asymptotic correlation between $\bar{X}$ and $S$ is
$$\frac{\gamma_1}{\sqrt{\gamma_2+2}},$$
where $$\gamma_1=\frac{\mu_3}{\sigma^3}, \ \hbox{and}\ \gamma_2=\frac{\mu_4}{\sigma^4}-3.$$
Remark 1: The result in the Corollary has been given in Miller (1997), but without proof or derivation.
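The Corollary can likewise be checked by simulation (again an illustration, not from the original note). For a standard exponential population $\gamma_1=2$ and $\gamma_2=6$, so the limiting correlation is $2/\sqrt{8}\approx 0.7071$:

```python
import numpy as np

# Monte Carlo check of the Corollary for an Exp(1) population:
# gamma_1 = 2, gamma_2 = 6, so corr(Xbar, S) -> 2/sqrt(8).
rng = np.random.default_rng(1)
n, reps = 200, 50_000
x = rng.exponential(scale=1.0, size=(reps, n))
emp_corr = np.corrcoef(x.mean(axis=1), x.std(axis=1, ddof=1))[0, 1]
asy_corr = 2.0 / np.sqrt(6.0 + 2.0)   # gamma_1 / sqrt(gamma_2 + 2)
print(emp_corr, asy_corr)
```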
Proofs
Proof of Theorem 1. Let
$$Y_i=\frac{X_i-\mu}{\sigma}, \quad \hbox{for}\ i=1, 2, \ldots, n.$$
It follows that $$\bar{X}=\sigma \bar{Y} + \mu, \ \hbox{and}\ S=\sigma S_Y,$$
where $\bar{Y}$ and $S_Y$ are the sample mean and sample standard deviation of $\{Y_1, Y_2, \ldots, Y_n\}$. Thus, since $E(\bar{Y})=0$,$$\begin{array}{ccl} \hbox{cov}(\bar{X}, S) &=&\hbox{cov}(\sigma\bar{Y}+\mu, \sigma S_Y)\\ &=&\sigma^2 \hbox{cov}(\bar{Y}, S_Y)\\ &=&\sigma^2 E\left(\bar{Y}S_Y\right). \end{array}$$
Since the Taylor expansion of $f(x)=\sqrt{x}$ at $x=1$ is $$1+\frac{1}{2}(x-1)+o(x-1),$$ we have$$S_Y\approx 1+\frac{1}{2}(S_Y^2-1)=\frac{1}{2}(S_Y^2+1),$$
from which it follows that$$\begin{array}{ccl} E(\bar{Y}S_Y)&\approx & \frac{1}{2}E\left(\bar{Y}(S_Y^2+1)\right)\\ &=&\frac{1}{2}E(\bar{Y}S_Y^2)\ \ \hbox{(because $E(\bar{Y})=0$)}\\ &=&\frac{1}{2}\frac{E(Y_1^3)}{n}\ \ \hbox{(according to Zhang, 2007)}\\ &=&\frac{1}{2}\frac{\mu_3}{\sigma^3 n}. \end{array}$$
Therefore,$$\hbox{cov}(\bar{X}, S)\approx \sigma^2 \frac{1}{2}\frac{\mu_3}{\sigma^3 n}=\frac{\mu_3}{2n\sigma}.$$
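The identity $E(\bar{Y}S_Y^2)=E(Y_1^3)/n$ borrowed from Zhang (2007) is exact, not merely asymptotic, and can be verified by direct enumeration in a small discrete case. The sketch below (an illustration with arbitrarily chosen $p$ and $n$) uses a standardized Bernoulli variable with $n=4$:

```python
import itertools
import math

# Exact check, by enumeration of all 2^n outcomes, of the identity
# E(Ybar * S_Y^2) = E(Y_1^3) / n for a standardized Bernoulli(p) variable.
p, n = 0.3, 4
sigma = math.sqrt(p * (1 - p))
# (value, probability) pairs of Y = (X - p)/sigma with X ~ Bernoulli(p):
vals = [(-p / sigma, 1 - p), ((1 - p) / sigma, p)]

lhs = 0.0
for outcome in itertools.product(vals, repeat=n):
    ys = [v for v, _ in outcome]
    prob = math.prod(w for _, w in outcome)
    ybar = sum(ys) / n
    s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)   # sample variance
    lhs += ybar * s2 * prob                           # E(Ybar * S_Y^2)

rhs = (p * ((1 - p) / sigma) ** 3 + (1 - p) * (-p / sigma) ** 3) / n  # E(Y^3)/n
print(lhs, rhs)
```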
Proof of the Corollary. We have obtained the asymptotic covariance of $\bar{X}$ and $S$; it remains to derive the variances of $\bar{X}$ and $S$. Since
$$\hbox{Var}(\bar{X})=\frac{\sigma^2}{n}, \ \hbox{and}\ \hbox{Var}(S)=E(S^2)-(E(S))^2=\sigma^2-(E(S))^2,$$
we only need to derive the asymptotic mean of $S$. Keeping the quadratic term, the Taylor expansion of $\sqrt{x}$ at $x=1$ is$$\sqrt{x}=1+\frac{1}{2}(x-1)-\frac{1}{8}(x-1)^2+o((x-1)^2).$$ Now we have $$S_Y\approx 1+\frac{1}{2}(S_Y^2-1)-\frac{1}{8}(S_Y^2-1)^2.$$
Thus, since $E(S_Y^2)=1$,$$\begin{array}{ccl} E(S)&=&E(\sigma S_Y)\\ &\approx& \sigma \left[1-\frac{1}{8}\hbox{Var}(S_Y^2)\right]\\ &=& \sigma\left[1-\frac{1}{8}\left(\frac{2}{n-1}+\frac{E(Y_1^4)-3}{n}\right)\right]\\ &=&\sigma\left[1-\frac{1}{8}\left(\frac{2}{n-1}+\frac{\gamma_2}{n}\right) \right]; \end{array}$$
it follows, after discarding the $o(\frac{1}{n})$ terms, that$$\hbox{Var}(S)\approx \frac{\sigma^2}{4}\left(\frac{2}{n-1}+\frac{\gamma_2}{n}\right).$$ (For the variance of $S_Y^2$ above, we used the formula in Miller, 1997, p. 7. A direct derivation of the variance formula is available from the author upon request.)
An Example
The accuracy of the result in Theorem 1 depends on the parent distribution and sample size $n$; in this section, we use an example to illustrate its accuracy.
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed Bernoulli random variables with $P(X_1=1)=p$, where $0<p<1$. In this case the third central moment of $X_1$ is $$\mu_3=p(1-p)(1-2p),$$ and the standard deviation of $X_1$ is
$$\sigma=\sqrt{p(1-p)},$$
thus the asymptotic covariance of $\bar{X}$ and $S$ is$$\frac{\mu_3}{2n\sigma}=\frac{\sqrt{p(1-p)}(1-2p)}{2n}.$$
Noticing that $X_i^2=X_i$ for $i=1, 2, \ldots, n$, we have
$$\begin{array}{ccl} S^2&=&\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 \\ &=&\frac{1}{n-1}\left[\sum_{i=1}^nX_i-\frac{1}{n}\left(\sum_{i=1}^nX_i\right)^2\right] \\ &\equiv& \frac{1}{n-1}\left(T_n - \frac{1}{n}T_n^2\right), \end{array}$$
where $T_n=\sum_{i=1}^nX_i$ and it has the binomial distribution $b(n, p)$. Now, we write$$\begin{array}{ccl} \hbox{cov}(\bar{X}, S)&=&\hbox{cov}\left(\displaystyle\frac{1}{n}T_n, \frac{1}{\sqrt{n-1}}\sqrt{T_n-\frac{1}{n}T_n^2}\right)\\ &=&\displaystyle\frac{1}{n\sqrt{n-1}}\left[E\left(T_n\sqrt{T_n-\frac{1}{n}T_n^2}\right)\right.\\& &\left.-E\left(T_n\right)E\left(\sqrt{T_n-\frac{1}{n}T_n^2}\right)\right]\\ &=&\displaystyle\frac{1}{n\sqrt{n-1}}\left[\sum_{j=1}^n j\sqrt{j-\frac{1}{n}j^2}p_j\right.\\ &&\left.-np\sum_{j=1}^n \sqrt{j-\frac{1}{n}j^2}p_j\right], \end{array}$$
where the binomial probability mass is$$p_j=\left( \begin{array}{c} n\\ j \end{array} \right)p^j(1-p)^{n-j}, \ \hbox{for}\ j=1, 2, \ldots, n.$$
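The exact binomial formula above and the asymptotic value from Theorem 1 can be computed side by side; a sketch, with $p=0.3$ and a few sample sizes chosen purely for illustration:

```python
import math

# Exact cov(Xbar, S) from the binomial sums above, where T_n ~ b(n, p).
def exact_cov(n, p):
    pmf = [math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]
    root = [math.sqrt(j - j * j / n) for j in range(n + 1)]   # sqrt(j - j^2/n)
    e_ts = sum(j * root[j] * pmf[j] for j in range(n + 1))    # E(T sqrt(...))
    e_s = sum(root[j] * pmf[j] for j in range(n + 1))         # E(sqrt(...))
    return (e_ts - n * p * e_s) / (n * math.sqrt(n - 1))

# Asymptotic covariance sqrt(p(1-p))(1-2p)/(2n) from Theorem 1.
def asy_cov(n, p):
    return math.sqrt(p * (1 - p)) * (1 - 2 * p) / (2 * n)

for n in (5, 20, 80):
    print(n, exact_cov(n, 0.3), asy_cov(n, 0.3))
```

The two agree increasingly well as $n$ grows; for $p=0.5$ both are zero, in line with Remark 2 below.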
For given values of $n$ and $p$, we can compute the exact and asymptotic covariances; we present the results in Table 1. We see from Table 1 that the results given by Theorem 1 are accurate even for a sample size $n$ as small as $5$.
Table 1: Exact (indicated by “Exact”) and asymptotic (indicated by “Asy”) covariances for various values of $n$ and $p$.
Remark 2: For $p=0.5$, regardless of the value of $n$, our numerical results suggest that the covariance equals $0$. This is expected by symmetry: $1-X_1$ has the same distribution as $X_1$, and replacing each $X_i$ by $1-X_i$ leaves $S$ unchanged while flipping $\bar{X}$ about $1/2$, so $\hbox{cov}(\bar{X}, S)=-\hbox{cov}(\bar{X}, S)=0$.
References
Miller, R. G. (1997). Beyond ANOVA. Chapman and Hall.
Sen, A. (2012). On the Interrelation Between the Sample Mean and the Sample Variance. The American Statistician, 66, 112-117.
Zhang, L. (2007). Sample Mean and Sample Variance: Their Covariance and Their (In)Dependence. The American Statistician, 61, 159-160.