Covariance of Sample Mean and Sample Standard Deviation

Introduction

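Let $X_1, X_2, \ldots, X_n$ be a random sample (i.e., $X_1, X_2, \ldots, X_n$ are independent and identically distributed). The sample mean and sample standard deviation are defined respectively as:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S = \sqrt{S^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$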

If the sample is taken from a normal population, then $\bar{X}$ and $S^2$ are independent. Without the assumption of normality, Zhang (2007) shows that the covariance of $\bar{X}$ and $S^2$ is $\mu_3/n$, where $\mu_3$ is the third central moment of $X_1$ and $n$ denotes the sample size, and Sen (2012) provides the correlation between $\bar{X}$ and $S^2$. It is then natural to ask what the covariance of $\bar{X}$ and $S$ is without the assumption of normality. In the next section we present our main results; we give the proofs in Section 3 and an example in Section 4.

Main Results

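In what follows, denote the mean, variance, third and fourth central moments of $X_1$ by $\mu$, $\sigma^2$, $\mu_3$ and $\mu_4$, respectively.

Theorem 1. The asymptotic covariance of $\bar{X}$ and $S$ is

$$\frac{\mu_3}{2n\sigma}.$$

Corollary. The asymptotic correlation between $\bar{X}$ and $S$ is

$$\frac{\gamma_1}{\sqrt{\gamma_2 + 2}},$$

where $\gamma_1 = \dfrac{\mu_3}{\sigma^3}$ and $\gamma_2 = \dfrac{\mu_4}{\sigma^4} - 3$.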

Remark 1: The result shown in the Corollary has been given in Miller (1997), but there is no proof or derivation.
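As a quick illustration, the two formulas of this section can be evaluated numerically. The sketch below is not part of the original derivation: the function names are hypothetical, and the moment values for the Exponential(1) and normal distributions (skewness 2 and excess kurtosis 6 for the former, both 0 for the latter) are standard facts supplied here as assumptions.

```python
import math


def asymptotic_cov(mu3, sigma, n):
    """Theorem 1: asymptotic covariance of the sample mean and sample SD."""
    return mu3 / (2.0 * n * sigma)


def asymptotic_corr(gamma1, gamma2):
    """Corollary: asymptotic correlation, gamma1 / sqrt(gamma2 + 2)."""
    return gamma1 / math.sqrt(gamma2 + 2.0)


# Exponential(1): sigma = 1, mu3 = 2, mu4 = 9, hence gamma1 = 2 and gamma2 = 6.
exp_cov = asymptotic_cov(mu3=2.0, sigma=1.0, n=50)
exp_corr = asymptotic_corr(gamma1=2.0, gamma2=6.0)

# Normal population: gamma1 = gamma2 = 0, so the asymptotic correlation is 0,
# in line with the independence of the sample mean and sample variance.
normal_corr = asymptotic_corr(gamma1=0.0, gamma2=0.0)

print(exp_cov, exp_corr, normal_corr)
```

The correlation does not depend on $n$, whereas the covariance shrinks at rate $1/n$.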

Proofs

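Proof of Theorem 1. Let

$$Y_i = \frac{X_i - \mu}{\sigma}, \quad \text{for } i = 1, 2, \ldots, n.$$

It follows that

$$\bar{X} = \sigma\bar{Y} + \mu, \quad \text{and} \quad S = \sigma S_Y,$$

where $\bar{Y}$ and $S_Y$ are the sample mean and sample standard deviation of $\{Y_1, Y_2, \ldots, Y_n\}$. Thus,

$$\mathrm{cov}(\bar{X}, S) = \mathrm{cov}(\sigma\bar{Y} + \mu, \sigma S_Y) = \sigma^2\,\mathrm{cov}(\bar{Y}, S_Y) = \sigma^2 E(\bar{Y} S_Y).$$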

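Since the Taylor expansion of the function $f(x) = \sqrt{x}$ at $x = 1$ is $1 + \frac{1}{2}(x - 1) + o(x - 1)$, we have

$$S_Y \approx 1 + \frac{1}{2}(S_Y^2 - 1) = \frac{1}{2}(S_Y^2 + 1),$$

from which it follows, using $E(\bar{Y}) = 0$, that

$$\begin{aligned}
E(\bar{Y} S_Y) &\approx \frac{1}{2} E\big(\bar{Y}(S_Y^2 + 1)\big) = \frac{1}{2} E(\bar{Y} S_Y^2) \\
&= \frac{1}{2}\,\frac{E(Y_1^3)}{n} \quad \text{(according to Zhang, 2007)} \\
&= \frac{1}{2}\,\frac{\mu_3}{\sigma^3 n}.
\end{aligned}$$

Therefore,

$$\mathrm{cov}(\bar{X}, S) \approx \sigma^2 \cdot \frac{1}{2}\,\frac{\mu_3}{\sigma^3 n} = \frac{\mu_3}{2n\sigma}.$$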
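The approximation can be checked by simulation. The following is a minimal Monte Carlo sketch, not part of the original proof; it assumes an Exponential(1) parent distribution (for which $\mu_3 = 2$ and $\sigma = 1$), and the helper name, seed, sample size and replication count are arbitrary choices made for illustration.

```python
import math
import random


def simulate_cov_mean_sd(n, reps, rng):
    """Monte Carlo estimate of cov(sample mean, sample SD) for Exponential(1) samples."""
    means, sds = [], []
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        m = sum(xs) / n
        # Sample standard deviation with the n - 1 divisor, as in the definition of S.
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        means.append(m)
        sds.append(s)
    mean_m = sum(means) / reps
    mean_s = sum(sds) / reps
    # Sample covariance of the simulated (mean, SD) pairs.
    return sum((a - mean_m) * (b - mean_s) for a, b in zip(means, sds)) / (reps - 1)


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
n, reps = 50, 10000
estimate = simulate_cov_mean_sd(n, reps, rng)
theory = 2.0 / (2.0 * n * 1.0)  # mu3 / (2 * n * sigma) with mu3 = 2, sigma = 1

print(f"simulated: {estimate:.4f}   Theorem 1: {theory:.4f}")
```

The simulated value carries Monte Carlo noise and the formula is only a first-order approximation, so the two numbers should be close but need not coincide.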

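Proof of the Corollary. We have obtained the asymptotic covariance of $\bar{X}$ and $S$, and we need to derive the variances of $\bar{X}$ and $S$. Since

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}, \quad \text{and} \quad \mathrm{Var}(S) = E(S^2) - (E(S))^2 = \sigma^2 - (E(S))^2,$$

we only need to show how to derive the asymptotic mean of $S$. If we keep the quadratic term, then the Taylor expansion of $\sqrt{x}$ at $x = 1$ is

$$\sqrt{x} = 1 + \frac{1}{2}(x - 1) - \frac{1}{8}(x - 1)^2 + o\big((x - 1)^2\big).$$

Now we have

$$S_Y \approx 1 + \frac{1}{2}(S_Y^2 - 1) - \frac{1}{8}(S_Y^2 - 1)^2.$$

Thus, because $E(S_Y^2) = 1$,

$$\begin{aligned}
E(S) = E(\sigma S_Y) &\approx \sigma\left[1 - \frac{1}{8}\mathrm{Var}(S_Y^2)\right] \\
&= \sigma\left[1 - \frac{1}{8}\left(\frac{2}{n-1} + \frac{E(Y_1^4) - 3}{n}\right)\right] \\
&= \sigma\left[1 - \frac{1}{8}\left(\frac{2}{n-1} + \frac{\gamma_2}{n}\right)\right];
\end{aligned}$$

it follows that

$$\mathrm{Var}(S) \approx \frac{\sigma^2}{4}\left(\frac{2}{n-1} + \frac{\gamma_2}{n}\right),$$

if the $o\!\left(\frac{1}{n}\right)$ terms are discarded. (Note that for the variance of $S_Y^2$ above, we used the formula in Miller, 1997, p. 7. A direct derivation of the variance formula is available from the author upon request.)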
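The approximations for the mean and variance of $S$ can be compared with a case where the exact mean is known. The sketch below is an illustration only: it assumes a normal population ($\gamma_2 = 0$) and uses the standard closed form $E(S) = \sigma\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$, which is not stated in the text above; all function names are hypothetical.

```python
import math


def asymptotic_mean_sd(sigma, gamma2, n):
    """Asymptotic E(S): sigma * [1 - (2/(n-1) + gamma2/n) / 8]."""
    return sigma * (1.0 - (2.0 / (n - 1) + gamma2 / n) / 8.0)


def asymptotic_var_sd(sigma, gamma2, n):
    """Asymptotic Var(S): (sigma^2 / 4) * (2/(n-1) + gamma2/n)."""
    return sigma ** 2 / 4.0 * (2.0 / (n - 1) + gamma2 / n)


def exact_mean_sd_normal(sigma, n):
    """Exact E(S) for a normal sample (assumed closed form, see lead-in)."""
    # Work with log-gamma so that large n does not overflow.
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return sigma * math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


n, sigma = 10, 1.0
approx_mean = asymptotic_mean_sd(sigma, 0.0, n)
exact_mean = exact_mean_sd_normal(sigma, n)
approx_var = asymptotic_var_sd(sigma, 0.0, n)
exact_var = sigma ** 2 - exact_mean ** 2  # Var(S) = E(S^2) - (E(S))^2

print(f"E(S):   approx {approx_mean:.5f}, exact {exact_mean:.5f}")
print(f"Var(S): approx {approx_var:.5f}, exact {exact_var:.5f}")
```

The exact variance is obtained from the same identity $\mathrm{Var}(S) = \sigma^2 - (E(S))^2$ used in the proof.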

An Example

The accuracy of the result in Theorem 1 depends on the parent distribution and sample size n; in this section, we use an example to illustrate its accuracy.

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed Bernoulli random variables with $P(X_1 = 1) = p$, where $0 < p < 1$. In this case the third central moment of $X_1$ is $\mu_3 = p(1-p)(1-2p)$, and the standard deviation of $X_1$ is

$$\sigma = \sqrt{p(1-p)},$$

thus the asymptotic covariance of $\bar{X}$ and $S$ is

$$\frac{\mu_3}{2n\sigma} = \frac{\sqrt{p(1-p)}\,(1-2p)}{2n}.$$

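Noticing that $X_i^2 = X_i$ for $i = 1, 2, \ldots, n$, we have

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2\right] = \frac{1}{n-1}\left(T_n - \frac{1}{n}T_n^2\right),$$

where $T_n = \sum_{i=1}^{n} X_i$ and it has the binomial distribution $b(n, p)$. Now, we write

$$\begin{aligned}
\mathrm{cov}(\bar{X}, S) &= \mathrm{cov}\left(\frac{1}{n}T_n,\ \frac{1}{\sqrt{n-1}}\sqrt{T_n - \frac{1}{n}T_n^2}\right) \\
&= \frac{1}{n\sqrt{n-1}}\left[E\left(T_n\sqrt{T_n - \frac{1}{n}T_n^2}\right) - E(T_n)\,E\left(\sqrt{T_n - \frac{1}{n}T_n^2}\right)\right] \\
&= \frac{1}{n\sqrt{n-1}}\left[\sum_{j=1}^{n} j\sqrt{j - \frac{1}{n}j^2}\;p_j - np\sum_{j=1}^{n}\sqrt{j - \frac{1}{n}j^2}\;p_j\right],
\end{aligned}$$

where the binomial probability mass

$$p_j = \binom{n}{j} p^j (1-p)^{n-j}, \quad \text{for } j = 1, 2, \ldots, n.$$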

For given values of $n$ and $p$, we can compute both the exact and the asymptotic covariance; we present the results in Table 1. We see from Table 1 that the results given by Theorem 1 are accurate even for a sample size $n$ as small as 5.

Table 1: For various values of n and p, we obtain exact (indicated by “Exact”) and asymptotic (indicated by “Asy”) covariances.

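Remark 2: For $p = 0.5$, regardless of the value of $n$, our numerical results suggest that the covariance is equal to 0. This agrees with Theorem 1, since $\mu_3 = 0$ when $p = 0.5$, and with a symmetry argument: replacing each $X_i$ by $1 - X_i$ leaves $S$ unchanged and maps $\bar{X}$ to $1 - \bar{X}$ without changing the joint distribution, so $\mathrm{cov}(\bar{X}, S) = -\mathrm{cov}(\bar{X}, S)$.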
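The exact and asymptotic covariances for the Bernoulli example can be computed directly from the binomial distribution of the number of successes. The sketch below is one possible implementation, with hypothetical function names; it evaluates the covariance of the sample mean (the success count divided by n) and the sample standard deviation by summing over all possible success counts, which is equivalent to the finite-sum expression for the exact covariance. The values of n and p are arbitrary illustrations and are not taken from Table 1.

```python
import math


def exact_cov_bernoulli(n, p):
    """Exact cov(sample mean, sample SD) for n i.i.d. Bernoulli(p) variables."""
    e_mean = e_sd = e_prod = 0.0
    for j in range(n + 1):
        pj = math.comb(n, j) * p ** j * (1 - p) ** (n - j)  # P(T_n = j)
        mean_j = j / n
        # S^2 = (j - j^2/n) / (n - 1); max() guards against tiny negative rounding.
        sd_j = math.sqrt(max(j - j * j / n, 0.0) / (n - 1))
        e_mean += mean_j * pj
        e_sd += sd_j * pj
        e_prod += mean_j * sd_j * pj
    return e_prod - e_mean * e_sd


def asymptotic_cov_bernoulli(n, p):
    """Theorem 1 for Bernoulli(p): sqrt(p(1-p)) * (1 - 2p) / (2n)."""
    return math.sqrt(p * (1 - p)) * (1 - 2 * p) / (2 * n)


for n in (5, 20, 200):
    for p in (0.2, 0.5, 0.8):
        exact = exact_cov_bernoulli(n, p)
        asy = asymptotic_cov_bernoulli(n, p)
        print(f"n={n:3d}  p={p:.1f}  Exact={exact: .6f}  Asy={asy: .6f}")
```

The terms for j = 0 and j = n contribute nothing to the sums involving the standard deviation, because the sample standard deviation is zero when all observations are equal.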

References

Miller, R. G. (1997). Beyond ANOVA. Chapman and Hall.

Sen, A. (2012). On the Interrelation Between the Sample Mean and the Sample Variance. The American Statistician, 66, 112-117.

Zhang, L. (2007). Sample Mean and Sample Variance: Their Covariance and Their (In)Dependence. The American Statistician, 61, 159-160.

Lingyun Zhang (张凌云)
Design Analyst

I have research interests in Statistics, applied probability and computation.