A Note on the Spread of a Multivariate Distribution

For a univariate random variable $X$ with mean $\mu$, if we want to describe the spread of its distribution, we can choose AD (Average Deviation from the mean)

$$ E|X-\mu| $$

or variance

$$ E(X-\mu)^2. $$
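As a quick numerical illustration, the following R sketch estimates both quantities from a simulated sample. The standard normal distribution, the seed, and the sample size are assumptions chosen only for illustration.

```r
set.seed(1)                     # arbitrary seed, for reproducibility only
x  <- rnorm(1e5)                # sample from N(0, 1), so mu = 0
mu <- 0

ad_hat  <- mean(abs(x - mu))    # estimates E|X - mu|
var_hat <- mean((x - mu)^2)     # estimates E(X - mu)^2

c(AD = ad_hat, variance = var_hat)
```
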

Of course, when we talk about AD and variance we assume that they exist. I asked myself this question: For a bivariate (or multivariate) distribution, what are the equivalents of AD and variance?

It is common sense that for a multivariate distribution we are concerned with its covariance matrix

$$ E({\boldsymbol X}-{\boldsymbol \mu})({\boldsymbol X}-{\boldsymbol \mu})^T. $$

However, a covariance matrix is not a number! If we want a single number that describes the spread of a multivariate distribution, then it seems to me that we can use

$$ E({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu}). $$

Note that

$$ E({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu}) = \hbox{trace}(E({\boldsymbol X}-{\boldsymbol \mu})({\boldsymbol X}-{\boldsymbol \mu})^T). $$
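The identity holds because a scalar equals its own trace, the trace is unchanged when the two factors are swapped, and trace and expectation can be interchanged. Writing $p$ for the dimension of ${\boldsymbol X}$ and $X_i$ for its components (notation assumed here), the steps are

$$ \begin{aligned} E({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu}) & = E\,\hbox{trace}\left(({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu})\right)\\ & = E\,\hbox{trace}\left(({\boldsymbol X}-{\boldsymbol \mu})({\boldsymbol X}-{\boldsymbol \mu})^T\right)\\ & = \hbox{trace}\left(E({\boldsymbol X}-{\boldsymbol \mu})({\boldsymbol X}-{\boldsymbol \mu})^T\right) = \sum_{i=1}^{p} \hbox{Var}(X_i). \end{aligned} $$

So this number is the sum of the component variances (sometimes called the total variance), and it does not depend on the covariances between components.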

If we want to extend $E|X-\mu|$ to a multivariate distribution, then we can define

$$ AD_m = E\left(\sqrt{({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu})}\right), $$

where the subscript "$m$" indicates that it is for a multivariate distribution.
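A sample version of $AD_m$ can be sketched in R as below. The function name `ad_m` and its arguments are hypothetical, introduced only for illustration; by default the unknown mean vector is replaced by the column means of the data.

```r
# Hypothetical helper: sample AD_m for an n x p data matrix x.
# mu defaults to the column means when the true mean vector is unknown.
ad_m <- function(x, mu = colMeans(x)) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, mu)        # subtract mu[j] from column j
  mean(sqrt(rowSums(centered^2)))    # average Euclidean distance to mu
}

# Usage: three points at distances 5, 5 and 0 from the origin
pts <- rbind(c(3, 4), c(-3, -4), c(0, 0))
ad_m(pts, mu = c(0, 0))
## [1] 3.333333
```

For a single column ($p = 1$) the Euclidean distance is just the absolute value, so the helper reduces to the sample average deviation from the mean.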

NB: I checked my Chinese textbook on multivariate analysis. There, the measure of multivariate spread is given by

$$ E({\boldsymbol X}-{\boldsymbol \mu})^T{\boldsymbol \Sigma}^{-1}({\boldsymbol X}-{\boldsymbol \mu}), $$

where $ {\boldsymbol \Sigma} $ is the distribution’s covariance matrix. This definition uses the (squared) Mahalanobis distance instead of the Euclidean distance.
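Applying the trace identity noted earlier to this expression gives a closed form. Assuming ${\boldsymbol \Sigma}$ is invertible and writing $p$ for the dimension of ${\boldsymbol X}$ (a symbol not used above),

$$ E({\boldsymbol X}-{\boldsymbol \mu})^T{\boldsymbol \Sigma}^{-1}({\boldsymbol X}-{\boldsymbol \mu}) = \hbox{trace}\left({\boldsymbol \Sigma}^{-1}E({\boldsymbol X}-{\boldsymbol \mu})({\boldsymbol X}-{\boldsymbol \mu})^T\right) = \hbox{trace}\left({\boldsymbol \Sigma}^{-1}{\boldsymbol \Sigma}\right) = p, $$

so, taken exactly as written here, the value of this expectation is determined by the dimension alone.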

Example 1: $(X, Y)^T$ has the standard bivariate normal distribution. Find the distribution’s $AD_m$.

Solution:

\begin{align}
AD_m & = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \sqrt{x^2+y^2}\exp{\{-(x^2+y^2)/2\}}\,dx\,dy\\
& = \frac{1}{2\pi} \int_{0}^{+\infty}\int_{0}^{2\pi} r e^{-r^2/2}\, r \,d\theta\, dr \qquad (x = r\cos\theta,\ y = r\sin\theta)\\
& = \int_{0}^{+\infty} r^2 e^{-r^2/2}\, dr\\
& = \int_{0}^{+\infty} 2u e^{-u}\frac{\sqrt{2}}{2} u^{-1/2}\, du \qquad (u = r^2/2)\\
& = \sqrt{2} \int_{0}^{+\infty} u^{1/2} e^{-u}\, du\\
& = \sqrt{2}\, \Gamma(3/2)\\
& = \frac{\sqrt{2}}{2}\, \Gamma(1/2)\\
& = \frac{\sqrt{2\pi}}{2}.
\end{align}
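The closed form can be checked numerically. The sketch below uses base R's `integrate()` on the one-dimensional integral $\int_0^{+\infty} r^2 e^{-r^2/2}\,dr$ from the derivation; the variable names are chosen here for illustration.

```r
# Integrand left after integrating out theta in polar coordinates
integrand <- function(r) r^2 * exp(-r^2 / 2)

numeric_value <- integrate(integrand, lower = 0, upper = Inf)$value
closed_form   <- sqrt(2 * pi) / 2

# The two entries should agree to several decimal places (about 1.2533)
c(numeric = numeric_value, closed_form = closed_form)
```
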

NB: If $X \sim N(0, 1)$, then

$$ E|X| = \frac{2}{\sqrt{2\pi}}. $$
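For completeness, this univariate value follows from the symmetry of the standard normal density:

$$ E|X| = 2\int_{0}^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \frac{2}{\sqrt{2\pi}}\left[-e^{-x^2/2}\right]_{0}^{+\infty} = \frac{2}{\sqrt{2\pi}} \approx 0.798, $$

which can be compared with $\sqrt{2\pi}/2 \approx 1.253$ for the standard bivariate normal distribution in Example 1.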

Example 2: $ (X, Y)^T $ has a bivariate normal distribution with mean vector ${\boldsymbol \mu}=(0, 0)^T$ and covariance matrix

$$ {\boldsymbol \Sigma} = \left[ \begin{array}{cc} 1 & 0.5\\ 0.5 & 1 \end{array} \right] $$

Use simulation to find its $AD_m$.

R code:

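library(MASS)  # provides mvrnorm()

mu <- c(0, 0)
# Uncomment the next line (and comment out the one after it) for the standard case of Example 1
# Sigma <- matrix(c(1, 0, 0, 1), nrow = 2, ncol = 2)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2)

set.seed(123456)  # for reproducibility

n_samples <- 1e4
mvn_data <- mvrnorm(n = n_samples, mu = mu, Sigma = Sigma)

x <- mvn_data[, 1]
y <- mvn_data[, 2]

plot(x, y, pch = '.')  # scatter plot of the simulated points
# Monte Carlo estimate of AD_m: average Euclidean distance from mu = (0, 0)
(AD_m <- mean(sqrt(x^2 + y^2)))
## [1] 1.234322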
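As a cross-check on the simulation (a different technique, not the approach used above), the same quantity can be reduced to a one-dimensional integral. The sketch assumes the representation ${\boldsymbol X} = {\boldsymbol Q}{\boldsymbol \Lambda}^{1/2}{\boldsymbol Z}$, where ${\boldsymbol \Sigma} = {\boldsymbol Q}{\boldsymbol \Lambda}{\boldsymbol Q}^T$ is the eigendecomposition with eigenvalues $\lambda_1, \lambda_2$ and ${\boldsymbol Z}$ is standard bivariate normal. Writing ${\boldsymbol Z}$ in polar coordinates, the radius is independent of the angle and has mean $\sqrt{2\pi}/2$ (Example 1), so $AD_m$ is $\sqrt{2\pi}/2$ times the average of $\sqrt{\lambda_1\cos^2\theta + \lambda_2\sin^2\theta}$ over $\theta \in [0, 2\pi]$. The variable names below are chosen for illustration.

```r
Sigma  <- matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2)
lambda <- eigen(Sigma, symmetric = TRUE)$values   # eigenvalues of Sigma

# Length scaling in direction theta after rotating to the principal axes
scale_fun <- function(theta) {
  sqrt(lambda[1] * cos(theta)^2 + lambda[2] * sin(theta)^2)
}

# Average of the scaling over a uniformly distributed angle
angle_mean <- integrate(scale_fun, lower = 0, upper = 2 * pi)$value / (2 * pi)

# Mean radius of the standard bivariate normal times the angular average
(ad_check <- sqrt(2 * pi) / 2 * angle_mean)
```

The result should land close to the simulated value of about 1.234, with the remaining gap attributable to Monte Carlo error.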
Lingyun Zhang (张凌云)
Design Analyst

I have research interests in Statistics, applied probability and computation.