A Note on the Spread of a Multivariate Distribution
For a univariate random variable $X$ with mean $\mu$, if we want to describe its distribution’s spread, we can use the AD (Average Deviation from the mean)
$$ E|X-\mu| $$
or variance
$$ E(X-\mu)^2. $$
Of course, when we talk about the AD and the variance we assume that they exist. I asked myself this question: for a bivariate (or multivariate) distribution, what are the equivalents of the AD and the variance?
It is common sense that for a multivariate distribution we are concerned with its covariance matrix
$$ E({\boldsymbol X}-{\boldsymbol \mu})({\boldsymbol X}-{\boldsymbol \mu})^T. $$
However, a covariance matrix is not a number! If we want a single number that describes the spread of a multivariate distribution, then it seems to me that we can use
$$ E({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu}). $$
Note that
$$ E({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu}) = \hbox{trace}(E({\boldsymbol X}-{\boldsymbol \mu})({\boldsymbol X}-{\boldsymbol \mu})^T). $$
If we want to extend $E|X-\mu|$ to a multivariate distribution, then we define
$$ AD_m = E\left(\sqrt{({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu})}\right), $$
where the subscript "$m$" indicates that it is for a multivariate distribution.
NB: I checked my Chinese textbook on multivariate analysis; there, the measure of multivariate spread is given by
$$ E({\boldsymbol X}-{\boldsymbol \mu})^T{\boldsymbol \Sigma}^{-1}({\boldsymbol X}-{\boldsymbol \mu}), $$
where $ {\boldsymbol \Sigma} $ is the distribution’s covariance matrix. This definition uses the squared Mahalanobis distance in place of the squared Euclidean distance.
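A short side calculation may help to compare the two scalar measures. It is not taken from the textbook, and it assumes that ${\boldsymbol \Sigma}$ is invertible; $p$ denotes the dimension of ${\boldsymbol X}$, and $X_i$, $\mu_i$ denote the components of ${\boldsymbol X}$ and ${\boldsymbol \mu}$ (symbols introduced here only for illustration). Expanding the inner product componentwise gives
$$ E({\boldsymbol X}-{\boldsymbol \mu})^T({\boldsymbol X}-{\boldsymbol \mu}) = \sum_{i=1}^{p} E(X_i-\mu_i)^2 = \sum_{i=1}^{p} \hbox{Var}(X_i) = \hbox{trace}({\boldsymbol \Sigma}), $$
so the Euclidean version is the sum of the component variances. Applying the same trace argument to the Mahalanobis version gives
$$ E({\boldsymbol X}-{\boldsymbol \mu})^T{\boldsymbol \Sigma}^{-1}({\boldsymbol X}-{\boldsymbol \mu}) = \hbox{trace}({\boldsymbol \Sigma}^{-1} E({\boldsymbol X}-{\boldsymbol \mu})({\boldsymbol X}-{\boldsymbol \mu})^T) = \hbox{trace}({\boldsymbol \Sigma}^{-1}{\boldsymbol \Sigma}) = p. $$
Read literally as an expectation, the Mahalanobis-based quantity therefore equals the dimension $p$ whatever the covariance matrix is, whereas the Euclidean version changes with ${\boldsymbol \Sigma}$.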
Example 1: $(X, Y)^T$ has a standard bivariate normal distribution (zero means, unit variances, zero correlation). Find the distribution’s $AD_m$.
Solution:
\begin{align}
AD_m & = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \sqrt{x^2+y^2}\,\exp{\{-(x^2+y^2)/2\}}\,dx\,dy\\
& = \frac{1}{2\pi} \int_{0}^{+\infty}\int_{0}^{2\pi} r e^{-r^2/2}\, r\, d\theta\, dr && \hbox{(polar coordinates)}\\
& = \int_{0}^{+\infty} r^2 e^{-r^2/2}\, dr\\
& = \int_{0}^{+\infty} 2u\, e^{-u}\,\frac{\sqrt{2}}{2} u^{-1/2}\, du && \hbox{(substituting } u = r^2/2 \hbox{)}\\
& = \sqrt{2} \int_{0}^{+\infty} u^{1/2} e^{-u}\, du\\
& = \sqrt{2}\, \Gamma(3/2)\\
& = \frac{\sqrt{2}}{2}\, \Gamma(1/2)\\
& = \frac{\sqrt{2\pi}}{2}.
\end{align}
NB: If $X \sim N(0, 1)$, then
$$ E|X| = \frac{2}{\sqrt{2\pi}}. $$
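The two closed forms above can be checked numerically. The following is a minimal sketch using base R's `integrate()`; the variable names are illustrative only and do not appear elsewhere in this note.

```r
# Radial integral from Example 1: integral of r^2 * exp(-r^2 / 2) over (0, Inf)
ad_m_numeric <- integrate(function(r) r^2 * exp(-r^2 / 2),
                          lower = 0, upper = Inf)$value
ad_m_exact <- sqrt(2 * pi) / 2

# E|X| for X ~ N(0, 1): by symmetry, twice the integral of x * dnorm(x) over (0, Inf)
e_abs_numeric <- 2 * integrate(function(x) x * dnorm(x),
                               lower = 0, upper = Inf)$value
e_abs_exact <- 2 / sqrt(2 * pi)

# Both differences should be close to zero, up to quadrature error
print(c(ad_m_numeric - ad_m_exact, e_abs_numeric - e_abs_exact))
```

As a cross-check on Example 1, the distance $\sqrt{X^2+Y^2}$ of a standard bivariate normal vector follows a Rayleigh distribution with unit scale, whose mean is $\sqrt{\pi/2} = \sqrt{2\pi}/2$.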
Example 2: $ (X, Y)^T $ has a bivariate normal distribution with mean ${\boldsymbol \mu}=(0, 0)^T$ and covariance matrix
$$ {\boldsymbol \Sigma} = \left[ \begin{array}{cc} 1 & 0.5\\ 0.5 & 1 \end{array} \right] $$
Use simulation to find its $AD_m$.
R code:
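library(MASS)  # provides mvrnorm()
mu <- c(0, 0)
# Identity covariance (the standard case of Example 1), kept for comparison:
# Sigma <- matrix(c(1, 0, 0, 1), nrow = 2, ncol = 2)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2)
set.seed(123456)  # fix the seed so the result is reproducible
n_samples <- 1e4
mvn_data <- mvrnorm(n = n_samples, mu = mu, Sigma = Sigma)
x <- mvn_data[, 1]
y <- mvn_data[, 2]
plot(x, y, pch = '.')  # scatter plot of the simulated points
# Monte Carlo estimate of AD_m: mean Euclidean distance from the mean (0, 0)
(AD_m <- mean(sqrt(x^2 + y^2)))
## [1] 1.234322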
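A single simulated value says little about its own accuracy, so it may help to attach a Monte Carlo standard error and to estimate the other two scalar measures from the same draws. The sketch below does this under the same seed and settings as the code above; the names `dev`, `d`, `ad_m_hat`, `ad_m_se`, `total_var_hat` and `maha_hat` are illustrative and not part of the original code.

```r
library(MASS)  # provides mvrnorm()

mu <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2)

set.seed(123456)
n_samples <- 1e4
mvn_data <- mvrnorm(n = n_samples, mu = mu, Sigma = Sigma)

# Deviations from the known mean, one row per draw
dev <- sweep(mvn_data, 2, mu)

# Euclidean distance of each draw from the mean
d <- sqrt(rowSums(dev^2))

ad_m_hat <- mean(d)                 # Monte Carlo estimate of AD_m
ad_m_se <- sd(d) / sqrt(n_samples)  # estimated standard error of that estimate

# Estimate of E (X - mu)^T (X - mu); should be near trace(Sigma) = 2
total_var_hat <- mean(d^2)

# Estimate of E (X - mu)^T Sigma^{-1} (X - mu);
# mahalanobis() returns squared Mahalanobis distances
maha_hat <- mean(mahalanobis(mvn_data, center = mu, cov = Sigma))

print(c(AD_m = ad_m_hat, SE = ad_m_se,
        total_var = total_var_hat, mahalanobis = maha_hat))
```

The estimate `ad_m_hat` plus or minus about two standard errors gives a rough range for the true $AD_m$. The last two numbers can be compared with $\hbox{trace}({\boldsymbol \Sigma}) = 2$ and with the dimension 2, respectively.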