A note on xicor

Introduction

From [1], I learned that Professor Sourav Chatterjee has published a paper on a new correlation coefficient called $xicor$, so I went on to read the paper itself [2].

Unlike Pearson’s correlation $r$, which measures the strength of the linear relationship between random variables $X$ and $Y$, $xicor(X, Y)$ measures how strongly $Y$ is a function of $X$. Note that $xicor$ is not symmetric: in general, $xicor(X, Y)$ is not the same as $xicor(Y, X)$.
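For reference, here is my reading of the definition in [2] for the case with no ties. Sort the pairs $(X_i, Y_i)$ so that $X_{(1)} < \cdots < X_{(n)}$, and let $r_i$ be the rank of $Y_{(i)}$, that is, the number of $j$ with $Y_{(j)} \le Y_{(i)}$. Then

$$
\xi_n(X, Y) = 1 - \frac{3 \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1}.
$$

The pairs are ordered by $X$ only, which is why swapping the roles of $X$ and $Y$ generally changes the value.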

After looking through [2], my first thought was to try $xicor$ on a famous data set, Anscombe’s quartet (see [3]).

R code and results

Firstly, let’s have a look at the data set:

(datasets::anscombe)
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89

Next, we plot the data:

library(dplyr)
library(ggplot2)
library(patchwork)

# a helper function: scatter plot of one (x, y) pair with a least-squares line
my_plot <- function(vec1, vec2, i = 1, x_L, x_U, y_L, y_U) {
  df <- data.frame(x = vec1, y = vec2)

  df %>%
    ggplot(aes(x, y)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    labs(x = paste0("x", i), y = paste0("y", i)) +
    # common axis limits so the four panels are directly comparable
    scale_x_continuous(limits = c(x_L, x_U)) +
    scale_y_continuous(limits = c(y_L, y_U))
}

# x limits
min_x <- min(anscombe$x1, anscombe$x2, anscombe$x3, anscombe$x4)
max_x <- max(anscombe$x1, anscombe$x2, anscombe$x3, anscombe$x4)

# y limits
min_y <- min(anscombe$y1, anscombe$y2, anscombe$y3, anscombe$y4)
max_y <- max(anscombe$y1, anscombe$y2, anscombe$y3, anscombe$y4)

# four pictures
p1 <- my_plot(anscombe$x1, anscombe$y1, 1, 
              x_L = min_x, x_U = max_x, y_L = min_y, y_U = max_y)
p2 <- my_plot(anscombe$x2, anscombe$y2, 2, 
              x_L = min_x, x_U = max_x, y_L = min_y, y_U = max_y)
p3 <- my_plot(anscombe$x3, anscombe$y3, 3, 
              x_L = min_x, x_U = max_x, y_L = min_y, y_U = max_y)
p4 <- my_plot(anscombe$x4, anscombe$y4, 4, 
              x_L = min_x, x_U = max_x, y_L = min_y, y_U = max_y)

(the_plot <- (p1 | p2) / (p3 | p4))

Thirdly, we compute Pearson’s correlation $r$ for each pair of x and y in the data set.

library(purrr)
df1 <- anscombe[, 1:4]
df2 <- anscombe[, 5:8]
r <- map2_dbl(df1, df2, cor)
names(r) <- c("r1", "r2", "r3", "r4")
r
##        r1        r2        r3        r4 
## 0.8164205 0.8162365 0.8162867 0.8165214

If we keep two decimal places, then the $r$’s are all equal to $0.82$.
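The rounding can be checked directly. The snippet below is a small base R sketch that recomputes the four correlations with `mapply` (used here only to keep the snippet self-contained) and rounds them to two decimal places.

```r
# Recompute Pearson's r for the four (x_i, y_i) pairs and round to 2 decimals
a <- datasets::anscombe
r_check <- mapply(cor, a[, 1:4], a[, 5:8])  # columns are paired x1-y1, ..., x4-y4
r_rounded <- round(unname(r_check), 2)
r_rounded
```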

Finally, we calculate $xicor$ in both directions, $xicor(x, y)$ and $xicor(y, x)$, for each pair of x and y in the data set.

library(XICOR)

xicor_xy <- rep(0, 4)
xicor_yx <- rep(0, 4)

for (i in 1:4) {
  x <- anscombe[, i]
  y <- anscombe[, i + 4]
  xicor_xy[i] <- calculateXI(x, y)  # xicor(x_i, y_i)
  xicor_yx[i] <- calculateXI(y, x)  # xicor(y_i, x_i)
}

(list(xicor_xy, xicor_yx))
## [[1]]
## [1]  0.275  0.600  0.725 -0.075
## 
## [[2]]
## [1] 0.250 0.225 0.725 0.450
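As a cross-check, the coefficient can also be computed by hand in base R. The sketch below follows my reading of the general formula in [2]: order the pairs by increasing first argument, let $r_i$ be the number of second-argument values less than or equal to the $i$-th one and $l_i$ the number greater than or equal to it, and take $1 - n \sum |r_{i+1} - r_i| / (2 \sum l_i (n - l_i))$. The name `xi_sketch` is a hypothetical helper of mine, not a function from the XICOR package, and it deliberately refuses ties in its first argument. According to [2], such ties are broken at random, so the value for $xicor(x_4, y_4)$ (where `x4` has ten equal values) depends on the tie-breaking and is skipped here.

```r
# Hypothetical helper (not part of XICOR): xi for a first argument without ties
xi_sketch <- function(x, y) {
  n <- length(x)
  stopifnot(length(y) == n, !anyDuplicated(x))      # ties in x are not handled
  y_sorted <- y[order(x)]                           # arrange y by increasing x
  r <- rank(y_sorted, ties.method = "max")          # r_i = #{j : y_j <= y_i}
  l <- n + 1 - rank(y_sorted, ties.method = "min")  # l_i = #{j : y_j >= y_i}
  1 - n * sum(abs(diff(r))) / (2 * sum(l * (n - l)))
}

a <- datasets::anscombe
# x1, x2, x3 have no ties; x4 does, so only the first three pairs are used here
xi_xy <- sapply(1:3, function(i) xi_sketch(a[[i]], a[[i + 4]]))
# y1, ..., y4 have no ties, so all four reversed pairs can be computed
xi_yx <- sapply(1:4, function(i) xi_sketch(a[[i + 4]], a[[i]]))
list(xi_xy, xi_yx)
```

For the pairs it covers, this should agree with the `calculateXI` results shown above.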

References

[1] URL: https://www.r-bloggers.com/2021/12/how-to-read-sourav-chatterjees-basic-xicor-definition/

[2] Chatterjee, S. A New Coefficient of Correlation. URL: https://arxiv.org/pdf/1909.10140.pdf

[3] URL: https://en.wikipedia.org/wiki/Anscombe%27s_quartet

Lingyun Zhang (张凌云)
Design Analyst

I have research interests in Statistics, applied probability and computation.