Analog of Pearson correlation for 3 variables


17

I am interested in whether a "correlation" of three variables is a thing, and if so, what it would be.

The Pearson product-moment correlation coefficient is

$$\frac{E\{(X-\mu_X)(Y-\mu_Y)\}}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$$

Now the question for 3 variables: is

$$\frac{E\{(X-\mu_X)(Y-\mu_Y)(Z-\mu_Z)\}}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)\operatorname{Var}(Z)}}$$

anything?

In R it seems like something interpretable:

> a <- rnorm(100); b <- rnorm(100); c <- rnorm(100)
> mean((a-mean(a)) * (b-mean(b)) * (c-mean(c))) / (sd(a) * sd(b) * sd(c))
[1] -0.3476942

Normally we look at the correlation between 2 variables given a fixed value of a third variable. Could someone clarify?


2
1) In your bivariate Pearson formula, if "E" (mean in your code) implies division by n, then the st. deviations must also be based on n (not n-1). 2) Let all three variables be the same variable. In that case, we would expect the correlation to be 1 (as in the bivariate case), but alas ...
ttnphns

For a trivariate normal distribution it is zero, regardless of what the correlations are.
Ray Koopman

1
I really think this would benefit from the title being changed to "Analog of Pearson correlation for 3 variables" or similar - it would make the links here more informative
Silverfish

1
@Silverfish I agree! I've updated the title, thanks.
PascalVKooten

Answers:


12

It is indeed something. To find out, we need to examine what we know about correlation itself.

  1. The correlation matrix of a vector-valued random variable $X = (X_1, X_2, \ldots, X_p)$ is the variance-covariance matrix, or simply "variance," of the standardized version of $X$. That is, each $X_i$ is replaced by its recentered, rescaled version.

  2. The covariance of $X_i$ and $X_j$ is the expectation of the product of their centered versions. That is, writing $X_i' = X_i - E[X_i]$ and $X_j' = X_j - E[X_j]$,

     $$\operatorname{Cov}(X_i, X_j) = E[X_i' X_j'].$$
  3. The variance of $X$, which I will write $\operatorname{Var}(X)$, is not a single number. It is the array of values

     $$\operatorname{Var}(X)_{ij} = \operatorname{Cov}(X_i, X_j).$$
  4. The way to think of the covariance for the intended generalization is as a tensor. That means it is an entire collection of quantities $v_{ij}$, indexed by $i$ and $j$ ranging from $1$ through $p$, whose values change in a particularly simple, predictable way when $X$ undergoes a linear transformation. Specifically, let $Y = (Y_1, Y_2, \ldots, Y_q)$ be another vector-valued random variable defined by

     $$Y_i = \sum_{j=1}^p a_i^j X_j.$$

     The constants $a_i^j$ ($i$ and $j$ are indexes--$j$ is not a power) form a $q \times p$ array $A = (a_i^j)$, $j = 1, \ldots, p$ and $i = 1, \ldots, q$. The linearity of expectation implies

     $$\operatorname{Var}(Y)_{ij} = \sum_{k,l} a_i^k a_j^l \operatorname{Var}(X)_{kl}.$$

     In matrix notation,

     $$\operatorname{Var}(Y) = A\operatorname{Var}(X)A'.$$

     (A numeric sketch of this transformation law follows this list.)
  5. All the components of $\operatorname{Var}(X)$ actually are univariate variances, due to the Polarization Identity

     $$4\operatorname{Cov}(X_i, X_j) = \operatorname{Var}(X_i + X_j) - \operatorname{Var}(X_i - X_j).$$

     This tells us that if you understand variances of univariate random variables, you already understand covariances of bivariate variables: they are "just" linear combinations of variances.
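To make points 4 and 5 concrete, here is a minimal numeric sketch in R (simulated data and an arbitrary matrix `A` of my own choosing, purely for illustration):

set.seed(42)
X <- matrix(rnorm(300), ncol = 3)               # 100 draws of a 3-vector
A <- matrix(c(1, 2, 0, -1, 0.5, 3), nrow = 2)   # an arbitrary 2x3 linear map
Y <- X %*% t(A)                                 # Y = AX, applied row by row
all.equal(cov(Y), A %*% cov(X) %*% t(A))        # tensor law: Var(Y) = A Var(X) A'
# Polarization Identity: 4 Cov(X1, X2) = Var(X1 + X2) - Var(X1 - X2)
all.equal(4 * cov(X[, 1], X[, 2]),
          var(X[, 1] + X[, 2]) - var(X[, 1] - X[, 2]))
# Both comparisons return TRUE (exactly, up to floating point): the
# identities are algebraic, so they hold for any data, not just normals.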


The expression in the question is perfectly analogous: the variables $X_i$ have been standardized as in $(1)$. We can understand what it represents by considering what it means for any variable, standardized or not. We would replace each $X_i$ by its centered version, as in $(2)$, and form quantities having three indexes,

$$\mu_3(X)_{ijk} = E[X_i' X_j' X_k'].$$
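For data, an empirical version of these quantities is easy to compute. A minimal R sketch (the helper name `mu3` is mine, purely illustrative):

# Empirical central third-moment tensor of a data matrix (illustrative sketch)
mu3 <- function(X) {
  Xc <- scale(X, center = TRUE, scale = FALSE)   # center each column
  p <- ncol(Xc)
  out <- array(NA_real_, dim = c(p, p, p))
  for (i in 1:p) for (j in 1:p) for (k in 1:p)
    out[i, j, k] <- mean(Xc[, i] * Xc[, j] * Xc[, k])
  out
}
X <- matrix(rnorm(300), ncol = 3)
mu3(X)[1, 2, 3]   # empirical analogue of E[X1' X2' X3']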

These are the central (multivariate) moments of degree $3$. As in $(4)$, they form a tensor: when $Y = AX$, then

$$\mu_3(Y)_{ijk} = \sum_{l,m,n} a_i^l a_j^m a_k^n \,\mu_3(X)_{lmn}.$$

The indexes in this triple sum range over all combinations of integers from $1$ through $p$.

The analog of the Polarization Identity is

$$24\,\mu_3(X)_{ijk} = \mu_3(X_i+X_j+X_k) - \mu_3(X_i-X_j+X_k) - \mu_3(X_i+X_j-X_k) + \mu_3(X_i-X_j-X_k).$$
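As a quick sanity check (my own sketch, using simulated normals purely for illustration), this identity holds exactly for empirical moments of centered data, since it is an algebraic identity applied pointwise:

# Numeric check of the third-order polarization identity
m3 <- function(u) mean((u - mean(u))^3)   # univariate central third moment
x <- rnorm(1e4); y <- rnorm(1e4); z <- rnorm(1e4)
x <- x - mean(x); y <- y - mean(y); z <- z - mean(z)  # center once
lhs <- 24 * mean(x * y * z)
rhs <- m3(x + y + z) - m3(x - y + z) - m3(x + y - z) + m3(x - y - z)
all.equal(lhs, rhs)   # TRUE up to floating point, for any data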

On the right hand side, $\mu_3$ refers to the (univariate) central third moment: the expected value of the cube of the centered variable. When the variables are standardized, this moment is usually called the skewness. Accordingly, we may think of $\mu_3(X)$ as being the multivariate skewness of $X$. It is a tensor of rank three (that is, with three indices) whose values are linear combinations of the skewnesses of various sums and differences of the $X_i$. If we were to seek interpretations, then, we would think of these components as measuring in $p$ dimensions whatever the skewness is measuring in one dimension. In many cases,

  • The first moments measure the location of a distribution;

  • The second moments (the variance-covariance matrix) measure its spread;

  • The standardized second moments (the correlations) indicate how the spread varies in $p$-dimensional space; and

  • The standardized third and fourth moments are taken to measure the shape of a distribution relative to its spread.

To elaborate on what a multidimensional "shape" might mean, observe that we can understand PCA as a mechanism to reduce any multivariate distribution to a standard version located at the origin and having equal spreads in all directions. After PCA is performed, then, $\mu_3$ would provide the simplest indicators of the multidimensional shape of the distribution. These ideas apply equally well to data as to random variables, because data can always be analyzed in terms of their empirical distribution.
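A sketch of that idea in R, reusing the illustrative `mu3()` helper defined above (the deliberately skewed data are my own choice, not from the original answer):

# Standardize via PCA, then inspect mu3 as an indicator of multivariate shape
X <- matrix(rnorm(300)^3, ncol = 3)   # deliberately skewed data
Z <- scale(prcomp(X)$x)               # PCA scores, rescaled to unit variance
round(mu3(Z), 2)                      # nonzero components flag asymmetry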


Reference

Alan Stuart & J. Keith Ord, Kendall's Advanced Theory of Statistics, Fifth Edition, Volume 1: Distribution Theory; Chapter 3, Moments and Cumulants. Oxford University Press (1987).


Appendix: Proof of the Polarization Identity

Let $x_1, \ldots, x_n$ be algebraic variables. There are $2^n$ ways to add and subtract all $n$ of them. When we raise each of these sums-and-differences to the $n$th power, pick a suitable sign for each of those results, and add them up, we will get a multiple of $x_1 x_2 \cdots x_n$.

More formally, let $S = \{-1, 1\}^n$ be the set of all $n$-tuples of $\pm 1$, so that any element $s \in S$ is a vector $s = (s_1, s_2, \ldots, s_n)$ whose coefficients are all $\pm 1$. The claim is

$$2^n\, n!\, x_1 x_2 \cdots x_n = \sum_{s \in S} s_1 s_2 \cdots s_n (s_1 x_1 + s_2 x_2 + \cdots + s_n x_n)^n. \tag{1}$$

Indeed, the Multinomial Theorem states that the coefficient of the monomial $x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$ (where the $i_j$ are nonnegative integers summing to $n$) in the expansion of any term on the right hand side is

$$\binom{n}{i_1, i_2, \ldots, i_n} s_1^{i_1} s_2^{i_2} \cdots s_n^{i_n}.$$

In the sum $(1)$, the coefficients involving $x_1^{i_1}$ appear in pairs where one of each pair involves the case $s_1 = 1$, with coefficient proportional to $s_1$ times $s_1^{i_1}$, equal to $1$, and the other of each pair involves the case $s_1 = -1$, with coefficient proportional to $-1$ times $(-1)^{i_1}$, equal to $(-1)^{i_1+1}$. They cancel in the sum whenever $i_1 + 1$ is odd. The same argument applies to $i_2, \ldots, i_n$. Consequently, the only monomials that occur with nonzero coefficients must have odd powers of all the $x_i$. The only such monomial is $x_1 x_2 \cdots x_n$. It appears with coefficient $\binom{n}{1, 1, \ldots, 1} = n!$ in all $2^n$ terms of the sum. Consequently its coefficient is $2^n\, n!$, QED.

We need take only half of each pair associated with $x_1$: that is, we can restrict the right hand side of $(1)$ to the terms with $s_1 = 1$ and halve the coefficient on the left hand side to $2^{n-1}\, n!$. That gives precisely the two versions of the Polarization Identity quoted in this answer for the cases $n = 2$ and $n = 3$: $2^{2-1}\, 2! = 4$ and $2^{3-1}\, 3! = 24$.

Of course the Polarization Identity for algebraic variables immediately implies it for random variables: let each $x_i$ be a random variable $X_i$. Take expectations of both sides. The result follows by linearity of expectation.
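The claim $(1)$ can also be checked numerically; here is a small R verification for $n = 3$ with arbitrary numbers (my own sketch):

# Numeric check of identity (1) for n = 3
x <- c(1.3, -0.7, 2.2)                                   # x1, x2, x3
S <- as.matrix(expand.grid(s1 = c(-1, 1), s2 = c(-1, 1), s3 = c(-1, 1)))
rhs <- sum(apply(S, 1, function(s) prod(s) * sum(s * x)^3))
lhs <- 2^3 * factorial(3) * prod(x)                      # 2^n n! x1 x2 x3
all.equal(lhs, rhs)                                      # TRUE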


Well done on explaining so far! Multivariate skewness kind of makes sense. Could you perhaps add an example that would show the importance of this multivariate skewness? Either as an issue in statistical models or, perhaps more interestingly, what real-life case would be subject to multivariate skewness :)?
PascalVKooten

3

Hmmm. If we run...

a <- rnorm(100);
b <- rnorm(100);
c <- rnorm(100)
mean((a-mean(a))*(b-mean(b))*(c-mean(c)))/
  (sd(a) * sd(b) * sd(c))

it does seem to center on 0 (I haven't done a real simulation), but as @ttnphns alludes, running this (all variables the same)

a <- rnorm(100)
mean((a-mean(a))*(a-mean(a))*(a-mean(a)))/
  (sd(a) * sd(a) * sd(a))

also seems to center on 0, which certainly makes me wonder what use this could be.
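For what it's worth, a quick simulation along those lines (a sketch of my own, not a careful study) points the same way:

set.seed(123)
stat <- replicate(10000, {
  a <- rnorm(100); b <- rnorm(100); c <- rnorm(100)
  mean((a - mean(a)) * (b - mean(b)) * (c - mean(c))) / (sd(a) * sd(b) * sd(c))
})
mean(stat)   # should be close to 0 for independent normals
hist(stat)   # roughly symmetric about 0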


2
The nonsense apparently comes from the fact that sd or variance is a function of squaring, as is covariance. But with 3 variables, cubing occurs in the numerator while the denominator remains based on originally squared terms
ttnphns

2
Is that the root of it (pun intended)? Numerator and denominator have the same dimensions and units, which cancel, so that alone doesn't make the measure poorly formed.
Nick Cox

3
@Nick That's right. This is simply one of the multivariate central third moments. It is one component of a rank-three tensor giving the full set of third moments (which is closely related to the order-3 component of the multivariate cumulant generating function). In conjunction with the other components it could be of some use in describing asymmetries (higher-dimensional "skewness") in the distribution. It's not what anyone would call a "correlation," though: almost by definition, a correlation is a second-order property of the standardized variable.
whuber

1

If you need to calculate a "correlation" between three or more variables, you cannot use Pearson, as in this case it will be different for different orderings of the variables; have a look here. If you are interested in linear dependency, or in how well the variables are fitted by a 3D line, you may use PCA: obtain the explained variance for the first PC, permute your data, and find the probability that this value may be due to random reasons. I've discussed something similar here (see Technical details below).

Matlab code

% Simulate our experimental data
x=normrnd(0,1,100,1);
y=2*x.*normrnd(1,0.1,100,1);
z=(-3*x+1.5*y).*normrnd(1,2,100,1);
% perform pca
[loadings, scores,variance]=pca([x,y,z]);
% Observed Explained Variance for first principal component
OEV1=variance(1)/sum(variance)
% perform permutations
permOEV1=[];
for iPermutation=1:1000
    permX=datasample(x,numel(x),'replace',false);
    permY=datasample(y,numel(y),'replace',false);
    permZ=datasample(z,numel(z),'replace',false);
    [loadings, scores,variance]=pca([permX,permY,permZ]);
    permOEV1(end+1)=variance(1)/sum(variance);
end

% Calculate p-value
p_value=sum(permOEV1>=OEV1)/(numel(permOEV1)+1)
Licensed under cc by-sa 3.0 with attribution required.