To try to provide some intuition, let's consider the simplest case. Let $X_1, X_2, \ldots, X_n$ be an iid sample from a discrete distribution with $k$ outcomes. Let $\pi_1, \ldots, \pi_k$ be the probabilities of each outcome. We are interested in the (asymptotic) distribution of the chi-squared statistic
$$X^2 = \sum_{i=1}^k \frac{(S_i - n\pi_i)^2}{n\pi_i}\,.$$
Here $S_i$ is the observed number of counts of the $i$th outcome, so $n\pi_i$ is the expected number of counts of the $i$th outcome.
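To make this concrete, here is a minimal Python sketch (the values of $n$ and $\pi$ are made up for illustration) that draws one sample and evaluates the statistic:

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.1, 0.2, 0.3, 0.4])   # assumed probabilities (k = 4)
n = 1000                              # assumed sample size

# S_i is the observed count of outcome i among the n draws.
S = rng.multinomial(n, pi)

# Pearson chi-squared statistic: sum_i (S_i - n*pi_i)^2 / (n*pi_i).
X2 = np.sum((S - n * pi) ** 2 / (n * pi))
print(S, X2)
```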
A suggestive heuristic
Define $U_i = (S_i - n\pi_i)/\sqrt{n\pi_i}$, so that $X^2 = \sum_i U_i^2 = \|U\|_2^2$ where $U = (U_1, \ldots, U_k)$.
Since $S_i$ is $\mathrm{Bin}(n, \pi_i)$, by the Central Limit Theorem,
$$T_i = \frac{U_i}{\sqrt{1-\pi_i}} = \frac{S_i - n\pi_i}{\sqrt{n\pi_i(1-\pi_i)}} \xrightarrow{d} \mathcal{N}(0,1)\,,$$
hence, we also have that
$$U_i \xrightarrow{d} \mathcal{N}(0, 1-\pi_i)\,.$$
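A quick simulation sketch (again with made-up $n$ and $\pi$) shows the sample variance of each $U_i$ settling near $1 - \pi_i$ rather than $1$:

```python
import numpy as np

rng = np.random.default_rng(1)

pi = np.array([0.1, 0.2, 0.3, 0.4])   # assumed probabilities
n, reps = 1000, 20_000

# Each row of S holds the counts (S_1, ..., S_k) for one sample of size n.
S = rng.multinomial(n, pi, size=reps)
U = (S - n * pi) / np.sqrt(n * pi)

# The sample variance of U_i is close to 1 - pi_i, not 1.
print(U.var(axis=0))                  # approximately [0.9, 0.8, 0.7, 0.6]
print(1 - pi)
```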
Now, if the $T_i$ were (asymptotically) independent (which they aren't), then we could argue that $\sum_i T_i^2$ was asymptotically $\chi^2_k$ distributed. But, since $\sum_i S_i = n$, $T_k$ is a deterministic function of $(T_1, \ldots, T_{k-1})$ and so the $T_i$ variables can't possibly be independent.
Hence, we must take into account the covariance between them somehow. It turns out that the "correct" way to do this is to use the $U_i$ instead, and the covariance between the components of $U$ also changes the asymptotic distribution from what we might have thought was $\chi^2_k$ to what is, in fact, a $\chi^2_{k-1}$.
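Here is a small simulation sketch illustrating this (made-up $n$ and $\pi$, with scipy providing the reference quantiles): the simulated $X^2$ values track the $\chi^2_{k-1}$ quantiles, not the $\chi^2_k$ ones.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

pi = np.array([0.1, 0.2, 0.3, 0.4])   # assumed probabilities, k = 4
n, reps = 1000, 20_000

S = rng.multinomial(n, pi, size=reps)
X2 = np.sum((S - n * pi) ** 2 / (n * pi), axis=1)

# Simulated quantiles track chi-square with k - 1 = 3 df, not k = 4 df.
print(X2.mean())                      # close to 3
for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(X2, q),
          stats.chi2.ppf(q, df=3), stats.chi2.ppf(q, df=4))
```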
Some details on this follow.
A more rigorous treatment
It is not hard to check (using the multinomial covariance $\mathrm{Cov}(S_i, S_j) = -n\pi_i\pi_j$) that, in fact,
$$\mathrm{Cov}(U_i, U_j) = -\sqrt{\pi_i \pi_j} \quad \text{for } i \neq j\,.$$
So, the covariance of $U$ is
$$A = I - \sqrt{\pi}\sqrt{\pi}^T\,,$$
where $\sqrt{\pi} = (\sqrt{\pi_1}, \ldots, \sqrt{\pi_k})$. Note that $A$ is symmetric and idempotent, i.e., $A = A^2 = A^T$. So, in particular, if $Z = (Z_1, \ldots, Z_k)$ has iid standard normal components, then $AZ \sim \mathcal{N}(0, A)$. (NB: The multivariate normal distribution in this case is degenerate.)
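As a numerical sanity check (a sketch with the same made-up $\pi$), we can verify that $A$ is symmetric and idempotent and that both $AZ$ and $U$ have empirical covariance close to $A$:

```python
import numpy as np

rng = np.random.default_rng(3)

pi = np.array([0.1, 0.2, 0.3, 0.4])          # assumed probabilities (k = 4)
sqrt_pi = np.sqrt(pi)
A = np.eye(len(pi)) - np.outer(sqrt_pi, sqrt_pi)

# A is symmetric and idempotent.
print(np.allclose(A, A.T), np.allclose(A, A @ A))               # True True

# Empirical covariance of AZ, for Z with iid N(0,1) components, is close to A.
reps = 200_000
Z = rng.standard_normal((reps, len(pi)))
print(np.allclose(np.cov(Z @ A, rowvar=False), A, atol=0.02))   # True

# The empirical covariance of U (from multinomial counts) is also close to A,
# matching Cov(U_i, U_j) = -sqrt(pi_i * pi_j) off the diagonal.
n = 1000
S = rng.multinomial(n, pi, size=reps)
U = (S - n * pi) / np.sqrt(n * pi)
print(np.allclose(np.cov(U, rowvar=False), A, atol=0.02))       # True
```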
Now, by the Multivariate Central Limit Theorem, the vector $U$ has an asymptotic multivariate normal distribution with mean $0$ and covariance $A$.
So, $U$ has the same asymptotic distribution as $AZ$; hence, by the continuous mapping theorem, the asymptotic distribution of $X^2 = U^T U$ is the same as the distribution of $Z^T A^T A Z = Z^T A Z$.
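A short simulation sketch (same made-up $\pi$ and $n$) shows the two quadratic forms agreeing closely in distribution:

```python
import numpy as np

rng = np.random.default_rng(4)

pi = np.array([0.1, 0.2, 0.3, 0.4])   # assumed probabilities
n, reps = 1000, 20_000

# Quadratic form U^T U computed from multinomial counts.
S = rng.multinomial(n, pi, size=reps)
U = (S - n * pi) / np.sqrt(n * pi)
UtU = np.sum(U ** 2, axis=1)

# Quadratic form Z^T A Z for Z with iid standard normal components.
A = np.eye(len(pi)) - np.outer(np.sqrt(pi), np.sqrt(pi))
Z = rng.standard_normal((reps, len(pi)))
ZtAZ = np.einsum('ij,jk,ik->i', Z, A, Z)

# The two quadratic forms have nearly identical quantiles.
qs = [0.5, 0.9, 0.99]
print(np.quantile(UtU, qs))
print(np.quantile(ZtAZ, qs))
```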
But, $A$ is symmetric and idempotent, so (a) it has orthogonal eigenvectors, (b) all of its eigenvalues are 0 or 1, and (c) the multiplicity of the eigenvalue 1 is $\mathrm{rank}(A)$. This means that $A$ can be decomposed as $A = QDQ^T$ where $Q$ is orthogonal and $D$ is a diagonal matrix with $\mathrm{rank}(A)$ ones on the diagonal and the remaining diagonal entries zero.
Thus, writing $W = Q^T Z$, which again has iid standard normal components since $Q$ is orthogonal, $Z^T A Z = W^T D W$ is a sum of $\mathrm{rank}(A)$ independent squared standard normals. Since $\sqrt{\pi}$ is a unit vector ($\sum_i \pi_i = 1$), $\sqrt{\pi}\sqrt{\pi}^T$ is a rank-one projection, so $A$ has rank $k-1$ in our case and $Z^T A Z$ must be $\chi^2_{k-1}$ distributed.
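The spectral claims can also be checked numerically; a sketch (same made-up $\pi$):

```python
import numpy as np

pi = np.array([0.1, 0.2, 0.3, 0.4])          # assumed probabilities (k = 4)
A = np.eye(len(pi)) - np.outer(np.sqrt(pi), np.sqrt(pi))

# Eigenvalues of the symmetric idempotent A are all 0 or 1, and the
# eigenvalue 1 has multiplicity rank(A) = k - 1.
eigvals, Q = np.linalg.eigh(A)
print(np.round(eigvals, 10))                 # approximately [0, 1, 1, 1]
print(np.linalg.matrix_rank(A))              # 3, i.e. k - 1

# Writing A = Q D Q^T, the rotated vector W = Q^T Z is again standard normal,
# so Z^T A Z = W^T D W is a sum of k - 1 independent squared standard normals.
```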
Other connections
The chi-square statistic is also closely related to likelihood ratio
statistics. Indeed, it is a Rao score statistic and can be viewed as a
Taylor-series approximation of the likelihood ratio statistic.
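For instance, for the simple multinomial null considered above, the likelihood ratio statistic is $G^2 = 2\sum_i S_i \log\big(S_i/(n\pi_i)\big)$, and for large $n$ the two statistics are typically close; a quick sketch (made-up $n$ and $\pi$):

```python
import numpy as np

rng = np.random.default_rng(5)

pi = np.array([0.1, 0.2, 0.3, 0.4])   # assumed probabilities
n = 1000

S = rng.multinomial(n, pi)

# Pearson chi-square and the likelihood-ratio (G) statistic for the same
# simple multinomial null; for large n the two are typically close.
X2 = np.sum((S - n * pi) ** 2 / (n * pi))
G2 = 2 * np.sum(S * np.log(S / (n * pi)))
print(X2, G2)
```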
References
This is my own development based on experience, but obviously influenced by classical texts. Good places to look to learn more are
- G. A. F. Seber and A. J. Lee (2003), Linear Regression Analysis, 2nd ed., Wiley.
- E. Lehmann and J. Romano (2005), Testing Statistical Hypotheses, 3rd ed., Springer. Section 14.3 in particular.
- D. R. Cox and D. V. Hinkley (1979), Theoretical Statistics, Chapman and Hall.