If $X$ is full rank, the inverse of $X^TX$ exists and we obtain the least squares estimate $\hat\beta = (X^TX)^{-1}X^TY$, with $\operatorname{Var}(\hat\beta) = \sigma^2 (X^TX)^{-1}$.

How can we intuitively explain $(X^TX)^{-1}$ in this variance formula?
Answers:
Consider a simple regression without a constant term, where the single regressor is centered on its sample mean. Then $X'X$ is ($n$ times) its sample variance, and $(X'X)^{-1}$ is its reciprocal. So the greater the variance (variability) of the regressor, the smaller the variance of the coefficient estimator: the more variability we have in the explanatory variable, the more accurately we can estimate the unknown coefficient.
Why? Because the more a regressor varies, the more information it contains. When there are many regressors, this generalizes to the inverse of their variance-covariance matrix, which also takes into account the co-variability of the regressors. In the extreme case where $X'X$ is diagonal, the precision of each estimated coefficient depends only on the variance/variability of the associated regressor (given the variance of the error term).
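To make this concrete, here is a small numpy sketch (the numbers are made up): for a single regressor centered on its sample mean, $X'X$ equals $n$ times the sample variance of $x$, and spreading the regressor out shrinks $\sigma^2(X'X)^{-1}$ accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                      # assumed error variance
n = 50

# A single regressor, centered on its sample mean (no constant term).
x = rng.normal(size=n)
x = x - x.mean()
X = x.reshape(-1, 1)

XtX = X.T @ X                     # 1x1 matrix: sum of x_i^2
print(XtX[0, 0], n * x.var())     # equal up to rounding: X'X = n * sample variance of x

# Spreading the regressor out (multiplying by 3) multiplies X'X by 9,
# so the variance of the coefficient estimator drops by a factor of 9.
for scale in (1.0, 3.0):
    Xs = (scale * x).reshape(-1, 1)
    var_beta = sigma2 * np.linalg.inv(Xs.T @ Xs)
    print(scale, var_beta[0, 0])
```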
A simple way to view $\sigma^2 (X^TX)^{-1}$ is as the matrix (multivariate) analogue of $\frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}$, which is the variance of the slope coefficient in simple OLS regression.
From either one of these formulas it may be seen that larger variability of the predictor variable will in general lead to more precise estimation of its coefficient. This is the idea often exploited in the design of experiments, where by choosing values for the (non-random) predictors, one tries to make the determinant of $X^TX$ as large as possible, the determinant being a measure of variability.
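As a toy illustration of the design-of-experiments point (hypothetical designs, not from any real experiment), compare a design whose predictor values are bunched together with one whose values are pushed to the ends of the interval; the latter has the larger determinant of $X^TX$ and the smaller slope variance.

```python
import numpy as np

sigma2 = 1.0                                   # assumed error variance

def precision(x):
    """Return det(X'X) and Var(slope estimate) for a design with an intercept."""
    X = np.column_stack([np.ones_like(x), x])  # columns: intercept, predictor
    XtX = X.T @ X
    cov_beta = sigma2 * np.linalg.inv(XtX)     # sigma^2 (X'X)^{-1}
    return np.linalg.det(XtX), cov_beta[1, 1]

# Two hypothetical designs on [0, 10] with the same number of runs:
clustered = np.array([4.0, 4.5, 5.0, 5.5, 6.0])    # predictor values bunched up
spread    = np.array([0.0, 0.0, 5.0, 10.0, 10.0])  # pushed to the ends

for name, x in [("clustered", clustered), ("spread", spread)]:
    d, v = precision(x)
    print(f"{name:10s} det(X'X) = {d:8.1f}   Var(slope) = {v:.4f}")
# The spread-out design has the larger determinant and the smaller slope variance.
```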
Does a linear transformation of a Gaussian random variable help? Using the rule that if $x \sim \mathcal{N}(\mu, \Sigma)$, then $Ax + b \sim \mathcal{N}(A\mu + b,\ A \Sigma A^T)$.
Assuming that $Y = X\beta + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, we have:
$$\therefore\quad Y \sim \mathcal{N}(X\beta,\ \sigma^2 I)$$
$$X^T Y \sim \mathcal{N}\big(X^T X\beta,\ X^T(\sigma^2 I)X\big) = \mathcal{N}\big(X^T X\beta,\ \sigma^2 X^T X\big)$$
$$(X^T X)^{-1} X^T Y \sim \mathcal{N}\big(\beta,\ (X^T X)^{-1}\sigma^2 X^T X (X^T X)^{-1}\big) = \mathcal{N}\big(\beta,\ \sigma^2 (X^T X)^{-1}\big)$$
So $(X^TX)^{-1}X^T$ is the matrix that maps $Y$ onto $\hat\beta$, and applying it to the Gaussian $Y$ gives $\hat\beta$ the covariance matrix $\sigma^2(X^TX)^{-1}$.
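A quick Monte Carlo sanity check of this argument (hypothetical design and coefficients): the empirical covariance of $(X^TX)^{-1}X^TY$ over repeated draws of $Y$ should match $\sigma^2(X^TX)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100, 2.0
beta = np.array([1.0, -3.0])                            # assumed true coefficients

X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design
A = np.linalg.inv(X.T @ X) @ X.T                         # the linear map Y -> beta_hat

# Draw many independent samples of Y and apply the same linear map each time.
reps = 20000
betas = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    betas[r] = A @ y

print("empirical covariance of beta_hat:\n", np.cov(betas.T))
print("sigma^2 (X'X)^{-1}:\n", sigma**2 * np.linalg.inv(X.T @ X))
# The two matrices agree up to Monte Carlo error, as the normality argument predicts.
```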
Hope that was helpful.
I'll take a different approach towards developing the intuition that underlies the formula $\operatorname{Var}(\hat\beta) = \sigma^2(X'X)^{-1}$. To keep things concrete, let's work with the bivariate regression model $y_i = \alpha + \beta x_i + \varepsilon_i$.
To help develop the intuition, we will assume that the simplest Gauss-Markov assumptions are satisfied: the $x_i$ are nonstochastic, $\sum_{i=1}^n (x_i - \bar{x})^2 > 0$ for every $n$, and $\varepsilon_i \sim \text{iid}(0, \sigma^2)$. Under these assumptions, $\operatorname{Var}(\hat\beta) = \frac{1}{n} \cdot \frac{\sigma^2}{\operatorname{Var}(x)}$, where $\operatorname{Var}(x)$ denotes the sample variance of $x$: the variance of $\hat\beta$ falls as the sample size grows, falls as the noise shrinks, and falls as the variability of $x$ grows.
Why should doubling the sample size, ceteris paribus, cause the variance of $\hat\beta$ to be cut in half? Each new observation brings an independent piece of information about $\beta$, and averaging over twice as many independent errors cuts the variance of that average in half.
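For example (arbitrary design values), duplicating a design keeps the sample variance of $x$ fixed while doubling $n$, and the formula above indeed returns exactly half the variance:

```python
import numpy as np

sigma2 = 1.0                                  # assumed error variance

def slope_variance(x):
    """Var(beta_hat) = sigma^2 / sum((x_i - x_bar)^2) in the bivariate model."""
    return sigma2 / np.sum((x - x.mean())**2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # an arbitrary design of size n
x_doubled = np.tile(x, 2)                     # same design repeated: size 2n,
                                              # same sample variance of x

print(slope_variance(x))                      # 0.1
print(slope_variance(x_doubled))              # 0.05 -- exactly half
```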
Let's turn, then, to your main question, which is about developing intuition for the claim that the variance of $\hat\beta$ is inversely proportional to the variance of $x$. To see why, imagine two samples of the same size $n$, where the regressor is much more spread out in the first sample than in the second, so that $\operatorname{Var}(x^{(1)}) > \operatorname{Var}(x^{(2)})$.
Because by assumption $\operatorname{Var}(x^{(1)}) > \operatorname{Var}(x^{(2)})$, the first sample pins the slope down more tightly: when the $x$-values are widely spread, even a small tilt of the fitted line away from the true one produces large discrepancies at the extreme $x$-values, so the data are more informative about $\beta$ and $\operatorname{Var}(\hat\beta^{(1)}) < \operatorname{Var}(\hat\beta^{(2)})$, exactly as the formula $\operatorname{Var}(\hat\beta) = \frac{1}{n}\frac{\sigma^2}{\operatorname{Var}(x)}$ says.
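Here is a small simulation of that two-sample comparison (the designs and noise level are made up): the scatter of the slope estimates across repeated samples is far smaller when the regressor is more variable.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, beta = 30, 1.0, 2.0                 # assumed sample size, noise, slope

x1 = np.linspace(-3, 3, n)                    # widely spread regressor: Var x^(1) large
x2 = np.linspace(-1, 1, n)                    # compressed regressor:   Var x^(2) small

def simulate_slopes(x, reps=5000):
    """OLS slope over repeated draws (x is centered, so no intercept is needed)."""
    slopes = np.empty(reps)
    for r in range(reps):
        y = beta * x + sigma * rng.normal(size=x.size)
        slopes[r] = np.sum(x * y) / np.sum(x * x)
    return slopes

print("spread of beta_hat, high-variance x:", simulate_slopes(x1).std())
print("spread of beta_hat, low-variance x: ", simulate_slopes(x2).std())
# The sample with the more variable regressor pins the slope down far more precisely.
```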
It is reasonably straightforward to generalize the intuition obtained from studying the simple regression model to the general multiple linear regression model. The main complication is that instead of comparing scalar variances, it is necessary to compare the "size" of variance-covariance matrices. Having a good working knowledge of determinants, traces and eigenvalues of real symmetric matrices comes in very handy at this point :-)
Say we have $n$ measurements collected in the linear model $Y = X\beta + \epsilon$, where the noise $\epsilon$ has variance $\sigma^2$, and we estimate the parameters by least squares, so that $\operatorname{Var}(\hat\beta) = \sigma^2 (X^TX)^{-1}$.
The covariance matrix $\operatorname{Var}(\hat\beta)$ of the estimated parameters $\hat\beta_1, \hat\beta_2,$ etc. is a representation of the accuracy of the estimated parameters.
If in an ideal world the data could be perfectly described by the model, then the noise would be $\sigma^2 = 0$. Now, the diagonal entries of $\operatorname{Var}(\hat\beta)$ correspond to $\operatorname{Var}(\hat\beta_1), \operatorname{Var}(\hat\beta_2),$ etc. The derived formula for the variance agrees with the intuition that if the noise is lower, the estimates will be more accurate.
In addition, as the number of measurements $n$ gets larger, the variance of the estimated parameters will decrease: since $X^T$ has $n$ columns and $X$ has $n$ rows, each entry of $X^TX$ is a sum of $n$ product pairs, so overall the entries of $X^TX$ grow in absolute value with $n$, and the entries of the inverse $(X^TX)^{-1}$ become correspondingly smaller.
Hence, even if there is a lot of noise, we can still reach good estimates $\hat\beta_i$ of the parameters if we increase the sample size $n$.
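A quick numerical illustration (random hypothetical design, deliberately noisy): the diagonal of $\sigma^2(X^TX)^{-1}$ keeps shrinking as $n$ grows, roughly like $1/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 25.0                                 # deliberately large noise variance

for n in (10, 100, 1000, 10000):
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    # Diagonal of sigma^2 (X'X)^{-1} gives Var(beta_hat_i); it shrinks roughly like 1/n.
    print(n, np.round(np.diag(cov_beta), 4))
```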
I hope this helps.
Reference: Section 7.3 on least squares in Cosentino, Carlo, and Declan Bates. Feedback Control in Systems Biology. CRC Press, 2011.
This builds on @Alecos Papadopoulos' answer.
Recall that the result of a least-squares regression doesn't depend on the units of measurement of your variables. Suppose your X-variable is a length measurement, given in inches. Then rescaling X, say by multiplying by 2.54 to change the unit to centimeters, doesn't materially affect things. If you refit the model, the new regression estimate will be the old estimate divided by 2.54.
The $X'X$ matrix is ($n$ times) the sample variance of $X$ (for a centered regressor), and hence reflects the scale of measurement of $X$. If you change the scale, you have to reflect this in your estimate of $\beta$, and this is done by multiplying by the inverse of $X'X$.
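A short sketch of the rescaling argument with made-up inch measurements: converting $X$ to centimeters multiplies $X'X$ by $2.54^2$, so the slope estimate is divided by $2.54$ and its variance by $2.54^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 40, 0.5
x_inches = rng.uniform(10, 30, size=n)        # hypothetical lengths in inches
y = 1.2 * x_inches + sigma * rng.normal(size=n)

def fit(x):
    X = x.reshape(-1, 1)                      # single regressor, no constant term
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    var_beta = sigma**2 * np.linalg.inv(X.T @ X)
    return beta_hat[0], var_beta[0, 0]

b_in, v_in = fit(x_inches)
b_cm, v_cm = fit(2.54 * x_inches)             # same data, measured in centimeters

print(b_in, b_cm * 2.54)                      # equal: the slope rescales by 1/2.54
print(v_in, v_cm * 2.54**2)                   # equal: the variance rescales by 1/2.54^2
```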