Question about how to normalize a regression coefficient

I am not sure whether "normalize" is the right word to use here, but I will do my best to illustrate what I am trying to ask. The estimator used here is least squares.

Say $y=\beta_0+\beta_1x_1$. Centered around the mean, this becomes $y=\beta_0'+\beta_1x_1'$, where $\beta_0'=\beta_0+\beta_1\bar x_1$ and $x_1'=x_1-\bar x_1$, so that $\beta_0'$ no longer has any influence on the estimate of $\beta_1$.

By this I mean that $\hat\beta_1$ in $y=\beta_1x_1'$ is equivalent to $\hat\beta_1$ in $y=\beta_0+\beta_1x_1$. We have reduced the equation to make the least squares calculation easier.
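The equivalence claimed above can be checked numerically. The following is a minimal R sketch on made-up data (the data and all variable names are assumptions for illustration, not part of the question). It also compares residual sums of squares, because dropping the intercept after centering only $x$ leaves the slope unchanged but not the fit.

```r
# Hypothetical data for illustration only (not from the question).
set.seed(1)
x <- runif(30, 5, 15)
y <- 2 + 0.7 * x + rnorm(30, sd = 0.5)

x.c <- x - mean(x)                             # Centered regressor x'
y.c <- y - mean(y)                             # Centered response

b.full     <- coef(lm(y ~ x))[["x"]]           # Slope with an intercept, raw x
b.centered <- coef(lm(y ~ x.c - 1))[["x.c"]]   # Slope with centered x, no intercept

all.equal(b.full, b.centered)                  # The two slope estimates agree

# Residual sums of squares: the fit matches only when y is centered as well.
rss <- function(fit) sum(resid(fit)^2)
rss.full     <- rss(lm(y ~ x))
rss.x.only   <- rss(lm(y ~ x.c - 1))           # Larger by n * mean(y)^2
rss.both     <- rss(lm(y.c ~ x.c - 1))         # Same as rss.full
c(full = rss.full, x.centered.only = rss.x.only, both.centered = rss.both)
```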

How do you apply this method in general? I now have the model $y=\beta_1e^{x_1t}+\beta_2e^{x_2t}$, and I am trying to reduce it to $y=\beta_1x'$.

— Sabre CN

What kind of data are you analyzing, and why do you want to remove a covariate, $e^{x_1t}$, from your model? Also, is there a reason you are removing the intercept? If you mean-center the data, the slope will be the same in the model with or without the intercept, but the model with the intercept will fit your data better.

— caburke

@caburke I am not concerned with the fit of the model, because after I have calculated $\beta_1$ and $\beta_2$ I can put them back into the model. The point of this exercise is to estimate $\beta_1$. By reducing the original equation to just $y=\beta_1x'$, the least squares calculation will be easier ($x'$ is part of what I am trying to find out; it may include $e^{x_1t}$). I am trying to learn the mechanics; this is a question from a book by Tukey.

— Sabre CN

@ca The observation at the end of your comment is puzzling. It might apply to nonlinear expressions (which contain nothing that can reasonably be considered a "slope"), but it is not correct in the OLS setting: the fit for mean-centered data is exactly as good as the fit with an intercept. Sabre, your model is ambiguous: which of $\beta_1, \beta_2, x_1, x_2, t$ are variables and which are parameters? What is the intended error structure? (And which of Tukey's books is the question from?)

— whuber

@whuber This is 14A from Tukey's "Data analysis and regression: a second course in statistics". $\beta_1,\beta_2$ are the parameters we are trying to estimate; $x_1,x_2$ are variables with n observations each; $t$, I assume, is the time variable associated with the observations, although the book did not specify this. The error should be normal and can be ignored for this question.

— Sabre CN

@whuber I was referring mostly to the first part of the post, but that was not clear in my comment. I meant that if you mean-centered only $x$, and not $y$, as the OP seemed to be suggesting, and then removed the intercept, the fit would be worse, since it is not necessarily the case that $\bar{y}=0$. Slope is not a good term for the coefficient in the model mentioned in the last line of the OP.

— caburke

Although I cannot do justice to the question here--that would require a small monograph--it may help to recapitulate some key ideas.

The question

Let's begin by restating the question in unambiguous terminology. The data consist of a list of ordered pairs $(t_i, y_i)$. Known constants $\alpha_1$ and $\alpha_2$ determine the values $x_{1,i} = \exp(\alpha_1 t_i)$ and $x_{2,i} = \exp(\alpha_2 t_i)$. We posit a model in which

$y_i = \beta_1 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i$

for constants $\beta_1$ and $\beta_2$ to be estimated, where the $\varepsilon_i$ are random and--to a good approximation anyway--independent with a common variance (whose estimation is also of interest).

Background: linear "matching"

Mosteller and Tukey refer to the variables $x_1 = (x_{1,1}, x_{1,2}, \ldots)$ and $x_2$ as "matchers." They will be used to "match" the values of $y = (y_1, y_2, \ldots)$ in a specific way, which I will illustrate. More generally, let $y$ and $x$ be any two vectors in the same Euclidean vector space, with $y$ playing the role of "target" and $x$ that of "matcher". We contemplate systematically varying a coefficient $\lambda$ in order to approximate $y$ by the multiple $\lambda x$. The best approximation is obtained when $\lambda x$ is as close to $y$ as possible. Equivalently, the squared length of $y - \lambda x$ is minimized.

One way to visualize this matching process is to make a scatterplot of $x$ and $y$ on which the graph of $x \to \lambda x$ is drawn. The vertical distances between the scatterplot points and this graph are the components of the residual vector $y - \lambda x$; the sum of their squares is to be made as small as possible. Up to a constant of proportionality, these squares are the areas of circles centered at the points $(x_i, y_i)$ with radii equal to the residuals: we wish to minimize the sum of the areas of all these circles.

Here is an example showing the optimal value of $\lambda$ in the middle panel:

[Figure: Panel. Scatterplots of $y$ against $x$ with the graph of $x \to \lambda x$ drawn on each.]

The points in the scatterplot are blue; the graph of $x \to \lambda x$ is a red line. This illustration emphasizes that the red line is constrained to pass through the origin $(0,0)$: it is a very special case of line fitting.
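As a small numerical companion to the single-matcher case, here is a hedged R sketch. The closed form $\lambda = \langle x, y\rangle / \langle x, x\rangle$ is the standard least squares solution for a line through the origin; it is not spelled out in the text above, and the data and names below are made up for illustration.

```r
# Hypothetical target y and matcher x, for illustration only.
set.seed(2)
x <- rnorm(20)
y <- 1.5 * x + rnorm(20, sd = 0.3)

# Closed form: the inner product of x and y divided by the squared length of x.
lambda.hat <- sum(x * y) / sum(x * x)

# Direct numerical minimization of the squared length of y - lambda * x.
sq.len     <- function(lambda) sum((y - lambda * x)^2)
lambda.num <- optimize(sq.len, interval = c(-10, 10))$minimum

# Regression through the origin with lm().
lambda.lm <- coef(lm(y ~ x - 1))[["x"]]

c(closed.form = lambda.hat, numerical = lambda.num, lm = lambda.lm)
```

The three values should agree, the numerical one only up to the tolerance of `optimize`.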

Multiple regression can be obtained by sequential matching

Returning to the setting of the question, we have one target $y$ and two matchers $x_1$ and $x_2$. We seek numbers $b_1$ and $b_2$ for which $y$ is approximated as closely as possible by $b_1 x_1 + b_2 x_2$, again in the least-distance sense. Arbitrarily beginning with $x_1$, Mosteller & Tukey match the remaining variables $x_2$ and $y$ to $x_1$. Write the residuals for these matches as $x_{2\cdot 1}$ and $y_{\cdot 1}$, respectively: the $_{\cdot 1}$ indicates that $x_1$ has been "taken out of" the variable.

We can write

$y = \lambda_1 x_1 + y_{\cdot 1}\text{ and }x_2 = \lambda_2 x_1 + x_{2\cdot 1}.$

Having taken $x_1$ out of $x_2$ and $y$, we proceed to match the target residuals $y_{\cdot 1}$ to the matcher residuals $x_{2\cdot 1}$. The final residuals are $y_{\cdot 12}$. Algebraically, we have written

$\begin{aligned} y_{\cdot 1} &= \lambda_3 x_{2\cdot 1} + y_{\cdot 12}; \text{ whence} \\ y &= \lambda_1 x_1 + y_{\cdot 1} = \lambda_1 x_1 + \lambda_3 x_{2\cdot 1} + y_{\cdot 12} = \lambda_1 x_1 + \lambda_3 \left(x_2 - \lambda_2 x_1\right) + y_{\cdot 12} \\ &= \left(\lambda_1 - \lambda_3 \lambda_2\right)x_1 + \lambda_3 x_2 + y_{\cdot 12}. \end{aligned}$

This shows that the $\lambda_3$ in the last step is the coefficient of $x_2$ in a matching of $x_1$ and $x_2$ to $y$ .

We could just as well have proceeded by first taking $x_2$ out of $x_1$ and $y$ , producing $x_{1\cdot 2}$ and $y_{\cdot 2}$ , and then taking $x_{1\cdot 2}$ out of $y_{\cdot 2}$ , yielding a different set of residuals $y_{\cdot 21}$ . This time, the coefficient of $x_1$ found in the last step--let's call it $\mu_3$ --is the coefficient of $x_1$ in a matching of $x_1$ and $x_2$ to $y$ .

Finally, for comparison, we might run a multiple (ordinary least squares) regression of $y$ against $x_1$ and $x_2$ . Let those residuals be $y_{\cdot lm}$ . It turns out that the coefficients in this multiple regression are precisely the coefficients $\mu_3$ and $\lambda_3$ found previously and that all three sets of residuals, $y_{\cdot 12}$ , $y_{\cdot 21}$ , and $y_{\cdot lm}$ , are identical.
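The coefficient identities can be checked directly. The following is a minimal R sketch on made-up, deliberately correlated matchers; the helper `match1` and all variable names are hypothetical and only serve this illustration. It carries out the three one-variable matches in the order $x_1$ first, rebuilds the coefficients $(\lambda_1 - \lambda_3\lambda_2,\ \lambda_3)$, and compares them with `lm`.

```r
# Hypothetical data for illustration only.
set.seed(3)
n  <- 40
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)                      # Deliberately correlated with x1
y  <- 2 * x1 - x2 + rnorm(n, sd = 0.5)

# Match `target` to `matcher` through the origin.
# Returns the coefficient lambda and the residual vector.
match1 <- function(target, matcher) {
  lambda <- sum(matcher * target) / sum(matcher * matcher)
  list(lambda = lambda, resid = target - lambda * matcher)
}

m.y.1  <- match1(y,  x1)                       # lambda_1 and y.1
m.x2.1 <- match1(x2, x1)                       # lambda_2 and x2.1
m.y.12 <- match1(m.y.1$resid, m.x2.1$resid)    # lambda_3 and y.12

# Coefficients of x1 and x2 recovered from the three matches.
b.seq <- c(x1 = m.y.1$lambda - m.y.12$lambda * m.x2.1$lambda,
           x2 = m.y.12$lambda)

fit <- lm(y ~ x1 + x2 - 1)                     # Multiple regression, no intercept

all.equal(b.seq, coef(fit))                    # Coefficients agree
all.equal(m.y.12$resid, unname(resid(fit)))    # Residuals agree
```

Swapping the roles of `x1` and `x2` in the three calls gives the other ordering, which should reproduce the same fit.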

Depicting the process

None of this is new: it's all in the text. I would like to offer a pictorial analysis, using a scatterplot matrix of everything we have obtained so far.

Scatterplot

Because these data are simulated, we have the luxury of showing the underlying "true" values of $y$ on the last row and column: these are the values $\beta_1 x_1 + \beta_2 x_2$ without the error added in.

The scatterplots below the diagonal have been decorated with the graphs of the matchers, exactly as in the first figure. Graphs with zero slopes are drawn in red: these indicate situations where the matcher gives us nothing new; the residuals are the same as the target. Also, for reference, the origin (wherever it appears within a plot) is shown as an open red circle: recall that all possible matching lines have to pass through this point.

Much can be learned about regression through studying this plot. Some of the highlights are:

The matching of $x_2$ to $x_1$ (row 2, column 1) is poor. This is a good thing: it indicates that $x_1$ and $x_2$ are providing very different information; using both together will likely be a much better fit to $y$ than using either one alone.
Once a variable has been taken out of a target, it does no good to try to take that variable out again: the best matching line will be zero. See the scatterplots for $x_{2\cdot 1}$ versus $x_1$ or $y_{\cdot 1}$ versus $x_1$ , for instance.
The values $x_1$ , $x_2$ , $x_{1\cdot 2}$ , and $x_{2\cdot 1}$ have all been taken out of $y_{\cdot lm}$ .
Multiple regression of $y$ against $x_1$ and $x_2$ can be achieved first by computing $y_{\cdot 1}$ and $x_{2\cdot 1}$ . These scatterplots appear at (row, column) = $(8,1)$ and $(2,1)$ , respectively. With these residuals in hand, we look at their scatterplot at $(4,3)$ . These three one-variable regressions do the trick. As Mosteller & Tukey explain, the standard errors of the coefficients can be obtained almost as easily from these regressions, too--but that's not the topic of this question, so I will stop here.

Code

These data were (reproducibly) created in R with a simulation. The analyses, checks, and plots were also produced with R. This is the code.

#
# Simulate the data.
#
set.seed(17)
t.var <- 1:50                                    # The "times" t[i]
x <- exp(t.var %o% c(x1=-0.1, x2=0.025) )        # The two "matchers" x[1,] and x[2,]
beta <- c(5, -1)                                 # The (unknown) coefficients
sigma <- 1/2                                     # Standard deviation of the errors
error <- sigma * rnorm(length(t.var))            # Simulated errors
y <- (y.true <- as.vector(x %*% beta)) + error   # True and simulated y values
data <- data.frame(t.var, x, y, y.true)

par(col="Black", bty="o", lty=0, pch=1)
pairs(data)                                      # Get a close look at the data
#
# Take out the various matchers.
#
take.out <- function(y, x) {fit <- lm(y ~ x - 1); resid(fit)}
data <- transform(transform(data, 
  x2.1 = take.out(x2, x1),
  y.1 = take.out(y, x1),
  x1.2 = take.out(x1, x2),
  y.2 = take.out(y, x2)
), 
  y.21 = take.out(y.2, x1.2),
  y.12 = take.out(y.1, x2.1)
)
data$y.lm <- resid(lm(y ~ x - 1))               # Multiple regression for comparison
#
# Analysis.
#
# Reorder the dataframe (for presentation):
data <- data[c(1:3, 5:12, 4)]

# Confirm that the three ways to obtain the fit are the same:
pairs(subset(data, select=c(y.12, y.21, y.lm)))

# Explore what happened:
panel.lm <- function (x, y, col=par("col"), bg=NA, pch=par("pch"),
   cex=1, col.smooth="red",  ...) {
  box(col="Gray", bty="o")
  ok <- is.finite(x) & is.finite(y)
  if (any(ok))  {
    b <- coef(lm(y[ok] ~ x[ok] - 1))
    col0 <- ifelse(abs(b) < 10^-8, "Red", "Blue")
    lwd0 <- ifelse(abs(b) < 10^-8, 3, 2)
    abline(c(0, b), col=col0, lwd=lwd0)
  }
  points(x, y, pch = pch, col="Black", bg = bg, cex = cex)    
  points(matrix(c(0,0), nrow=1), col="Red", pch=1)
}
panel.hist <- function(x, ...) {
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 1.5) )
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks; nB <- length(breaks)
  y <- h$counts; y <- y/max(y)
  rect(breaks[-nB], 0, breaks[-1], y,  ...)
}
par(lty=1, pch=19, col="Gray")
pairs(subset(data, select=c(-t.var, -y.12, -y.21)), col="Gray", cex=0.8, 
   lower.panel=panel.lm, diag.panel=panel.hist)

# Additional interesting plots:
par(col="Black", pch=1)
#pairs(subset(data, select=c(-t.var, -x1.2, -y.2, -y.21)))
#pairs(subset(data, select=c(-t.var, -x1, -x2)))
#pairs(subset(data, select=c(x2.1, y.1, y.12)))

# Details of the variances, showing how to obtain multiple regression
# standard errors from the OLS matches.
norm <- function(x) sqrt(sum(x * x))
lapply(data, norm)
s <- summary(lm(y ~ x1 + x2 - 1, data=data))
c(s$sigma, s$coefficients["x1", "Std. Error"] * norm(data$x1.2)) # Equal
c(s$sigma, s$coefficients["x2", "Std. Error"] * norm(data$x2.1)) # Equal
c(s$sigma, norm(data$y.12) / sqrt(length(data$y.12) - 2))        # Equal

— whuber

Could multiple regression of $y$ against $x_1$ and $x_2$ still be achieved by first computing $y_{.1}$ and $x_{2.1}$ if $x_1$ and $x_2$ were correlated? Wouldn't it then make a big difference whether we sequentially regressed $y$ on $x_1$ and $x_{2.1}$ or on $x_2$ and $x_{1.2}$? How does this relate to one regression equation with multiple explanatory variables?

— miura

@miura, One of the leitmotifs of that chapter in Mosteller & Tukey is that when the $x_i$ are correlated, the partials $x_{i\cdot j}$ have low variances; because their variances appear in the denominator of a formula for the estimation variance of their coefficients, this implies the corresponding coefficients will have relatively uncertain estimates. That's a fact of the data, M&T say, and you need to recognize that. It makes no difference whether you start the regression with $x_1$ or $x_2$: compare y.21 to y.12 in my code.

— whuber
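To make the point about correlated matchers concrete, here is a hedged R sketch on made-up data. The weight `w` is a hypothetical knob controlling how much of $x_1$ goes into $x_2$; it is an assumption of this illustration, not something from the thread. The sketch relies on the relation already used at the end of the answer's code, namely that the standard error of the $x_2$ coefficient equals the residual standard deviation divided by the length of $x_{2\cdot 1}$.

```r
# Hypothetical data for illustration only.
set.seed(4)
n     <- 50
x1    <- rnorm(n)
noise <- rnorm(n)
err   <- rnorm(n, sd = 0.5)

vec.norm <- function(v) sqrt(sum(v * v))
take.out <- function(target, matcher) resid(lm(target ~ matcher - 1))

# w close to 1 makes x2 nearly proportional to x1 (strong correlation).
compare <- function(w) {
  x2   <- w * x1 + (1 - w) * noise
  y    <- x1 + x2 + err
  s    <- summary(lm(y ~ x1 + x2 - 1))
  x2.1 <- take.out(x2, x1)                     # x2 with x1 taken out
  c(cor.x1.x2       = cor(x1, x2),
    norm.x2.1       = vec.norm(x2.1),
    se.x2           = s$coefficients["x2", "Std. Error"],
    sigma.over.norm = s$sigma / vec.norm(x2.1))
}

res <- rbind(weak = compare(0.1), strong = compare(0.95))
res
```

In both rows `se.x2` and `sigma.over.norm` should coincide, and the strongly correlated case should show a much shorter `x2.1` and a correspondingly larger standard error.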

I came across this today; here is what I think about the question by @miura. Think of a 2-dimensional space where y is to be projected as a combination of two vectors: y = a x1 + b x2 + res (= 0). Now think of y as a combination of 3 variables, y = a x1 + b x2 + c x3, with x3 = m x1 + n x2. Then the order in which you choose your variables is certainly going to affect the coefficients. The reason is that the minimum error here can be obtained by various combinations. However, in a few examples the minimum error can be obtained by only one combination, and that is where the order will not matter.

— Gaurav Singhal

@whuber Can you elaborate on how this equation might be used for a multivariate regression that also has a constant term, i.e. y = B1 * x1 + B2 * x2 + c? It is not clear to me how the constant term can be derived. Also, I understand in general what was done for the 2 variables, enough at least to replicate it in Excel. How can that be expanded to 3 variables x1, x2, x3? It seems clear that we would need to remove x3 first from y, x1, and x2, then remove x2 from x1 and y. But it is not clear to me how to then get the B3 term.

— Fairly Nerdy
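One way to handle the constant term, offered here as a sketch rather than as the answer's own method, is to treat a column of ones as just another matcher and take it out first; taking out the constant is the same as subtracting the mean. The data and names below are made up for illustration.

```r
# Hypothetical data for illustration only.
set.seed(5)
n  <- 30
x1 <- runif(n)
x2 <- runif(n)
y  <- 3 + 2 * x1 - x2 + rnorm(n, sd = 0.2)

# Residual of `target` after matching it to `matcher` through the origin.
take.out <- function(target, matcher) {
  target - sum(matcher * target) / sum(matcher * matcher) * matcher
}

# Step 1: take the constant out of everything (this is mean-centering).
ones <- rep(1, n)
y.0  <- take.out(y,  ones)
x1.0 <- take.out(x1, ones)
x2.0 <- take.out(x2, ones)

# Step 2: the two-matcher procedure on the centered variables.
x2.01 <- take.out(x2.0, x1.0)
y.01  <- take.out(y.0,  x1.0)
B2    <- sum(x2.01 * y.01) / sum(x2.01^2)      # Coefficient of x2

x1.02 <- take.out(x1.0, x2.0)
y.02  <- take.out(y.0,  x2.0)
B1    <- sum(x1.02 * y.02) / sum(x1.02^2)      # Coefficient of x1

# Step 3: recover the constant from the means.
c0 <- mean(y) - B1 * mean(x1) - B2 * mean(x2)

all.equal(c(c0, B1, B2), unname(coef(lm(y ~ x1 + x2))))
```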

I have answered some of the questions I asked in the comment above. For a 3-variable regression, we would have 6 steps. Remove x1 from x2, from x3, and from y. Then remove x2,1 from x3,1 and from y1. Then remove x3,21 from y21. That results in 6 equations, each of which is of the form variable = lambda * different variable + residual. One of those equations has y as the first variable, and if you just keep substituting the other variables in, you get the equation you need.

— Fairly Nerdy
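The six steps described above can be sketched in R as follows. This is only an illustration on made-up data; the helper `match1` and the variable names are hypothetical, and the back-substitution formulas are worked out from the six equations rather than taken from the thread.

```r
# Hypothetical data for illustration only.
set.seed(6)
n  <- 60
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- 0.3 * x1 - 0.4 * x2 + rnorm(n)
y  <- x1 + 2 * x2 - 3 * x3 + rnorm(n, sd = 0.5)

# Match `target` to `matcher` through the origin.
match1 <- function(target, matcher) {
  lambda <- sum(matcher * target) / sum(matcher * matcher)
  list(lambda = lambda, resid = target - lambda * matcher)
}

# Steps 1-3: take x1 out of x2, x3, and y.
m21 <- match1(x2, x1)                          # x2 = a21 * x1 + x2.1
m31 <- match1(x3, x1)                          # x3 = a31 * x1 + x3.1
my1 <- match1(y,  x1)                          # y  = ay1 * x1 + y.1

# Steps 4-5: take x2.1 out of x3.1 and y.1.
m32 <- match1(m31$resid, m21$resid)            # x3.1 = a32 * x2.1 + x3.12
my2 <- match1(my1$resid, m21$resid)            # y.1  = ay2 * x2.1 + y.12

# Step 6: take x3.12 out of y.12.
my3 <- match1(my2$resid, m32$resid)            # y.12 = ay3 * x3.12 + y.123

# Back-substitute to express y in terms of x1, x2, x3.
b3 <- my3$lambda
b2 <- my2$lambda - my3$lambda * m32$lambda
b1 <- my1$lambda - my2$lambda * m21$lambda -
      my3$lambda * (m31$lambda - m32$lambda * m21$lambda)

fit <- lm(y ~ x1 + x2 + x3 - 1)
all.equal(c(x1 = b1, x2 = b2, x3 = b3), coef(fit))
all.equal(my3$resid, unname(resid(fit)))
```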