There are (at least) three senses in which a regression can be considered "linear." To distinguish them, let's start with an extremely general regression model
Y=f(X,θ,ε).
To keep the discussion simple, take the independent variables X to be fixed and accurately measured (rather than random variables). They model n observations of p attributes each, giving rise to the n-vector of responses Y. Conventionally, X is represented as an n×p matrix and Y as a column n-vector. The (finite q-vector) θ comprises the parameters. ε is a vector-valued random variable. It usually has n components, but sometimes has fewer. The function f is vector-valued (with n components to match Y) and is usually assumed continuous in its last two arguments (θ and ε).
The archetypal example, of fitting a line to (x,y) data, is the case where X is a vector of numbers (xi,i=1,2,…,n)--the x-values; Y is a parallel vector of n numbers (yi); θ=(α,β) gives the intercept α and slope β; and ε=(ε1,ε2,…,εn) is a vector of "random errors" whose components are independent (and usually assumed to have identical but unknown distributions of mean zero). In the preceding notation,
yi=α+βxi+εi=f(X,θ,ε)i
with θ=(α,β).
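To make this concrete, here is a minimal Python/NumPy sketch (not part of the discussion above, with made-up numbers) that simulates the archetypal model and recovers α and β by least squares:

```python
# Illustrative sketch only: simulate y_i = alpha + beta*x_i + eps_i with made-up
# values and recover (alpha, beta) by a degree-1 least-squares fit.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
eps = rng.normal(0, 1, x.size)            # independent, zero-mean errors
y = 2.0 + 0.5 * x + eps                   # "true" alpha = 2, beta = 0.5

beta_hat, alpha_hat = np.polyfit(x, y, 1) # polyfit returns (slope, intercept) for degree 1
print(alpha_hat, beta_hat)                # roughly 2 and 0.5
```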
The regression function may be linear in any (or all) of its three arguments:
"Linear regression, or a "linear model," ordinarily means that f is linear as a function of the parameters θ. The SAS meaning of "nonlinear regression" is in this sense, with the added assumption that f is differentiable in its second argument (the parameters). This assumption makes it easier to find solutions.
A "linear relationship between X and Y" means f is linear as a
function of X.
A model has additive errors when f is linear in ε. In such cases it is always assumed that E(ε) = 0. (Otherwise, it wouldn't be right to think of ε as "errors" or "deviations" from "correct" values.)
Every possible combination of these characteristics can happen and is useful. Let's survey the possibilities.
(1) A linear model of a linear relationship with additive errors. This is ordinary (multiple) regression, already exhibited above and more generally written as
Y=Xθ+ε.
X has been augmented, if necessary, by adjoining a column of constants, and θ is a p-vector.
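A minimal sketch of this matrix form, assuming NumPy and using made-up data (two explanatory variables plus an adjoined constant column):

```python
# Sketch of Y = X*theta + eps in matrix form; all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=(2, n))
eps = rng.normal(0, 0.5, n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + eps                # theta = (1, 2, -3)

X = np.column_stack([np.ones(n), x1, x2])          # n x p design matrix, constant column adjoined
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
print(theta_hat)                                   # approximately [1, 2, -3]
```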
(2) A linear model of a nonlinear relationship with additive errors. This can be couched as a multiple regression by augmenting the columns of X with nonlinear functions of X itself. For instance,
yi = α + βxi² + εi
is of this form. It is linear in θ=(α,β); it has additive errors; and it is linear in the values (1, xi²) even though xi² is a nonlinear function of xi.
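The same least-squares machinery handles this case; a sketch (again with invented data) regressing y on the columns (1, xi²):

```python
# Sketch: the model is linear in (alpha, beta), so OLS still applies; only the
# regressor column x**2 is a nonlinear function of x. Data are made up.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 60)
y = 1.0 - 0.7 * x**2 + rng.normal(0, 0.5, 60)      # alpha = 1, beta = -0.7

X = np.column_stack([np.ones_like(x), x**2])       # columns (1, x_i^2)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)                                   # approximately [1, -0.7]
```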
(3) A linear model of a linear relationship with nonadditive errors. An example is multiplicative error,
yi=(α+βxi)εi.
(In such cases the εi can be interpreted as "multiplicative errors" when the location of εi is 1. However, the proper sense of location is not necessarily the expectation E(εi) anymore: it might be the median or the geometric mean, for instance. A similar comment about location assumptions applies, mutatis mutandis, in all other non-additive-error contexts too.)
(4) A linear model of a nonlinear relationship with nonadditive errors. E.g.,
yi = (α + βxi²)εi.
(5) A nonlinear model of a linear relationship with additive errors. A nonlinear model involves combinations of its parameters that are not only nonlinear but cannot even be linearized by re-expressing the parameters.
As a non-example, consider
yi = αβ + β²xi + εi.
By defining α′ = αβ and β′ = β², and restricting β′ ≥ 0, this model can be rewritten
yi=α′+β′xi+εi,
exhibiting it as a linear model (of a linear relationship with additive errors).
As an example, consider
yi = α + α²xi + εi.
It is impossible to find a new parameter α′, depending on α, that will linearize this as a function of α′ (while keeping it linear in xi as well).
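To illustrate the distinction, here is a hedged sketch using SciPy's curve_fit for the genuinely nonlinear model yi = α + α²xi + εi; the non-example above would instead reduce to ordinary least squares after the reparameterization α′ = αβ, β′ = β². The data are invented.

```python
# Sketch: y = alpha + alpha**2 * x + eps ties the intercept and slope together through
# the single parameter alpha, so it is fit by nonlinear least squares. Illustrative data.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
x = rng.uniform(0, 5, 80)
alpha_true = 1.5
y = alpha_true + alpha_true**2 * x + rng.normal(0, 0.3, 80)

def model(x, alpha):
    return alpha + alpha**2 * x

alpha_hat, _ = curve_fit(model, x, y, p0=[1.0])
print(alpha_hat)                                   # approximately [1.5]
```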
(6) A nonlinear model of a nonlinear relationship with additive errors.
yi = α + α²xi² + εi.
(7) A nonlinear model of a linear relationship with nonadditive errors.
yi = (α + α²xi)εi.
(8) A nonlinear model of a nonlinear relationship with nonadditive errors.
yi = (α + α²xi²)εi.
Although these exhibit eight distinct forms of regression, they do not constitute a classification system because some forms can be converted into others. A standard example is the conversion of a linear model with nonadditive errors (assumed to have positive support)
yi=(α+βxi)εi
into a nonlinear model of a nonlinear relationship with additive errors via the logarithm,
log(yi) = μi + log(α + βxi) + (log(εi) − μi).
Here, μi = E(log(εi)), the logarithm of the geometric mean of εi, has been removed from the error terms (to ensure they have zero means, as required) and incorporated into the other terms (where its value will need to be estimated). Indeed, one major reason to re-express the dependent variable Y is to create a model with additive errors. Re-expression can also linearize Y as a function of either (or both) of the parameters and explanatory variables.
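A sketch of this re-expression (illustrative only, assuming SciPy is available): simulate positive multiplicative errors, take logs, and fit on the log scale, where the errors are additive (here E(log εi) = 0 by construction, so μi drops out):

```python
# Sketch: y_i = (alpha + beta*x_i) * eps_i with positive lognormal errors; on the log
# scale the errors become additive, and the (still nonlinear) mean can be fit directly.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 100)
eps = rng.lognormal(mean=0.0, sigma=0.2, size=100)  # E(log eps) = 0, so mu_i = 0 here
y = (2.0 + 0.5 * x) * eps

def log_mean(x, alpha, beta):
    return np.log(alpha + beta * x)

params, _ = curve_fit(log_mean, x, np.log(y), p0=[1.0, 1.0])
print(params)                                       # approximately [2, 0.5]
```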
Collinearity
Collinearity (of the column vectors in X) can be an issue in any form of regression. The key to understanding this is to recognize that collinearity leads to difficulties in estimating the parameters. Abstractly and quite generally, compare two models Y = f(X,θ,ε) and Y = f(X′,θ,ε′), where X′ is X with one column slightly changed. If this induces enormous changes in the corresponding estimates of θ, then obviously we have a problem. One way in which this problem can arise is in a model that is linear in X (that is, types (1) or (5) above), where the components of θ are in one-to-one correspondence with the columns of X. When one column is a non-trivial linear combination of the others, the estimate of its corresponding parameter can be any real number at all. That is an extreme example of such sensitivity.
From this point of view it should be clear that collinearity is a potential problem for linear models of nonlinear relationships (regardless of the additivity of the errors) and that this generalized concept of collinearity is potentially a problem in any regression model. When you have redundant variables, you will have problems identifying some parameters.
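As a concrete (invented) illustration of the sensitivity described above, in the spirit of the two-model comparison Y = f(X,θ,ε) versus Y = f(X′,θ,ε′):

```python
# Sketch: with a nearly collinear design, a tiny change to one column produces a
# large change in the estimated coefficients. All values are illustrative.
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 1e-6, n)          # x2 is almost an exact copy of x1
y = 3.0 + x1 + rng.normal(0, 0.1, n)

def fit(col):
    X = np.column_stack([np.ones(n), x1, col])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(fit(x2))                            # coefficients on x1 and x2 are huge and offsetting
print(fit(x2 + rng.normal(0, 1e-6, n)))   # slightly perturbing the column changes them drastically
```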