
In statistics, the projection matrix \mathbf{P}, sometimes also called the influence matrix or hat matrix \mathbf{H}, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.


Definition

If the vector of response values is denoted by \mathbf{y} and the vector of fitted values by \hat{\mathbf{y}}, then

\hat{\mathbf{y}} = \mathbf{P}\mathbf{y}.

As \hat{\mathbf{y}} is usually pronounced "y-hat", the projection matrix \mathbf{P} is also named hat matrix, as it "puts a hat on \mathbf{y}". When the responses are uncorrelated with equal variance, the element in the i-th row and j-th column of \mathbf{P} is equal to the covariance between the j-th response value and the i-th fitted value, divided by the variance of the former:

p_{ij} = \frac{\operatorname{Cov}\left[\hat{y}_i, y_j\right]}{\operatorname{Var}\left[y_j\right]}
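The covariance form of p_{ij} follows in one line. The derivation below assumes the design matrix is fixed (so \mathbf{P} is non-random) and that \operatorname{Var}[\mathbf{y}] = \sigma^2\mathbf{I}; with correlated or heteroscedastic responses the ratio is in general not equal to p_{ij}.

```latex
\begin{aligned}
\operatorname{Cov}[\hat{\mathbf{y}}, \mathbf{y}]
  &= \operatorname{Cov}[\mathbf{P}\mathbf{y}, \mathbf{y}]
   = \mathbf{P}\,\operatorname{Var}[\mathbf{y}]
   = \sigma^2 \mathbf{P}, \\
\operatorname{Cov}[\hat{y}_i, y_j] &= \sigma^2 p_{ij},
\qquad \operatorname{Var}[y_j] = \sigma^2, \\
\Rightarrow \quad p_{ij} &= \frac{\operatorname{Cov}[\hat{y}_i, y_j]}{\operatorname{Var}[y_j]}.
\end{aligned}
```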


Application for residuals

The formula for the vector of residuals \mathbf{r} can also be expressed compactly using the projection matrix:

\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{P}\mathbf{y} = \left(\mathbf{I} - \mathbf{P}\right)\mathbf{y},

where \mathbf{I} is the identity matrix. The matrix \mathbf{M} \equiv \mathbf{I} - \mathbf{P} is sometimes referred to as the residual maker matrix or the annihilator matrix. The covariance matrix of the residuals \mathbf{r}, by error propagation, equals

\boldsymbol\Sigma_\mathbf{r} = \left(\mathbf{I} - \mathbf{P}\right)\boldsymbol\Sigma\left(\mathbf{I} - \mathbf{P}\right)^\textsf{T},

where \boldsymbol\Sigma is the covariance matrix of the error vector (and by extension, the response vector as well). For the case of linear models with independent and identically distributed errors in which \boldsymbol\Sigma = \sigma^2\mathbf{I}, this reduces to

\boldsymbol\Sigma_\mathbf{r} = \left(\mathbf{I} - \mathbf{P}\right)\sigma^2,

since \mathbf{I} - \mathbf{P} is then symmetric and idempotent.
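A small numerical sketch of the residual identities for a least-squares fit with design matrix \mathbf{X}. The data (an intercept plus one predictor, six observations) and the error variance σ² = 0.25 are made-up assumptions. It checks that the residual maker \mathbf{M} = \mathbf{I} - \mathbf{P} annihilates \mathbf{X}, that the residuals are orthogonal to the columns of \mathbf{X}, and that for i.i.d. errors the residual covariance collapses to σ²(\mathbf{I} - \mathbf{P}).

```python
import numpy as np

# Hypothetical example data: n = 6 observations, intercept plus one predictor.
rng = np.random.default_rng(0)
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Projection (hat) matrix P = X (X^T X)^{-1} X^T and residual maker M = I - P.
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

# Residuals via the residual maker: r = (I - P) y.
r = M @ y

# M annihilates the columns of X, so the residuals are orthogonal to X.
print(np.allclose(M @ X, 0))      # True
print(np.allclose(X.T @ r, 0))    # True

# Error propagation with assumed i.i.d. errors: Sigma = sigma^2 I.
sigma2 = 0.25
Sigma = sigma2 * np.eye(n)
Sigma_r = M @ Sigma @ M.T         # (I - P) Sigma (I - P)^T
print(np.allclose(Sigma_r, sigma2 * M))  # True
```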


Intuition

Geometrically, the closest point in the column space of \mathbf{A} to the vector \mathbf{b} is \mathbf{Ax}, the point at which the line from \mathbf{b} meets the column space of \mathbf{A} orthogonally. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so

\mathbf{A}^\textsf{T}\left(\mathbf{b} - \mathbf{Ax}\right) = \mathbf{0}.

From there, one rearranges (assuming \mathbf{A} has full column rank, so that \mathbf{A}^\textsf{T}\mathbf{A} is invertible):

\begin{aligned} && \mathbf{A}^\textsf{T}\mathbf{b} - \mathbf{A}^\textsf{T}\mathbf{Ax} &= \mathbf{0} \\ \Rightarrow && \mathbf{A}^\textsf{T}\mathbf{b} &= \mathbf{A}^\textsf{T}\mathbf{Ax} \\ \Rightarrow && \mathbf{x} &= \left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b} \end{aligned}

Therefore, since \mathbf{Ax} lies in the column space of \mathbf{A}, the projection matrix, which maps \mathbf{b} onto \mathbf{Ax}, is \mathbf{P} = \mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}.
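The orthogonality argument can be checked directly. In this NumPy sketch \mathbf{A} is a small full-column-rank matrix and \mathbf{b} a vector outside its column space, both chosen arbitrarily for illustration. Solving the normal equations gives \mathbf{x}, the error \mathbf{b} - \mathbf{Ax} is orthogonal to every column of \mathbf{A}, and \mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T} maps \mathbf{b} onto \mathbf{Ax}.

```python
import numpy as np

# Hypothetical example: 4x2 matrix A with full column rank, and a vector b
# that does not lie in the column space of A.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 0.0, 2.0, 5.0])

# Normal equations A^T A x = A^T b (A^T A is invertible here).
x = np.linalg.solve(A.T @ A, A.T @ b)

# Same answer as NumPy's least-squares solver.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ls))                 # True

# The error b - Ax is orthogonal to the column space of A.
print(np.allclose(A.T @ (b - A @ x), 0))    # True

# The projection matrix maps b onto Ax.
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, A @ x))            # True
```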


Linear model

Suppose that we wish to estimate a linear model using linear least squares. The model can be written as

\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon,

where \mathbf{X} is a matrix of explanatory variables (the design matrix), \boldsymbol\beta is a vector of unknown parameters to be estimated, and \boldsymbol\varepsilon is the error vector. Many types of models and techniques are subject to this formulation. A few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.
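What the listed techniques share is that their fitted values are a linear function of the responses, \hat{\mathbf{y}} = \mathbf{H}\mathbf{y}, so each has a hat matrix. As an illustration for kernel regression, the sketch below builds the smoother matrix of a Nadaraya-Watson estimator. The Gaussian kernel, the bandwidth of 0.8, the helper name `nadaraya_watson_hat` and the simulated data are all assumptions made for the example.

```python
import numpy as np

def nadaraya_watson_hat(x, bandwidth):
    """Hypothetical helper: smoother matrix H of a Gaussian-kernel
    Nadaraya-Watson regression evaluated at the sample points (y_hat = H @ y)."""
    d = (x[:, None] - x[None, :]) / bandwidth
    K = np.exp(-0.5 * d**2)                    # weight of point j in the fit at point i
    return K / K.sum(axis=1, keepdims=True)    # each row is a weighted average

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, size=30))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # simulated responses

H = nadaraya_watson_hat(x, bandwidth=0.8)
y_hat = H @ y                                        # fitted values are linear in y

# Direct evaluation of the estimator at x[0] agrees with row 0 of the matrix form.
w = np.exp(-0.5 * ((x[0] - x) / 0.8) ** 2)
print(np.isclose(y_hat[0], np.sum(w * y) / np.sum(w)))   # True
print(np.allclose(H.sum(axis=1), 1.0))                   # True
```

Note that \mathbf{H} depends only on the sample points and the bandwidth, not on \mathbf{y}, which is exactly what makes the smoother linear.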


Ordinary least squares

When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are

\hat{\boldsymbol\beta} = \left(\mathbf{X}^\textsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\textsf{T}\mathbf{y},

so the fitted values are

\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}\left(\mathbf{X}^\textsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\textsf{T}\mathbf{y}.

Therefore, the projection matrix (and hat matrix) is given by

\mathbf{P} \equiv \mathbf{X}\left(\mathbf{X}^\textsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\textsf{T}.
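As a quick check of these formulas, the NumPy sketch below forms \mathbf{P} for a simulated design matrix (the coefficients, noise level and seed are arbitrary assumptions) and confirms that \mathbf{P}\mathbf{y} reproduces \mathbf{X}\hat{\boldsymbol\beta}. The diagonal of \mathbf{P} gives the leverages.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 8, 3
# Hypothetical design matrix: intercept plus two random predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

# beta_hat = (X^T X)^{-1} X^T y, via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix P = X (X^T X)^{-1} X^T.
P = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = P @ y
print(np.allclose(y_hat, X @ beta_hat))   # True: P "puts the hat on y"

# Leverages are the diagonal elements of P; they sum to the number of columns.
leverages = np.diag(P)
print(np.isclose(leverages.sum(), p))     # True

# Equivalent construction from a reduced QR factorization X = QR: P = Q Q^T.
Q, _ = np.linalg.qr(X)
print(np.allclose(P, Q @ Q.T))            # True
```

Forming \left(\mathbf{X}^\textsf{T}\mathbf{X}\right)^{-1} can be numerically fragile for ill-conditioned designs, which is why the QR form is often preferred in practice.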


Weighted and generalized least squares

The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is \boldsymbol\Sigma. Then since

\hat{\boldsymbol\beta}_{\text{GLS}} = \left(\mathbf{X}^\textsf{T}\boldsymbol\Sigma^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\textsf{T}\boldsymbol\Sigma^{-1}\mathbf{y},

the hat matrix is

\mathbf{H} = \mathbf{X}\left(\mathbf{X}^\textsf{T}\boldsymbol\Sigma^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\textsf{T}\boldsymbol\Sigma^{-1},

and again it may be seen that \mathbf{H}^2 = \mathbf{H}\cdot\mathbf{H} = \mathbf{H}, though in general it is no longer symmetric.
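The sketch below builds this generalized least-squares hat matrix for an assumed error covariance: an AR(1)-type correlation \rho^{|i-j|} with \rho = 0.6, combined with standard deviations increasing from 1 to 2. These choices are purely illustrative. It checks that \mathbf{H} stays idempotent and reproduces the columns of \mathbf{X}, while losing symmetry.

```python
import numpy as np

n = 6
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])

# Hypothetical error covariance: correlation rho^|i-j| with heteroscedastic scaling.
# It is symmetric positive definite but not a multiple of the identity.
rho = 0.6
idx = np.arange(n)
scale = np.linspace(1.0, 2.0, n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :]) * np.outer(scale, scale)
Sigma_inv = np.linalg.inv(Sigma)

# H = X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}
H = X @ np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv)

print(np.allclose(H @ H, H))   # True: still idempotent
print(np.allclose(H @ X, X))   # True: columns of X are reproduced
print(np.allclose(H, H.T))     # False for this Sigma: no longer symmetric
```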


Properties

The projection matrix has a number of useful algebraic properties. In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix \mathbf{X}. (Note that \left(\mathbf{X}^\textsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\textsf{T} is the pseudoinverse of \mathbf{X} when \mathbf{X} has full column rank.) Some facts of the projection matrix in this setting are summarized as follows:

* \mathbf{r} = \left(\mathbf{I} - \mathbf{P}\right)\mathbf{y}, and \mathbf{r} = \mathbf{y} - \mathbf{P}\mathbf{y} \perp \mathbf{X}.
* \mathbf{P} is symmetric, and so is \mathbf{M} \equiv \mathbf{I} - \mathbf{P}.
* \mathbf{P} is idempotent: \mathbf{P}^2 = \mathbf{P}, and so is \mathbf{M}.
* If \mathbf{X} is an n \times r matrix with \operatorname{rank}(\mathbf{X}) = r, then \operatorname{rank}(\mathbf{P}) = r.
* The eigenvalues of \mathbf{P} consist of r ones and n - r zeros, while the eigenvalues of \mathbf{M} consist of n - r ones and r zeros.
* \mathbf{X} is invariant under \mathbf{P}: \mathbf{P}\mathbf{X} = \mathbf{X}, hence \left(\mathbf{I} - \mathbf{P}\right)\mathbf{X} = \mathbf{0}.
* \left(\mathbf{I} - \mathbf{P}\right)\mathbf{P} = \mathbf{P}\left(\mathbf{I} - \mathbf{P}\right) = \mathbf{0}.
* \mathbf{P} depends on \mathbf{X} only through its column space: it is the unique orthogonal projection onto that subspace, so any design matrix with the same column space yields the same \mathbf{P}.

The projection matrix of an ordinary least-squares linear model is symmetric (\mathbf{P}^\textsf{T} = \mathbf{P}) and idempotent (\mathbf{P}^2 = \mathbf{P}). However, this is not always the case for hat matrices; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent. For linear models, the trace of the projection matrix is equal to the rank of \mathbf{X}, which is the number of independent parameters of the linear model. For other models such as LOESS that are still linear in the observations \mathbf{y}, the projection matrix can be used to define the effective degrees of freedom of the model. Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations that have a large effect on the results of a regression.
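The listed properties are easy to confirm numerically. This NumPy sketch uses a random 7 × 3 design (an assumption; any full-column-rank \mathbf{X} would do) and builds \mathbf{P} from the pseudoinverse, which equals \left(\mathbf{X}^\textsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\textsf{T} in that case. The last check illustrates uniqueness: replacing \mathbf{X} by \mathbf{X}\mathbf{T} for an invertible \mathbf{T} (a hypothetical reparametrization) leaves \mathbf{P} unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 7, 3
X = rng.normal(size=(n, r))        # hypothetical full-column-rank design
P = X @ np.linalg.pinv(X)          # X X^+ = X (X^T X)^{-1} X^T here
M = np.eye(n) - P

assert np.allclose(P, P.T) and np.allclose(M, M.T)        # symmetric
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)    # idempotent
assert np.allclose(P @ X, X) and np.allclose(M @ X, 0)    # X invariant under P
assert np.allclose(M @ P, 0) and np.allclose(P @ M, 0)    # (I - P) P = P (I - P) = 0

# Eigenvalues: n - r zeros and r ones (eigvalsh, since P is symmetric).
eig = np.sort(np.linalg.eigvalsh(P))
assert np.allclose(eig, [0.0] * (n - r) + [1.0] * r)

# Trace equals rank(X), the number of independent parameters.
assert np.isclose(np.trace(P), np.linalg.matrix_rank(X))

# Same column space, same P: X -> X T with an invertible T (det = 6).
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 3.0]])
X2 = X @ T
assert np.allclose(X2 @ np.linalg.pinv(X2), P)
```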


Blockwise formula

Suppose the design matrix X can be decomposed by columns as X = \begin{bmatrix} A & B \end{bmatrix}. Define the hat or projection operator as P\{X\} = X\left(X^\textsf{T}X\right)^{-1}X^\textsf{T}. Similarly, define the residual operator as M\{X\} = I - P\{X\}. Then the projection matrix can be decomposed as follows:

P\{X\} = P\{A\} + P\{M\{A\}B\},

where, e.g., P\{A\} = A\left(A^\textsf{T}A\right)^{-1}A^\textsf{T} and M\{A\} = I - P\{A\}. There are a number of applications of such a decomposition. In the classical application A is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the fixed effects model, where A is a large sparse matrix of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of X without explicitly forming the matrix X, which might be too large to fit into computer memory.
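A minimal check of the blockwise identity, assuming A is the all-ones intercept column (the classical application) and B holds two simulated regressors. The helper `proj` is a hypothetical name. Note that this sketch forms X explicitly, so it only verifies the algebra; the memory savings come from exploiting the structure of a large sparse A, which is not shown here.

```python
import numpy as np

def proj(Z):
    """Hypothetical helper: orthogonal projection onto the column space of Z
    (Z assumed to have full column rank)."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

rng = np.random.default_rng(5)
n = 10
A = np.ones((n, 1))                 # intercept column
B = rng.normal(size=(n, 2))         # simulated additional regressors
X = np.hstack([A, B])

M_A = np.eye(n) - proj(A)           # residual operator M{A}; M_A @ B centers B
lhs = proj(X)
rhs = proj(A) + proj(M_A @ B)       # P{A} + P{M{A} B}
print(np.allclose(lhs, rhs))        # True
```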


See also

* Projection (linear algebra)
* Studentized residuals
* Effective degrees of freedom
* Mean and predicted response

