Orthogonality principle

In statistics and signal processing, the orthogonality principle is a necessary and sufficient condition for the optimality of a Bayesian estimator. Loosely stated, the orthogonality principle says that the error vector of the optimal estimator (in a mean square error sense) is orthogonal to any possible estimator. The orthogonality principle is most commonly stated for linear estimators, but more general formulations are possible. Since the principle is a necessary and sufficient condition for optimality, it can be used to find the minimum mean square error (MMSE) estimator.


Orthogonality principle for linear estimators

The orthogonality principle is most commonly used in the setting of linear estimation. In this context, let ''x'' be an unknown random vector which is to be estimated based on the observation vector ''y''. One wishes to construct a linear estimator \hat{x} = Hy + c for some matrix ''H'' and vector ''c''. Then, the orthogonality principle states that an estimator \hat{x} achieves minimum mean square error if and only if

* \operatorname{E} \{ (\hat{x} - x) y^T \} = 0, and
* \operatorname{E} \{ \hat{x} - x \} = 0.

If ''x'' and ''y'' have zero mean, then it suffices to require the first condition.
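
For a concrete illustration, the sketch below (synthetic data; the model ''y'' = ''Ax'' + ''w'', the dimensions, and all parameter values are assumptions made purely for this example) rewrites the two conditions as ''H'' Cov(''y'') = Cov(''x'', ''y'') and ''c'' = E[''x''] − ''H'' E[''y''], solves them from sample moments, and checks that the resulting error has zero mean and zero cross-correlation with ''y''.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical linear-Gaussian model (illustrative only): y = A x + w
x = rng.normal(loc=[1.0, -2.0], scale=[1.0, 2.0], size=(n, 2))
A = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.3, 0.3]])
w = rng.normal(scale=0.5, size=(n, 3))
y = x @ A.T + w

mx, my = x.mean(axis=0), y.mean(axis=0)
C = np.cov(np.hstack([x, y]).T)          # joint sample covariance of (x, y)
Cxy, Cyy = C[:2, 2:], C[2:, 2:]

# The two orthogonality conditions give H Cov(y) = Cov(x, y) and c = E[x] - H E[y]
H = Cxy @ np.linalg.inv(Cyy)
c = mx - H @ my

err = (y @ H.T + c) - x                  # estimation error, x_hat - x
print(np.abs(err.mean(axis=0)).max())    # ~0: E{x_hat - x} = 0
print(np.abs(err.T @ y / n).max())       # ~0: E{(x_hat - x) y^T} = 0
```

With exact rather than sample moments, both printed quantities would be exactly zero; here they vanish up to Monte Carlo error.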


Example

Suppose ''x'' is a Gaussian random variable with mean ''m'' and variance \sigma_x^2. Also suppose we observe a value y = x + w, where ''w'' is Gaussian noise which is independent of ''x'' and has mean 0 and variance \sigma_w^2. We wish to find a linear estimator \hat{x} = hy + c minimizing the MSE. Substituting the expression \hat{x} = hy + c into the two requirements of the orthogonality principle, we obtain

: 0 = \operatorname{E} \{ (\hat{x} - x) y \}
: 0 = \operatorname{E} \{ (hy + c - x)(x + w) \}
: 0 = h(\sigma_x^2 + \sigma_w^2) + h m^2 + cm - \sigma_x^2 - m^2

and

: 0 = \operatorname{E} \{ \hat{x} - x \}
: 0 = \operatorname{E} \{ hy + c - x \}
: 0 = (h-1)m + c.

Solving these two linear equations for ''h'' and ''c'' results in

: h = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_w^2}, \quad c = \frac{\sigma_w^2}{\sigma_x^2 + \sigma_w^2} m,

so that the linear minimum mean square error estimator is given by

: \hat{x} = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_w^2} y + \frac{\sigma_w^2}{\sigma_x^2 + \sigma_w^2} m.

This estimator can be interpreted as a weighted average between the noisy measurements ''y'' and the prior expected value ''m''. If the noise variance \sigma_w^2 is low compared with the variance of the prior \sigma_x^2 (corresponding to a high SNR), then most of the weight is given to the measurements ''y'', which are deemed more reliable than the prior information. Conversely, if the noise variance is relatively higher, then the estimate will be close to ''m'', as the measurements are not reliable enough to outweigh the prior information. Finally, note that because the variables ''x'' and ''y'' are jointly Gaussian, the minimum MSE estimator is linear (see the article on the minimum mean square error estimator). Therefore, in this case, the estimator above minimizes the MSE among all estimators, not only linear estimators.
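
As a quick numerical check of the closed-form solution (the parameter values below are arbitrary choices for illustration), one can simulate the model and confirm both orthogonality conditions as well as the resulting MSE:

```python
import numpy as np

rng = np.random.default_rng(1)
m, sx2, sw2 = 3.0, 4.0, 1.0        # assumed prior mean/variance and noise variance
n = 1_000_000

x = rng.normal(m, np.sqrt(sx2), n)
w = rng.normal(0.0, np.sqrt(sw2), n)
y = x + w

h = sx2 / (sx2 + sw2)              # closed-form weights derived above
c = sw2 / (sx2 + sw2) * m
x_hat = h * y + c

print(np.mean((x_hat - x) ** 2))   # ~ sx2*sw2/(sx2+sw2) = 0.8
print(np.mean((x_hat - x) * y))    # ~ 0  (first orthogonality condition)
print(np.mean(x_hat - x))          # ~ 0  (second orthogonality condition)
```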


General formulation

Let V be a Hilbert space of random variables with an inner product defined by \langle x, y \rangle = \operatorname{E} \{ x y^H \}. Suppose W is a closed subspace of V, representing the space of all possible estimators. One wishes to find a vector \hat{x} \in W which will approximate a vector x \in V. More accurately, one would like to minimize the mean squared error (MSE) \operatorname{E} \left\| x - \hat{x} \right\|^2 between \hat{x} and x.

In the special case of linear estimators described above, the space V is the set of all functions of x and y, while W is the set of linear estimators, i.e., linear functions of y only. Other settings which can be formulated in this way include the subspace of causal linear filters and the subspace of all (possibly nonlinear) estimators.

Geometrically, we can see this problem in the simple case where W is a one-dimensional subspace: we want to find the closest approximation to the vector x by a vector \hat{x} in the space W. From the geometric interpretation, it is intuitive that the best approximation, or smallest error, occurs when the error vector, e, is orthogonal to vectors in the space W.

More accurately, the general orthogonality principle states the following: given a closed subspace W of estimators within a Hilbert space V and an element x in V, an element \hat{x} \in W achieves minimum MSE among all elements in W if and only if \operatorname{E} \{ (x - \hat{x}) y^H \} = 0 for all y \in W.

Stated in such a manner, this principle is simply a statement of the Hilbert projection theorem. Nevertheless, the extensive use of this result in signal processing has resulted in the name "orthogonality principle."
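
One way to see why the orthogonality condition is sufficient (a standard argument, sketched here for completeness): writing \|v\|^2 = \langle v, v \rangle for the squared norm induced by the inner product above, any competing estimator in W can be written as \hat{x} + y with y \in W, and

: \left\| x - (\hat{x} + y) \right\|^2 = \left\| x - \hat{x} \right\|^2 - 2 \operatorname{Re} \left\langle x - \hat{x}, y \right\rangle + \left\| y \right\|^2 \geq \left\| x - \hat{x} \right\|^2

whenever \langle x - \hat{x}, y \rangle = 0, so \hat{x} has the smallest MSE in W. Conversely, if \langle x - \hat{x}, y \rangle \neq 0 for some y \in W, then replacing \hat{x} by \hat{x} + t y for a suitably chosen small scalar ''t'' strictly decreases the error, so \hat{x} cannot be optimal; this gives necessity.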


A solution to error minimization problems

The following is one way to find the minimum mean square error estimator by using the orthogonality principle.

We want to be able to approximate a vector x by

: x = \hat{x} + e,

where

: \hat{x} = \sum_i c_i p_i

is the approximation of x as a linear combination of vectors in the subspace W spanned by p_1, p_2, \ldots. Therefore, we want to be able to solve for the coefficients, c_i, so that we may write our approximation in known terms.

By the orthogonality theorem, the square norm of the error vector, \left\Vert e \right\Vert^2, is minimized when, for all ''j'',

: \left\langle x - \sum_i c_i p_i, p_j \right\rangle = 0.

Developing this equation, we obtain

: \left\langle x, p_j \right\rangle = \left\langle \sum_i c_i p_i, p_j \right\rangle = \sum_i c_i \left\langle p_i, p_j \right\rangle.

If there is a finite number n of vectors p_i, one can write this equation in matrix form as

: \begin{bmatrix} \left\langle x, p_1 \right\rangle \\ \left\langle x, p_2 \right\rangle \\ \vdots \\ \left\langle x, p_n \right\rangle \end{bmatrix} = \begin{bmatrix} \left\langle p_1, p_1 \right\rangle & \left\langle p_2, p_1 \right\rangle & \cdots & \left\langle p_n, p_1 \right\rangle \\ \left\langle p_1, p_2 \right\rangle & \left\langle p_2, p_2 \right\rangle & \cdots & \left\langle p_n, p_2 \right\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \left\langle p_1, p_n \right\rangle & \left\langle p_2, p_n \right\rangle & \cdots & \left\langle p_n, p_n \right\rangle \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.

Assuming the p_i are linearly independent, the Gramian matrix can be inverted to obtain

: \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} \left\langle p_1, p_1 \right\rangle & \left\langle p_2, p_1 \right\rangle & \cdots & \left\langle p_n, p_1 \right\rangle \\ \left\langle p_1, p_2 \right\rangle & \left\langle p_2, p_2 \right\rangle & \cdots & \left\langle p_n, p_2 \right\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \left\langle p_1, p_n \right\rangle & \left\langle p_2, p_n \right\rangle & \cdots & \left\langle p_n, p_n \right\rangle \end{bmatrix}^{-1} \begin{bmatrix} \left\langle x, p_1 \right\rangle \\ \left\langle x, p_2 \right\rangle \\ \vdots \\ \left\langle x, p_n \right\rangle \end{bmatrix},

thus providing an expression for the coefficients c_i of the minimum mean square error estimator.
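
A minimal numerical sketch of this recipe, using ordinary vectors in R^m with the Euclidean inner product as a stand-in for the Hilbert-space elements (the data below are arbitrary and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 3
P = rng.normal(size=(m, n))          # columns p_1, ..., p_n span the subspace W
x = rng.normal(size=m)               # vector to be approximated

G = P.T @ P                          # Gramian matrix, entries <p_i, p_j>
b = P.T @ x                          # right-hand side, entries <x, p_j>
c = np.linalg.solve(G, b)            # coefficients c_i

e = x - P @ c                        # error vector e = x - x_hat
print(np.max(np.abs(P.T @ e)))       # ~0: e is orthogonal to every p_j
```

In practice one solves the linear system directly, as above, rather than forming the inverse of the Gramian explicitly.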


See also

* Minimum mean square error
* Hilbert projection theorem


References

* Moon, Todd K. (2000). ''Mathematical Methods and Algorithms for Signal Processing''. Prentice-Hall. ISBN 0-201-36186-8.