A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1. The transformation is called "whitening" because it changes the input vector into a white noise vector.
Several other transformations are closely related to whitening:
# the decorrelation transform removes only the correlations but leaves variances intact,
# the standardization transform sets variances to 1 but leaves correlations intact,
# a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.
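The three related transforms can be illustrated numerically; the following NumPy sketch (variable names are illustrative, and the covariance matrix is an arbitrary example) shows each one applied to correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Sample correlated data with a known covariance structure.
Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])
X = rng.standard_normal((10000, 2)) @ np.linalg.cholesky(Sigma).T

cov = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(cov)

# Decorrelation: rotate to principal axes, so the covariance becomes
# diagonal (the new variances are the eigenvalues of the covariance).
X_dec = X @ evecs

# Standardization: rescale each variable to variance 1,
# leaving the correlations intact.
X_std = X @ np.diag(1.0 / np.sqrt(np.diag(cov)))

# Coloring: map white noise onto a target covariance via a Cholesky factor.
Z = rng.standard_normal((10000, 2))
X_col = Z @ np.linalg.cholesky(Sigma).T
```

Whitening itself combines the first two effects: the result is both decorrelated and standardized.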
Definition
Suppose X is a random (column) vector with non-singular covariance matrix Σ and mean 0. Then the transformation
:Y = W X
with W a whitening matrix satisfying the condition
:W^T W = Σ^{-1}
yields the whitened random vector Y with unit diagonal covariance.
If X has non-zero mean μ, then whitening can be performed by
:Y = W (X − μ).
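The definition can be checked numerically. The sketch below (assuming Σ is known exactly, with an arbitrary example covariance) takes the symmetric choice W = Σ^{-1/2}, applies Y = W(X − μ), and confirms that the defining condition holds and that the sample covariance of Y is close to the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([2.0, -1.0])
Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])

# Draw correlated samples with mean mu and covariance Sigma.
X = rng.multivariate_normal(mu, Sigma, size=20000)

# Symmetric inverse square root Sigma^{-1/2} via the eigendecomposition.
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

# The defining condition of a whitening matrix: W^T W = Sigma^{-1}.
print(np.allclose(W.T @ W, np.linalg.inv(Sigma)))  # True

# Whitening with centering: Y = W (X - mu), applied row-wise.
Y = (X - mu) @ W.T
print(np.cov(Y, rowvar=False).round(2))  # close to the 2x2 identity
```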
There are infinitely many possible whitening matrices W that all satisfy the above condition. Commonly used choices are W = Σ^{-1/2} (Mahalanobis or ZCA whitening), W = L^T where L is the Cholesky decomposition of Σ^{-1} (Cholesky whitening), or the eigen-system of Σ (PCA whitening).
Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of X and Y. For example, the unique optimal whitening transformation achieving maximal component-wise correlation between the original X and the whitened Y is produced by the whitening matrix W = P^{-1/2} V^{-1/2}, where P is the correlation matrix and V the diagonal variance matrix.
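The three common choices can be constructed and compared directly. In the sketch below (an illustrative example covariance, not any package's implementation), each candidate W satisfies the whitening condition W Σ W^T = I, yet the matrices themselves differ, since any rotation of a whitening matrix is again a whitening matrix:

```python
import numpy as np

Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

evals, evecs = np.linalg.eigh(Sigma)

# ZCA (Mahalanobis) whitening: W = Sigma^{-1/2}, the symmetric choice.
W_zca = evecs @ np.diag(evals ** -0.5) @ evecs.T

# Cholesky whitening: W = L^T, where L L^T = Sigma^{-1}.
W_chol = np.linalg.cholesky(Sigma_inv).T

# PCA whitening: W = Lambda^{-1/2} U^T from the eigen-system of Sigma.
W_pca = np.diag(evals ** -0.5) @ evecs.T

for W in (W_zca, W_chol, W_pca):
    # Each choice satisfies W Sigma W^T = I (equivalently W^T W = Sigma^{-1}).
    assert np.allclose(W @ Sigma @ W.T, np.eye(2))
```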
Whitening a data matrix
Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
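These two steps can be sketched as follows (an illustrative example; the maximum likelihood covariance estimate for Gaussian data is simply the sample covariance with denominator n):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=5000)
n = X.shape[0]

# Step 1: estimate mean and covariance (ML estimate uses 1/n).
mu_hat = X.mean(axis=0)
Xc = X - mu_hat
Sigma_hat = Xc.T @ Xc / n

# Step 2: construct an estimated whitening matrix, here via the
# Cholesky factor of the inverse estimated covariance.
L = np.linalg.cholesky(np.linalg.inv(Sigma_hat))
W_hat = L.T

# Apply to the centered data: rows of Y are whitened observations,
# with empirical covariance exactly the identity by construction.
Y = Xc @ W_hat.T
```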
High-dimensional whitening
This modality is a generalization of the pre-whitening procedure extended to more general spaces, where X is usually assumed to be a random function or another random object in a Hilbert space. One of the main issues of extending whitening to infinite dimensions is that the covariance operator has an unbounded inverse in such spaces. Nevertheless, if one assumes that the Picard condition holds for X in the range space of the covariance operator, whitening becomes possible. A whitening operator can then be defined from the factorization of the Moore–Penrose inverse of the covariance operator, which has an effective mapping on Karhunen–Loève type expansions of X. The advantage of these whitening transformations is that they can be optimized according to the underlying topological properties of the data, thus producing more robust whitening representations. High-dimensional features of the data can be exploited through kernel regressors or basis function systems.
R implementation
An implementation of several whitening procedures in R, including ZCA whitening and PCA whitening but also CCA whitening, is available in the "whitening" R package published on CRAN. The R package "pfica" allows the computation of high-dimensional whitening representations using basis function systems (B-splines, Fourier basis, etc.).
See also
* Decorrelation
* Principal component analysis
* Weighted least squares
* Canonical correlation
* Mahalanobis distance (is Euclidean after whitening transformation)
External links
* http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf The ZCA whitening transformation, Appendix A of ''Learning Multiple Layers of Features from Tiny Images'' by A. Krizhevsky.