A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1. The transformation is called "whitening" because it changes the input vector into a white noise vector.
Several other transformations are closely related to whitening:
# the decorrelation transform removes only the correlations but leaves variances intact,
# the standardization transform sets variances to 1 but leaves correlations intact,
# a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.
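The three related transforms can be illustrated numerically; the following NumPy sketch (variable names are illustrative, and the covariance matrix is an arbitrary example) shows each one applied to correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Sample correlated data with a known covariance structure.
Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])
X = rng.standard_normal((10000, 2)) @ np.linalg.cholesky(Sigma).T

cov = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(cov)

# Decorrelation: rotate to principal axes, so the covariance becomes
# diagonal (the new variances are the eigenvalues of the covariance).
X_dec = X @ evecs

# Standardization: rescale each variable to variance 1,
# leaving the correlations intact.
X_std = X @ np.diag(1.0 / np.sqrt(np.diag(cov)))

# Coloring: map white noise onto a target covariance via a Cholesky factor.
Z = rng.standard_normal((10000, 2))
X_col = Z @ np.linalg.cholesky(Sigma).T
```

Whitening itself combines the first two effects: the result is both decorrelated and standardized.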
Definition
Suppose X is a random (column) vector with non-singular covariance matrix Σ and mean 0. Then the transformation
:Y = W X
with W a whitening matrix satisfying the condition
:W^T W = Σ^{-1}
yields the whitened random vector Y with unit diagonal covariance.
If X has non-zero mean μ, then whitening can be performed by
:Y = W (X − μ).
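The definition can be checked numerically. The sketch below (assuming Σ is known exactly, with an arbitrary example covariance) takes the symmetric choice W = Σ^{-1/2}, applies Y = W(X − μ), and confirms that the defining condition holds and that the sample covariance of Y is close to the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([2.0, -1.0])
Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])

# Draw correlated samples with mean mu and covariance Sigma.
X = rng.multivariate_normal(mu, Sigma, size=20000)

# Symmetric inverse square root Sigma^{-1/2} via the eigendecomposition.
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

# The defining condition of a whitening matrix: W^T W = Sigma^{-1}.
print(np.allclose(W.T @ W, np.linalg.inv(Sigma)))  # True

# Whitening with centering: Y = W (X - mu), applied row-wise.
Y = (X - mu) @ W.T
print(np.cov(Y, rowvar=False).round(2))  # close to the 2x2 identity
```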
There are infinitely many possible whitening matrices W that all satisfy the above condition. Commonly used choices are W = Σ^{-1/2} (Mahalanobis or ZCA whitening), W = L^T where L is the Cholesky decomposition of Σ^{-1} (Cholesky whitening), or the eigen-system of Σ (PCA whitening).
Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of X and Y. For example, the unique optimal whitening transformation achieving maximal component-wise correlation between the original X and the whitened Y is produced by the whitening matrix W = P^{-1/2} V^{-1/2}, where P is the correlation matrix and V the diagonal variance matrix.
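The three common choices can be constructed and compared directly. In the sketch below (an illustrative example covariance, not any package's implementation), each candidate W satisfies the whitening condition W Σ W^T = I, yet the matrices themselves differ, since any rotation of a whitening matrix is again a whitening matrix:

```python
import numpy as np

Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

evals, evecs = np.linalg.eigh(Sigma)

# ZCA (Mahalanobis) whitening: W = Sigma^{-1/2}, the symmetric choice.
W_zca = evecs @ np.diag(evals ** -0.5) @ evecs.T

# Cholesky whitening: W = L^T, where L L^T = Sigma^{-1}.
W_chol = np.linalg.cholesky(Sigma_inv).T

# PCA whitening: W = Lambda^{-1/2} U^T from the eigen-system of Sigma.
W_pca = np.diag(evals ** -0.5) @ evecs.T

for W in (W_zca, W_chol, W_pca):
    # Each choice satisfies W Sigma W^T = I (equivalently W^T W = Sigma^{-1}).
    assert np.allclose(W @ Sigma @ W.T, np.eye(2))
```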
Whitening a data matrix
Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
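These two steps can be sketched as follows (an illustrative example; the maximum likelihood covariance estimate for Gaussian data is simply the sample covariance with denominator n):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=5000)
n = X.shape[0]

# Step 1: estimate mean and covariance (ML estimate uses 1/n).
mu_hat = X.mean(axis=0)
Xc = X - mu_hat
Sigma_hat = Xc.T @ Xc / n

# Step 2: construct an estimated whitening matrix, here via the
# Cholesky factor of the inverse estimated covariance.
L = np.linalg.cholesky(np.linalg.inv(Sigma_hat))
W_hat = L.T

# Apply to the centered data: rows of Y are whitened observations,
# with empirical covariance exactly the identity by construction.
Y = Xc @ W_hat.T
```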
High-dimensional whitening
This modality is a generalization of the pre-whitening procedure extended to more general spaces, where X is usually assumed to be a random function or another random object in a Hilbert space. One of the main issues of extending whitening to infinite dimensions is that the covariance operator has an unbounded inverse in such spaces. Nevertheless, if one assumes that the Picard condition holds for X in the range space of the covariance operator, whitening becomes possible. A whitening operator can then be defined from the factorization of the Moore–Penrose inverse of the covariance operator, which has an effective mapping on Karhunen–Loève type expansions of X. The advantage of these whitening transformations is that they can be optimized according to the underlying topological properties of the data, thus producing more robust whitening representations. High-dimensional features of the data can be exploited through kernel regressors or basis function systems.
R implementation
An implementation of several whitening procedures in R, including ZCA whitening and PCA whitening but also CCA whitening, is available in the "whitening" R package published on CRAN. The R package "pfica" allows the computation of high-dimensional whitening representations using basis function systems (B-splines, Fourier basis, etc.).
See also
* Decorrelation
* Principal component analysis
* Weighted least squares
* Canonical correlation
* Mahalanobis distance (is Euclidean after whitening transformation)
External links
* http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf The ZCA whitening transformation, Appendix A of ''Learning Multiple Layers of Features from Tiny Images'' by A. Krizhevsky.