In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. According to theoretical results, random projection preserves distances well, but empirical results are sparse. It has been applied to many natural language tasks under the name random indexing.
Dimensionality reduction
Dimensionality reduction, as the name suggests, is the process of reducing the number of random variables using various mathematical methods from statistics and machine learning. It is often used to reduce the problem of managing and manipulating large data sets. Dimensionality reduction techniques generally use linear transformations to determine the intrinsic dimensionality of the data manifold and to extract its principal directions. For this purpose there are various related techniques, including principal component analysis, linear discriminant analysis, canonical correlation analysis, the discrete cosine transform, and random projection.
Random projection is a simple and computationally efficient way to reduce the dimensionality of data by trading a controlled amount of error for faster processing times and smaller model sizes. The dimensions and distribution of random projection matrices are controlled so as to approximately preserve the pairwise distances between any two samples of the dataset.
Method
The core idea behind random projection is given in the Johnson-Lindenstrauss lemma, which states that if points in a vector space are of sufficiently high dimension, then they may be projected into a suitable lower-dimensional space in a way which approximately preserves pairwise distances between the points with high probability.

In random projection, the original $d$-dimensional data is projected to a $k$-dimensional subspace (with $k \ll d$) by multiplying on the left by a random matrix $R \in \mathbb{R}^{k \times d}$. Using matrix notation: if $X_{d \times N}$ is the original set of $N$ $d$-dimensional observations, then
$$X^{RP}_{k \times N} = R_{k \times d}\, X_{d \times N}$$
is the projection of the data onto a lower $k$-dimensional subspace. Random projection is computationally simple: form the random matrix $R$ and project the $d \times N$ data matrix $X$ onto $k$ dimensions, which is of order $O(dkN)$. If the data matrix $X$ is sparse with about $c$ nonzero entries per column, then the complexity of this operation is of order $O(ckN)$.
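A minimal sketch of this procedure, assuming NumPy (the dimensions and the $1/\sqrt{k}$ scaling convention are illustrative choices, not prescribed above):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, N = 1000, 50, 200          # original dimension, target dimension, sample count
X = rng.standard_normal((d, N))  # d x N data matrix: N observations as columns

# Form the random matrix R and project: X_rp = R X.
# Scaling by 1/sqrt(k) keeps expected squared norms unchanged.
R = rng.standard_normal((k, d)) / np.sqrt(k)
X_rp = R @ X                     # k x N projected data

# Pairwise distances are approximately preserved:
orig = np.linalg.norm(X[:, 0] - X[:, 1])
proj = np.linalg.norm(X_rp[:, 0] - X_rp[:, 1])
print(orig, proj)                # the two values should be close
```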
Orthogonal random projection
A unit vector can be orthogonally projected onto a random subspace. Let $x \in \mathbb{R}^n$ be the original unit vector, and let $y$ be its projection onto a random $k$-dimensional subspace. The norm-squared $\|y\|^2$ has the same distribution as projecting a random point, uniformly sampled on the unit sphere, to its first $k$ coordinates. This is equivalent to sampling a random point from the multivariate gaussian distribution $\mathcal{N}(0, I_n)$, then normalizing it.

Therefore, $\|y\|^2$ has the same distribution as
$$\frac{g_1^2 + \cdots + g_k^2}{g_1^2 + \cdots + g_n^2}, \qquad g_1, \dots, g_n \sim \mathcal{N}(0, 1) \text{ i.i.d.},$$
which, by the chi-squared construction of the Beta distribution, has distribution $\mathrm{Beta}\left(\tfrac{k}{2}, \tfrac{n-k}{2}\right)$, with mean $\tfrac{k}{n}$.

We have a concentration inequality
$$\Pr\left[\left|\|y\|^2 - \frac{k}{n}\right| \geq \epsilon\,\frac{k}{n}\right] \leq 2\exp\left(-\frac{k}{4}\left(\epsilon^2 - \epsilon^3\right)\right)$$
for any $\epsilon \in (0, 1)$.
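The $\mathrm{Beta}(k/2, (n-k)/2)$ behaviour of $\|y\|^2$ can be checked empirically with a short simulation (a sketch assuming NumPy and SciPy; $n$, $k$ and the trial count are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, trials = 100, 20, 100_000

# Projecting a random unit vector onto a random k-dimensional subspace is
# equivalent to normalizing g ~ N(0, I_n) and keeping its first k coordinates.
g = rng.standard_normal((trials, n))
y_sq = (g[:, :k] ** 2).sum(axis=1) / (g ** 2).sum(axis=1)

print(y_sq.mean())                            # empirical mean, close to k/n = 0.2
print(stats.beta(k / 2, (n - k) / 2).mean())  # theoretical mean, exactly k/n
# Kolmogorov-Smirnov test against Beta(k/2, (n-k)/2); p-value should be far from 0
print(stats.kstest(y_sq, stats.beta(k / 2, (n - k) / 2).cdf).pvalue)
```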
Gaussian random projection
The random matrix $R$ can be generated using a Gaussian distribution. The first row is a random unit vector uniformly chosen from $S^{d-1}$. The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. In this way of choosing $R$, the following properties are satisfied:
* Spherical symmetry: For any orthogonal matrix $A \in O(d)$, $RA$ and $R$ have the same distribution.
* Orthogonality: The rows of $R$ are orthogonal to each other.
* Normality: The rows of $R$ are unit-length vectors.
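One standard way to realize this row-by-row orthogonalization is via QR decomposition of a Gaussian matrix (a minimal sketch, assuming NumPy; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 100, 10

# QR-factorize a d x k Gaussian matrix; after the sign correction below, the
# columns of Q form a uniformly random orthonormal k-frame, so its transpose
# is a projection matrix R with the three properties listed above.
G = rng.standard_normal((d, k))
Q, Rq = np.linalg.qr(G)          # Q is d x k with orthonormal columns
Q *= np.sign(np.diag(Rq))        # sign fix so the frame is uniformly distributed
R = Q.T                          # k x d: rows are unit length, mutually orthogonal

print(np.allclose(R @ R.T, np.eye(k)))  # True: orthogonality and normality hold
```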
More computationally efficient random projections
Achlioptas has shown that the random matrix can be sampled more efficiently. Either the full matrix can be sampled i.i.d. according to
$$R_{ij} = \begin{cases} +1 & \text{with probability } 1/2 \\ -1 & \text{with probability } 1/2, \end{cases}$$
or the full matrix can be sampled i.i.d. according to
$$R_{ij} = \sqrt{3} \times \begin{cases} +1 & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -1 & \text{with probability } 1/6. \end{cases}$$
Both are efficient for database applications because the computations can be performed using integer arithmetic, and further related study has followed.
It was later shown how to use integer arithmetic while making the distribution even sparser, having very few nonzero entries per column, in work on the Sparse JL Transform. This is advantageous since a sparse embedding matrix means being able to project the data to a lower dimension even faster, as sketched below.
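A sketch of the second (sparse) Achlioptas distribution, assuming NumPy (the $1/\sqrt{k}$ normalization follows the dense convention above and is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, N = 1000, 50, 200
X = rng.standard_normal((d, N))

# Achlioptas's sparse entries: sqrt(3) * {+1, 0, -1} with probabilities
# 1/6, 2/3, 1/6. Two thirds of the matrix is zero, and the nonzero work
# reduces to additions and subtractions (integer arithmetic).
entries = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
R = np.sqrt(3 / k) * entries

X_rp = R @ X  # a sparse matrix routine would exploit the zero entries
```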
Random Projection with Quantization
Random projection can be further condensed by quantization (discretization), with one bit (sign random projection) or multiple bits. It is the building block of SimHash, the RP tree, and other memory-efficient estimation and learning methods.
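As an illustration of the one-bit case, the following sign-random-projection sketch (assuming NumPy; the dimensions are arbitrary) estimates the angle between two vectors from the Hamming distance of their sign patterns, the idea underlying SimHash:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 256, 4096   # a large k gives a low-variance angle estimate

R = rng.standard_normal((k, d))

def sign_projection(x):
    # 1-bit quantization: keep only the sign of each projected coordinate
    return R @ x >= 0

a, b = rng.standard_normal(d), rng.standard_normal(d)
hamming = np.count_nonzero(sign_projection(a) != sign_projection(b))

# For Gaussian R, each bit differs with probability angle / pi.
print(np.pi * hamming / k)                                          # estimated angle
print(np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))   # true angle
```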
Large quasiorthogonal bases
The Johnson-Lindenstrauss lemma states that large sets of vectors in a high-dimensional space can be linearly mapped into a space of much lower (but still high) dimension ''n'' with approximate preservation of distances. One of the explanations of this effect is the exponentially high quasiorthogonal dimension of ''n''-dimensional Euclidean space. There are exponentially large (in dimension ''n'') sets of almost orthogonal vectors (with small inner products) in ''n''-dimensional Euclidean space. This observation is useful in indexing of high-dimensional data.

Quasiorthogonality of large random sets is important for methods of random approximation in machine learning. In high dimensions, exponentially large numbers of randomly and independently chosen vectors from the equidistribution on a sphere (and from many other distributions) are almost orthogonal with probability close to one.
This implies that in order to represent an element of such a high-dimensional space by linear combinations of randomly and independently chosen vectors, it may often be necessary to generate samples of exponentially large length if we use bounded coefficients in the linear combinations. On the other hand, if coefficients with arbitrarily large values are allowed, the number of randomly generated elements that are sufficient for approximation is even less than the dimension of the data space.
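This concentration is easy to observe numerically (a sketch assuming NumPy; ''n'' and the number of vectors are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 2000   # dimension, and a number of vectors exceeding it

# Independent random unit vectors: normalized Gaussian samples.
V = rng.standard_normal((m, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Pairwise inner products concentrate around 0 at scale ~ 1/sqrt(n),
# so even very many vectors are almost orthogonal with high probability.
G = V @ V.T
np.fill_diagonal(G, 0.0)
print(np.abs(G).max())   # small compared to 1 (roughly 0.17 for these sizes)
```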
Implementations
* RandPro - An R package for random projection
* sklearn.random_projection - A module for random projection from the scikit-learn Python library (usage sketched below)
* Weka implementation
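For instance, scikit-learn exposes both dense Gaussian and Achlioptas-style sparse projections (a usage sketch; note that scikit-learn stores samples as rows, i.e. an $N \times d$ matrix, the transpose of the convention used above):

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection, SparseRandomProjection

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))   # 200 samples in 1000 dimensions

# Dense Gaussian projection down to 50 dimensions
X_dense = GaussianRandomProjection(n_components=50, random_state=0).fit_transform(X)

# Sparse (Achlioptas-style) projection: faster and more memory-friendly
X_sparse = SparseRandomProjection(n_components=50, random_state=0).fit_transform(X)

print(X_dense.shape, X_sparse.shape)   # (200, 50) (200, 50)
```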
See also
* Locality-sensitive hashing
* Random mapping
* Johnson-Lindenstrauss lemma
Further reading
* Ramdas, Aditya. ''A Random Introduction to Random Projections''. CiteSeerX 10.1.1.377.2593.