Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.
Methods are commonly divided into linear and nonlinear approaches. Approaches can also be divided into feature selection and feature extraction. Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses.
Feature selection
The process of
feature selection aims to find a suitable subset of the input variables (''features'', or ''attributes'') for the task at hand. The three strategies are: the ''filter'' strategy (e.g.,
information gain), the ''wrapper'' strategy (e.g., accuracy-guided search), and the ''embedded'' strategy (features are added or removed while building the model based on prediction errors).
Data analysis such as regression or classification can be done in the reduced space more accurately than in the original space.
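As a brief illustrative sketch of the ''filter'' strategy (assuming scikit-learn is available; the dataset and the choice of mutual information as an information-gain-style score are arbitrary), a score-based selector keeps only the most informative input variables:

```python
# Filter-strategy feature selection: score each feature against the
# target independently, then keep the k best-scoring features.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)            # 150 samples, 4 features

# Keep the 2 features carrying the most mutual information about y.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)        # (150, 4) -> (150, 2)
print("selected feature indices:", selector.get_support(indices=True))
```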
Feature projection
Feature projection (also called feature extraction) transforms the data from the
high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in
principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist. For multidimensional data, tensor representation can be used in dimensionality reduction through multilinear subspace learning.
Principal component analysis (PCA)
The main linear technique for dimensionality reduction, principal component analysis, performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance (and sometimes the correlation) matrix of the data is constructed, and the eigenvectors of this matrix are computed. The eigenvectors that correspond to the largest eigenvalues (the principal components) can now be used to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behavior of the system, because they often contribute the vast majority of the system's energy, especially in low-dimensional systems. Still, this must be proved on a case-by-case basis, as not all systems exhibit this behavior. The original space (with dimension equal to the number of points) has been reduced (with data loss, but hopefully retaining the most important variance) to the space spanned by a few eigenvectors.
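The covariance-and-eigenvectors procedure described above can be written directly in NumPy (a minimal sketch; the random data and the choice of two components are illustrative):

```python
# PCA via eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # 200 samples, 5 features

Xc = X - X.mean(axis=0)                    # center the data
cov = np.cov(Xc, rowvar=False)             # 5 x 5 covariance matrix

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # largest eigenvalues first

k = 2                                      # target dimensionality
components = eigvecs[:, order[:k]]         # principal components
X_reduced = Xc @ components                # (200, 2) projection

explained = eigvals[order[:k]].sum() / eigvals.sum()
print(f"variance retained by {k} components: {explained:.1%}")
```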
Non-negative matrix factorization (NMF)
NMF decomposes a non-negative matrix into the product of two non-negative matrices, which has been a promising tool in fields where only non-negative signals exist, such as astronomy. NMF has been well known since the multiplicative update rule of Lee & Seung, which has been continuously developed: the inclusion of uncertainties, the consideration of missing data and parallel computation, and sequential construction, which leads to the stability and linearity of NMF, as well as other updates, including handling missing data in digital image processing. With a stable component basis during construction and a linear modeling process, sequential NMF is able to preserve the flux in direct imaging of circumstellar structures in astronomy, as one of the methods of detecting exoplanets, especially for the direct imaging of circumstellar discs. In comparison with PCA, NMF does not remove the mean of the matrices, which leads to physical non-negative fluxes; therefore NMF is able to preserve more information than PCA, as demonstrated by Ren et al.
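A minimal sketch of the factorization, assuming scikit-learn (the matrix size, number of components, and initialization are illustrative choices, not prescribed by the method):

```python
# NMF: approximate a non-negative matrix X by the product W @ H
# of two smaller non-negative matrices.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 20))                  # non-negative data only

model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)                 # (100, 5) activations
H = model.components_                      # (5, 20) non-negative basis

print("reconstruction error:", np.linalg.norm(X - W @ H))
```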
Kernel PCA
Principal component analysis can be employed in a nonlinear way by means of the
kernel trick. The resulting technique, called kernel PCA, is capable of constructing nonlinear mappings that maximize the variance in the data.
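As a brief sketch (assuming scikit-learn; the concentric-circles dataset and RBF-kernel parameters are illustrative), kernel PCA can separate structure that linear PCA cannot:

```python
# Kernel PCA with an RBF kernel on data that is not linearly separable.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)             # nonlinear mapping of the data
print(X_kpca.shape)                        # (400, 2)
```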
Graph-based kernel PCA
Other prominent nonlinear techniques include
manifold learning techniques such as
Isomap,
locally linear embedding (LLE), Hessian LLE, Laplacian eigenmaps, and methods based on tangent space analysis. These techniques construct a low-dimensional data representation using a cost function that retains local properties of the data, and can be viewed as defining a graph-based kernel for Kernel PCA.
More recently, techniques have been proposed that, instead of defining a fixed kernel, try to learn the kernel using
semidefinite programming. The most prominent example of such a technique is
maximum variance unfolding (MVU). The central idea of MVU is to exactly preserve all pairwise distances between nearest neighbors (in the inner product space) while maximizing the distances between points that are not nearest neighbors.
An alternative approach to neighborhood preservation is through the minimization of a cost function that measures differences between distances in the input and output spaces. Important examples of such techniques include: classical
multidimensional scaling, which is identical to PCA;
Isomap, which uses geodesic distances in the data space;
diffusion maps, which use diffusion distances in the data space;
t-distributed stochastic neighbor embedding (t-SNE), which minimizes the divergence between distributions over pairs of points; and curvilinear component analysis.
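One distance-preserving technique from this family can be sketched with scikit-learn (an assumption; the swiss-roll dataset and neighborhood size are illustrative). Isomap builds a neighborhood graph, approximates geodesic distances along it, and then applies classical MDS to those distances:

```python
# Isomap: geodesic-distance-preserving embedding of a curved manifold.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)  # 3-D manifold data

iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)                # unrolled 2-D embedding
print(X_2d.shape)                          # (1000, 2)
```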
A different approach to nonlinear dimensionality reduction is through the use of
autoencoders, a special kind of
feedforward neural networks with a bottleneck hidden layer. The training of deep encoders is typically performed using a greedy layer-wise pre-training (e.g., using a stack of
restricted Boltzmann machines) that is followed by a fine-tuning stage based on backpropagation.
Linear discriminant analysis (LDA)
Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.
Generalized discriminant analysis (GDA)
GDA deals with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the
support-vector machines (SVM), insofar as the GDA method provides a mapping of the input vectors into a high-dimensional feature space.
Similar to LDA, the objective of GDA is to find a projection for the features into a lower dimensional space by maximizing the ratio of between-class scatter to within-class scatter.
Autoencoder
Autoencoders can be used to learn nonlinear dimension reduction functions and codings together with an inverse function from the coding to the original representation.
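A minimal sketch of such a network, assuming PyTorch (the layer sizes, toy data, and training length are illustrative): the encoder below is the dimension reduction function, and the decoder is the inverse map back to the original representation.

```python
# Autoencoder with a 2-unit bottleneck, trained to reconstruct its input.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_features: int, n_code: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(),
            nn.Linear(16, n_code),                 # bottleneck layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 16), nn.ReLU(),
            nn.Linear(16, n_features),             # reconstruction
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

X = torch.randn(256, 8)                            # toy data: 8 features
model = Autoencoder(n_features=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(200):                               # minimize reconstruction error
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)
    loss.backward()
    opt.step()

codes = model.encoder(X)                           # (256, 2) low-dimensional codes
```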
t-SNE
T-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction technique useful for the visualization of high-dimensional datasets. It is not recommended for use in analyses such as clustering or outlier detection, since it does not necessarily preserve densities or distances well.
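A short visualization sketch (assuming scikit-learn; dataset and perplexity are illustrative, and, per the caveat above, the output is for plotting only):

```python
# t-SNE embedding of 64-dimensional digit images into 2-D.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)        # 64-dimensional digit images

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)               # (1797, 2) points for plotting
```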
UMAP
Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique. Visually, it is similar to t-SNE, but it assumes that the data is uniformly distributed on a
locally connected Riemannian manifold and that the Riemannian metric is locally constant or approximately locally constant.
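A minimal sketch, assuming the third-party umap-learn package is installed (the dataset and the neighborhood and spacing parameters are illustrative):

```python
# UMAP embedding into 2-D via the umap-learn package.
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1,
                    random_state=0)
X_2d = reducer.fit_transform(X)            # (1797, 2) embedding
```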
Dimension reduction
For high-dimensional datasets, dimension reduction is usually performed prior to applying a
''k''-nearest neighbors (''k''-NN) algorithm in order to mitigate the
curse of dimensionality.
Feature extraction and dimension reduction can be combined in one step, using principal component analysis (PCA), linear discriminant analysis (LDA), canonical correlation analysis (CCA), or non-negative matrix factorization (NMF) techniques to pre-process the data, followed by clustering via ''k''-NN on feature vectors in a reduced-dimension space. In machine learning, this process is also called low-dimensional embedding.
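This pre-process-then-''k''-NN pipeline can be sketched with scikit-learn (an assumption; the dataset, component count, and use of a ''k''-NN classifier rather than a clusterer are illustrative choices):

```python
# PCA pre-processing followed by k-NN in the reduced space.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)        # 64 features per sample

pipe = make_pipeline(PCA(n_components=16), KNeighborsClassifier(n_neighbors=5))
scores = cross_val_score(pipe, X, y, cv=5)
print("accuracy in the reduced space:", scores.mean())
```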
For high-dimensional datasets (e.g., when performing similarity search on live video streams, DNA data, or high-dimensional
time series), running a fast approximate ''k''-NN search using
locality-sensitive hashing,
random projection, "sketches", or other high-dimensional similarity search techniques from the
VLDB conference toolbox may be the only feasible option.
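As a sketch of one such option (assuming scikit-learn; the data shape and target dimension are illustrative), a Gaussian random projection shrinks the dimension, with pairwise distances approximately preserved per the Johnson–Lindenstrauss lemma, before a neighbor search in the smaller space:

```python
# Random projection followed by a neighbor search in the reduced space.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 1000))          # high-dimensional dataset

proj = GaussianRandomProjection(n_components=64, random_state=0)
X_small = proj.fit_transform(X)            # (5000, 64)

nn = NearestNeighbors(n_neighbors=5).fit(X_small)
dist, idx = nn.kneighbors(X_small[:1])     # approximate neighbors of point 0
```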
Applications
A dimensionality reduction technique that is sometimes used in neuroscience is maximally informative dimensions, which finds a lower-dimensional representation of a dataset such that as much information as possible about the original data is preserved.
See also
* CUR matrix approximation
* Data transformation (statistics)
* Hyperparameter optimization
* Information gain in decision trees
* Johnson–Lindenstrauss lemma
* Latent semantic analysis
* Local tangent space alignment
* Locality-sensitive hashing
* MinHash
* Multifactor dimensionality reduction
* Nearest neighbor search
* Nonlinear dimensionality reduction
* Random projection
* Sammon mapping
* Semantic mapping (statistics)
* Semidefinite embedding
* Singular value decomposition
* Sufficient dimension reduction
* Topological data analysis
* Weighted correlation network analysis
* Factor analysis
External links
* JMLR Special Issue on Variable and Feature Selection
* ELastic MAPs
* Locally Linear Embedding
* A Global Geometric Framework for Nonlinear Dimensionality Reduction (https://web.archive.org/web/20040411051530/http://isomap.stanford.edu/)