The manifold hypothesis posits that many
high-dimensional data sets that occur in the real world actually lie along low-dimensional
latent manifolds inside that high-dimensional space. As a consequence of the manifold hypothesis, many data sets that initially appear to require many variables to describe can in fact be described by a comparatively small number of variables, likened to the local coordinate system of the underlying manifold. It is suggested that this principle underpins the effectiveness of machine learning algorithms in describing high-dimensional data sets by considering a few common features.
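For illustration, the following minimal Python sketch (a toy construction, not any particular real data set) generates points that occupy a 100-dimensional ambient space yet are driven by a single latent coordinate, so they trace out a one-dimensional manifold:
<syntaxhighlight lang="python">
import numpy as np

# Toy sketch: 1,000 observations in R^100 generated from a single latent
# variable (an angle), so they trace out a one-dimensional manifold
# embedded in the 100-dimensional ambient space.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=1000)           # latent coordinate
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # curve in R^2
projection = rng.standard_normal((2, 100))                 # fixed random embedding
data = np.tanh(circle @ projection)                        # nonlinear 100-D observations

print(data.shape)  # (1000, 100): 100 coordinates per point, but one degree of freedom
</syntaxhighlight>
Although each point has 100 coordinates, the whole cloud can be parameterized by the single angle used to generate it.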
The manifold hypothesis is related to the effectiveness of
nonlinear dimensionality reduction
techniques in machine learning. Many dimensionality-reduction techniques assume that the data lie along a low-dimensional submanifold; examples include
manifold sculpting,
manifold alignment, and
manifold regularization.
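As a hedged illustration of such techniques (assuming scikit-learn is available; the parameter values are arbitrary), the classic swiss roll is a two-dimensional submanifold of three-dimensional space, and a nonlinear method such as Isomap attempts to recover a two-dimensional parameterization by approximately preserving geodesic distances along the manifold:
<syntaxhighlight lang="python">
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# The swiss roll: 3-D points that actually lie on a 2-D submanifold.
X, color = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

# Isomap assumes the data lie near a low-dimensional manifold and tries to
# preserve geodesic (along-the-manifold) distances in the low-D embedding.
embedding = Isomap(n_neighbors=12, n_components=2).fit_transform(X)

print(X.shape)          # (2000, 3)  ambient representation
print(embedding.shape)  # (2000, 2)  recovered manifold coordinates
</syntaxhighlight>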
The major implications of this hypothesis are that:
* Machine learning models only have to fit relatively simple, low-dimensional, highly structured subspaces within their potential input space (latent manifolds).
* Within one of these manifolds, it is always possible to interpolate between two inputs, that is to say, morph one into another via a continuous path along which all points fall on the manifold.
The ability to interpolate between samples is the key to generalization in deep learning.
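A rough sketch of such interpolation, using PCA over handwritten-digit images as a stand-in for a learned latent space (not any particular deep model): two inputs are encoded as low-dimensional codes, the codes are blended along a straight path, and each blended code is decoded back into input space.
<syntaxhighlight lang="python">
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in "encoder/decoder": PCA over handwritten-digit images.
digits = load_digits()
pca = PCA(n_components=16).fit(digits.data)

# Encode two inputs into the low-dimensional code space.
z_a, z_b = pca.transform(digits.data[[0, 1]])

# Walk a straight path between the two codes and decode each point;
# the decoded points remain digit-like images, illustrating a continuous
# path through a low-dimensional code space rather than raw pixel space.
for t in np.linspace(0.0, 1.0, 5):
    z = (1.0 - t) * z_a + t * z_b
    x = pca.inverse_transform(z.reshape(1, -1))[0]
    print(t, x.shape)   # each decoded point is a 64-dimensional image vector
</syntaxhighlight>
In a deep generative model the encoder and decoder are nonlinear, but the principle of interpolating in code space rather than in raw input space is the same.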
The information geometry of statistical manifolds
An empirically motivated approach to the manifold hypothesis focuses on its correspondence with an effective theory for manifold learning, under the assumption that robust machine learning requires encoding the dataset of interest using methods for data compression. This perspective gradually emerged using the tools of information geometry, thanks to the coordinated effort of scientists working on the efficient coding hypothesis, predictive coding, and variational Bayesian methods.
The argument for reasoning about the information geometry on the latent space of distributions rests upon the existence and uniqueness of the Fisher information metric. In this general setting, the aim is to find a stochastic embedding of a statistical manifold.
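In coordinates, writing <math>\theta</math> for a local parameterization of the family of distributions <math>p(x\mid\theta)</math>, the Fisher information metric takes the familiar form
<math display="block">
g_{jk}(\theta) = \int \frac{\partial \log p(x\mid\theta)}{\partial \theta^{j}}\,\frac{\partial \log p(x\mid\theta)}{\partial \theta^{k}}\,p(x\mid\theta)\,\mathrm{d}x ,
</math>
that is, the expected outer product of the score function, which equips the statistical manifold with a Riemannian structure.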
From the perspective of dynamical systems, in the big data regime this manifold generally exhibits certain properties such as homeostasis:
# We can sample large amounts of data from the underlying generative process.
# Machine learning experiments are reproducible, so the statistics of the generating process exhibit stationarity.
In a sense made precise by theoretical neuroscientists working on the
free energy principle, the statistical manifold in question possesses a
Markov blanket.
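As an illustrative sketch in Python (with a hypothetical toy network, not a model drawn from the free energy literature): in a directed graphical model, the Markov blanket of a node consists of its parents, its children, and its children's other parents.
<syntaxhighlight lang="python">
# Markov blanket of a node in a directed graphical model (Bayesian network):
# its parents, its children, and the other parents of its children.
def markov_blanket(node, parents):
    """`parents` maps each node to the set of its parents."""
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (parents[node] | children | co_parents) - {node}

# Hypothetical toy network: A -> C <- B, C -> D.
toy = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
print(markov_blanket("C", toy))  # {'A', 'B', 'D'}
</syntaxhighlight>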
See also
* Kolmogorov complexity
* Minimum description length
* Solomonoff's theory of inductive inference