Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data. It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality, as it leverages the kernel trick to non-linearly map the original data into an inner-product space.


Algorithm

MVU creates a mapping from the high-dimensional input vectors to some low-dimensional Euclidean vector space in the following steps:

1. A neighbourhood graph is created. Each input is connected with its k nearest input vectors (according to the Euclidean distance metric) and all k nearest neighbours are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold.

2. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite programming aims to find an inner-product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph, while preserving the nearest-neighbour distances.

3. The low-dimensional embedding is finally obtained by applying multidimensional scaling to the learned inner-product matrix.

The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich.


Optimization formulation

Let X be the original input and Y be the embedding. If i, j are two neighbours, then the local isometry constraint that needs to be satisfied is:

:|X_i - X_j|^2 = |Y_i - Y_j|^2

Let G, K be the Gram matrices of X and Y (i.e. G_{ij} = X_i \cdot X_j and K_{ij} = Y_i \cdot Y_j). We can express the above constraint for every pair of neighbouring points i, j in terms of G and K:

:G_{ii} + G_{jj} - G_{ij} - G_{ji} = K_{ii} + K_{jj} - K_{ij} - K_{ji}

In addition, we also want to constrain the embedding Y to be centered at the origin:

:0 = \Big|\sum_i Y_i\Big|^2 \Leftrightarrow 0 = \Big(\sum_i Y_i\Big) \cdot \Big(\sum_j Y_j\Big) \Leftrightarrow 0 = \sum_{i,j} Y_i \cdot Y_j \Leftrightarrow 0 = \sum_{i,j} K_{ij}

Besides preserving the distances of neighbouring points, the algorithm aims to maximize the pairwise distance of every pair of points. The objective function to be maximized is:

:T(Y) = \dfrac{1}{2N} \sum_{i,j} |Y_i - Y_j|^2

Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible, and therefore "unfolds" the manifold. The local isometry constraints prevent the objective function from diverging (going to infinity). Let

:\tau = \max_{i,j} \{\eta_{ij} |Y_i - Y_j|^2\}, \qquad \eta_{ij} := \begin{cases} 1 & \text{if } i \text{ is a neighbour of } j \\ 0 & \text{otherwise}, \end{cases}

i.e. \tau is the largest squared distance between any pair of neighbours, which by the isometry constraint is fixed by the input data. Since the neighbourhood graph is connected and has N points, any two points are joined by a path of fewer than N edges, so by the triangle inequality |Y_i - Y_j|^2 \leq N^2 \tau. We can then bound the objective function as follows:

:T(Y) = \dfrac{1}{2N} \sum_{i,j} |Y_i - Y_j|^2 \leq \dfrac{1}{2N} \sum_{i,j} N^2 \tau = \dfrac{N^3 \tau}{2}

The objective function can be rewritten purely in terms of the Gram matrix:

:\begin{align} T(Y) &= \dfrac{1}{2N} \sum_{i,j} |Y_i - Y_j|^2 \\ &= \dfrac{1}{2N} \sum_{i,j} (Y_i \cdot Y_i + Y_j \cdot Y_j - Y_i \cdot Y_j - Y_j \cdot Y_i) \\ &= \dfrac{1}{2N} \Big( \sum_{i,j} Y_i \cdot Y_i + \sum_{i,j} Y_j \cdot Y_j - \sum_{i,j} Y_i \cdot Y_j - \sum_{i,j} Y_j \cdot Y_i \Big) \\ &= \dfrac{1}{2N} \Big( N \sum_i Y_i \cdot Y_i + N \sum_j Y_j \cdot Y_j - 0 - 0 \Big) \\ &= \sum_i Y_i \cdot Y_i = \operatorname{Tr}(\mathbf{K}), \end{align}

where the two cross terms vanish because of the centering constraint \sum_{i,j} K_{ij} = 0.

Finally, the optimization can be formulated as:

:\begin{align} & \text{Maximize} && \operatorname{Tr}(\mathbf{K}) \\ & \text{subject to} && \mathbf{K} \succeq 0, \quad \sum_{i,j} \mathbf{K}_{ij} = 0, \\ & && G_{ii} + G_{jj} - G_{ij} - G_{ji} = K_{ii} + K_{jj} - K_{ij} - K_{ji} \quad \text{for all } i, j \text{ with } \eta_{ij} = 1. \end{align}

After the Gram matrix K is learned by semidefinite programming, the output Y can be obtained from its eigendecomposition (or, equivalently, from any factorization of K, such as a Cholesky decomposition). In particular, the Gram matrix can be written as K_{ij} = \sum_{\alpha=1}^{N} \lambda_\alpha V_{\alpha i} V_{\alpha j}, where V_{\alpha i} is the i-th element of the eigenvector V_\alpha with eigenvalue \lambda_\alpha. It follows that the \alpha-th coordinate of the output Y_i is \sqrt{\lambda_\alpha} V_{\alpha i}.
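As a concrete illustration of this last step, the following short Python sketch (assuming only NumPy; the function name embedding_from_gram and its n_components parameter are illustrative) recovers the coordinates Y_{i\alpha} = \sqrt{\lambda_\alpha} V_{\alpha i} from a learned Gram matrix K:

import numpy as np

def embedding_from_gram(K, n_components=2):
    # Eigendecomposition K = sum_a lambda_a * v_a v_a^T (eigh returns ascending order).
    eigvals, eigvecs = np.linalg.eigh(K)
    # Keep the largest eigenvalues; clip tiny negatives caused by solver noise.
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)
    # The alpha-th coordinate of point i is sqrt(lambda_alpha) * V_{alpha i}.
    return eigvecs[:, order] * np.sqrt(lam)

Keeping only the leading n_components eigenpairs is the multidimensional-scaling step of the algorithm; ideally, the remaining eigenvalues of a well-unfolded manifold are close to zero, so little information is lost.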


See also

* Locally linear embedding
* Isometry (mathematics)
* Local tangent space alignment
* Riemannian manifold
* Energy minimization


References

* Lawrence, Neil D. (2012). "A unifying probabilistic perspective for spectral dimensionality reduction: insights and new models". Journal of Machine Learning Research. 13 (May): 1612. arXiv:1010.4830. Bibcode:2010arXiv1010.4830L. http://www.jmlr.org/papers/v13/lawrence12a.html


Additional material


* Kilian Q. Weinberger's MVU Matlab code