In mathematics, the Johnson–Lindenstrauss lemma is a result named after William B. Johnson and Joram Lindenstrauss concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection. The lemma has applications in compressed sensing, manifold learning, dimensionality reduction, and graph embedding.

Much of the data stored and manipulated on computers, including text and images, can be represented as points in a high-dimensional space (see vector space model for the case of text). However, the essential algorithms for working with such data tend to become bogged down very quickly as the dimension increases. It is therefore desirable to reduce the dimensionality of the data in a way that preserves its relevant structure. The Johnson–Lindenstrauss lemma is a classic result in this vein. Moreover, the lemma is tight up to a constant factor: there exists a set of ''m'' points that needs dimension
: \Omega\left(\frac{\log m}{\varepsilon^2}\right)
in order to preserve the distances between all pairs of points within a factor of (1 \pm \varepsilon).


Lemma

Given 0 < \varepsilon < 1, a set X of m points in \mathbb{R}^N (m, N \in \mathbb{Z}_{\geq 1}), and an integer n > 8(\ln m)/\varepsilon^2, there is a linear map f: \mathbb{R}^N \rightarrow \mathbb{R}^n such that
: (1-\varepsilon)\|u-v\|^2 \leq \|f(u) - f(v)\|^2 \leq (1+\varepsilon)\|u-v\|^2
for all u, v \in X. The formula can be rearranged:
: (1+\varepsilon)^{-1}\|f(u)-f(v)\|^2 \leq \|u-v\|^2 \leq (1-\varepsilon)^{-1}\|f(u)-f(v)\|^2

Alternatively, for any \varepsilon \in (0,1) and any integer n \geq 15(\ln m)/\varepsilon^2 (or any integer n > 128(\ln m)/(9\varepsilon^2)), there exists a linear function f: \mathbb{R}^N \rightarrow \mathbb{R}^n such that the restriction f|_X is (1+\varepsilon)-bi-Lipschitz. (This follows from the result above. Sketch of proof: note that 1/(1+\varepsilon) < \sqrt{1-3\varepsilon/4} and \sqrt{1+3\varepsilon/4} < \sqrt{1+\varepsilon} < 1+\varepsilon for all \varepsilon \in (0,1); do casework for m = 1 and m > 1, applying the above result with 3\varepsilon/4 in the latter case, and note that 128/9 < 15.)

One proof of the lemma takes ''ƒ'' to be a suitable multiple of the orthogonal projection onto a random subspace of dimension n in \mathbb{R}^N, and exploits the phenomenon of concentration of measure. An orthogonal projection will, in general, reduce the average distance between points, but the lemma can be viewed as dealing with ''relative'' distances, which do not change under scaling. In a nutshell, you roll the dice and obtain a random projection, which will reduce the average distance, and then you scale up the distances so that the average distance returns to its previous value. If you keep rolling the dice, you will, in polynomial random time, find a projection for which the (scaled) distances satisfy the lemma.


Alternate statement

A related lemma is the distributional JL lemma. This lemma states that for any 0 < \varepsilon, \delta < 1/2 and positive integer d, there exists a distribution over \mathbb{R}^{k \times d} from which the matrix A is drawn such that for k = O(\varepsilon^{-2} \log(1/\delta)) and for any unit-length vector x \in \mathbb{R}^d, the claim below holds:
: P\left(\left|\Vert Ax\Vert_2^2 - 1\right| > \varepsilon\right) < \delta
One can obtain the JL lemma from the distributional version by setting x = (u-v)/\|u-v\|_2 and \delta < 1/m^2 for some pair ''u'', ''v'' both in ''X''. Then the JL lemma follows by a union bound over all such pairs.
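A small empirical sketch of the distributional form (assuming NumPy; here A is drawn with i.i.d. Gaussian entries scaled by 1/\sqrt{k}, one common choice of distribution satisfying the lemma, and d, \varepsilon, \delta and the number of trials are illustrative values):

import numpy as np

rng = np.random.default_rng(1)
d, eps, delta = 300, 0.3, 0.05
k = int(np.ceil(8 * np.log(1 / delta) / eps**2))   # k = O(eps^-2 * log(1/delta))

x = rng.normal(size=d)
x /= np.linalg.norm(x)                             # unit-length test vector

trials, failures = 500, 0
for _ in range(trials):
    A = rng.normal(size=(k, d)) / np.sqrt(k)       # fresh draw of A each trial
    failures += abs(np.linalg.norm(A @ x)**2 - 1) > eps

# Empirical estimate of P(| ||Ax||^2 - 1 | > eps); should come out well below delta.
print(failures / trials)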


Speeding up the JL transform

Given ''A'', computing the matrix–vector product takes ''O''(''kd'') time. There has been some work on deriving distributions for which the matrix–vector product can be computed in less than ''O''(''kd'') time. There are two major lines of work. The first, the ''Fast Johnson–Lindenstrauss Transform'' (FJLT), was introduced by Ailon and Chazelle in 2006. This method allows the computation of the matrix–vector product in just d\log d + k^{2+\gamma} time for any constant \gamma > 0. Another approach is to build a distribution supported over matrices that are sparse. This method allows keeping only an \varepsilon fraction of the entries in the matrix, which means the computation can be done in just kd\varepsilon time. Furthermore, if the vector has only b non-zero entries, the sparse JL transform takes time kb\varepsilon, which may be much less than the d\log d time used by the fast JL transform.
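A minimal sketch of the sparse approach (assuming NumPy; this is an illustrative Kane–Nelson-style construction in which each column has s = \lceil\varepsilon k\rceil non-zero \pm 1/\sqrt{s} entries, not necessarily the exact distribution of any particular paper, and the parameters are arbitrary demonstration values):

import numpy as np

def sparse_jl(k, d, eps, rng):
    # Each column gets s = ceil(eps * k) nonzero entries in random rows,
    # with values +-1/sqrt(s); only an eps fraction of each column is nonzero.
    s = max(1, int(np.ceil(eps * k)))
    A = np.zeros((k, d))
    for j in range(d):
        rows = rng.choice(k, size=s, replace=False)
        A[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return A

rng = np.random.default_rng(2)
d, eps = 5000, 0.25
k = int(np.ceil(8 * np.log(100) / eps**2))   # rows sized for roughly 100 points

A = sparse_jl(k, d, eps, rng)
x = rng.normal(size=d)
print(np.linalg.norm(A @ x)**2 / np.linalg.norm(x)**2)   # close to 1 for a typical draw

# Stored in a sparse format, A has only about k*d*eps nonzeros, so computing Ax
# costs about k*d*eps operations (or k*b*eps if x has only b nonzero entries).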


Tensorized random projections

It is possible to combine two JL matrices by taking the so-called face-splitting product, which is defined as the tensor product of the corresponding rows (proposed by V. Slyusar in 1996 for radar and digital antenna array applications; see also Anna Esteve, Eva Boj & Josep Fortiana (2009), "Interaction Terms in Distance-Based Regression", Communications in Statistics – Theory and Methods, 38:19, p. 350). More directly, let \mathbf{C} \in \mathbb{R}^{3\times 3} and \mathbf{D} \in \mathbb{R}^{3\times 3} be two matrices. Then the face-splitting product \mathbf{C} \bullet \mathbf{D} is
: \mathbf{C} \bullet \mathbf{D} = \left[\begin{array}{c} \mathbf{C}_1 \otimes \mathbf{D}_1 \\\hline \mathbf{C}_2 \otimes \mathbf{D}_2 \\\hline \mathbf{C}_3 \otimes \mathbf{D}_3 \end{array}\right].
This idea of tensorization was used by Kasiviswanathan et al. in 2010 for differential privacy (Kasiviswanathan, Shiva Prasad, et al., "The price of privately releasing contingency tables and the spectra of random matrices with correlated rows", Proceedings of the forty-second ACM Symposium on Theory of Computing, 2010). JL matrices defined like this use fewer random bits, and can be applied quickly to vectors that have tensor structure, due to the following identity:
: (\mathbf{C} \bullet \mathbf{D})(x \otimes y) = \mathbf{C}x \circ \mathbf{D}y = \left[\begin{array}{c} (\mathbf{C}x)_1 (\mathbf{D}y)_1 \\ (\mathbf{C}x)_2 (\mathbf{D}y)_2 \\ \vdots \end{array}\right],
where \circ is the element-wise (Hadamard) product. Such computations have been used to efficiently compute polynomial kernels and many other linear-algebra algorithms (Woodruff, David P., "Sketching as a Tool for Numerical Linear Algebra", Foundations and Trends in Theoretical Computer Science 10.1–2 (2014): 1–157).

In 2020 it was shown that if the matrices C_1, C_2, \dots, C_c are independent \pm 1 or Gaussian matrices, the combined matrix C_1 \bullet \dots \bullet C_c satisfies the distributional JL lemma if the number of rows is at least
: O\left(\varepsilon^{-2}\log(1/\delta) + \varepsilon^{-1}\left(\tfrac{1}{c}\log(1/\delta)\right)^c\right).
For large \varepsilon this is as good as the completely random Johnson–Lindenstrauss transform, but a matching lower bound in the same paper shows that this exponential dependence on (\log(1/\delta))^c is necessary. Alternative JL constructions are suggested to circumvent this.
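A short sketch (assuming NumPy) of the face-splitting product and of the identity above; the \pm 1 matrices and the vector sizes are arbitrary illustrative choices.

import numpy as np

def face_split(C, D):
    # Face-splitting product: row i of the result is kron(row i of C, row i of D).
    return np.stack([np.kron(Ci, Di) for Ci, Di in zip(C, D)])

rng = np.random.default_rng(3)
k, d1, d2 = 4, 3, 5
C = rng.choice([-1.0, 1.0], size=(k, d1))   # two independent +-1 matrices
D = rng.choice([-1.0, 1.0], size=(k, d2))

x = rng.normal(size=d1)
y = rng.normal(size=d2)

lhs = face_split(C, D) @ np.kron(x, y)      # combined matrix applied to a tensor-structured vector
rhs = (C @ x) * (D @ y)                     # Hadamard product of the two small sketches
print(np.allclose(lhs, rhs))                # True: matches the identity in the text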


See also

* Random projection
* Restricted isometry property

