The intrinsic dimension for a data set can be thought of as the minimal number of variables needed to represent the data set. Similarly, in

signal processing Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing ''signals'', such as audio signal processing, sound, image processing, images, Scalar potential, potential fields, Seismic tomograph ...

of multidimensional signals, the intrinsic dimension of the signal describes how many variables are needed to generate a good approximation of the signal. When estimating intrinsic dimension, however, a slightly broader definition based on manifold dimension is often used, where a representation in the intrinsic dimension does only need to exist locally. Such intrinsic dimension estimation methods can thus handle data sets with different intrinsic dimensions in different parts of the data set. This is often referred to as local intrinsic dimensionality. The intrinsic dimension can be used as a lower bound of what dimension it is possible to compress a data set into through dimension reduction, but it can also be used as a measure of the complexity of the data set or signal. For a data set or signal of ''N'' variables, its intrinsic dimension ''M'' satisfies ''0 ≤ M ≤ N'', although estimators may yield higher values.

Example

Let ''

f(x_1, x_2)

'' be a two-variable function (or

signal A signal is both the process and the result of transmission of data over some media accomplished by embedding some variation. Signals are important in multiple subject fields including signal processing, information theory and biology. In ...

) which is of the form

f(x_1, x_2) = g(x_1)

for some one-variable function ''g'' which is not constant. This means that ''f'' varies, in accordance to ''g'', with the first variable or along the first

coordinate In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to uniquely determine and standardize the position of the points or other geometric elements on a manifold such as Euclidean space. The coordinates are ...

. On the other hand, ''f'' is constant with respect to the second variable or along the second coordinate. It is only necessary to know the value of one, namely the first, variable in order to determine the value of ''f''. Hence, it is a two-variable function but its intrinsic dimension is one. A slightly more complicated example is

f(x_1, x_2) = g(x_1 + x_2)

. ''f'' is still intrinsic one-dimensional, which can be seen by making a variable transformation

y_1 = x_1 + x_2

and

y_2 = x_1 - x_2

which gives

f\left(\frac, \frac\right) = g\left(y_1\right)

. Since the variation in ''f'' can be described by the single variable ''y₁'' its intrinsic dimension is one. For the case that ''f'' is constant, its intrinsic dimension is zero since no variable is needed to describe variation. For the general case, when the intrinsic dimension of the two-variable function ''f'' is neither zero or one, it is two. In the literature, functions which are of intrinsic dimension zero, one, or two are sometimes referred to as ''i0D'', ''i1D'' or ''i2D'', respectively.

Formal definition for signals

For an ''N''-variable function ''f'', the set of variables can be represented as an ''N''-dimensional vector x:

f = f\left(\mathbf \right) \text \mathbf = \left(x_1, \dots, x_N \right)

. If for some ''M''-variable function ''g'' and ''M × N'' matrix A it is the case that * for all x;

f(\mathbf) = g(\mathbf),

* ''M'' is the smallest number for which the above relation between ''f'' and ''g'' can be found, then the intrinsic dimension of ''f'' is ''M''. The intrinsic dimension is a characterization of ''f'', it is not an unambiguous characterization of ''g'' nor of A. That is, if the above relation is satisfied for some ''f'', ''g'', and A, it must also be satisfied for the same ''f'' and ''g′'' and A′ given by

g'\left(\mathbf\right) = g \left(\mathbf\right)

and

\mathbf = \mathbf^ \mathbf

where B is a non-singular ''M × M'' matrix, since

f\left(\mathbf\right) = 
g'\left(\mathbf\right) = g \left(\mathbf\right) = g\left(\mathbf\right)

The Fourier transform of signals of low intrinsic dimension

An ''N'' variable function which has intrinsic dimension ''M < N'' has a characteristic

Fourier transform In mathematics, the Fourier transform (FT) is an integral transform that takes a function as input then outputs another function that describes the extent to which various frequencies are present in the original function. The output of the tr ...

. Intuitively, since this type of function is constant along one or several dimensions its Fourier transform must appear like an impulse (the Fourier transform of a constant) along the same dimension in the

frequency domain In mathematics, physics, electronics, control systems engineering, and statistics, the frequency domain refers to the analysis of mathematical functions or signals with respect to frequency (and possibly phase), rather than time, as in time ser ...

A simple example

Let ''f'' be a two-variable function which is i1D. This means that there exists a normalized vector

\mathbf \in \reals^

and a one-variable function ''g'' such that

f(\mathbf) = g(\mathbf^ \mathbf)

for all

\mathbf \in \reals^

. If ''F'' is the Fourier transform of ''f'' (both are two-variable functions) it must be the case that

F \left(\mathbf\right) = G \left(\mathbf^ \mathbf\right) \cdot \delta \left(\mathbf^ \mathbf\right)

. Here ''G'' is the Fourier transform of ''g'' (both are one-variable functions), ''δ'' is the Dirac impulse function and m is a normalized vector in

\reals^

perpendicular to n. This means that ''F'' vanishes everywhere except on a line which passes through the origin of the frequency domain and is parallel to m. Along this line ''F'' varies according to ''G''.

The general case

Let ''f'' be an ''N''-variable function which has intrinsic dimension ''M'', that is, there exists an ''M''-variable function ''g'' and ''M × N'' matrix A such that

f(\mathbf) = g(\mathbf) \quad \forall \mathbf

. Its Fourier transform ''F'' can then be described as follows: * ''F'' vanishes everywhere except for a subspace of dimension ''M'' * The subspace ''M'' is spanned by the rows of the matrix A * In the subspace, ''F'' varies according to ''G'' the Fourier transform of ''g''

Generalizations

The type of intrinsic dimension described above assumes that a

linear transformation In mathematics, and more specifically in linear algebra, a linear map (also called a linear mapping, linear transformation, vector space homomorphism, or in some contexts linear function) is a mapping V \to W between two vector spaces that pr ...

is applied to the coordinates of the ''N''-variable function ''f'' to produce the ''M'' variables which are necessary to represent every value of ''f''. This means that ''f'' is constant along lines, planes, or hyperplanes, depending on ''N'' and ''M''. In a general case, ''f'' has intrinsic dimension ''M'' if there exist ''M'' functions ''a₁'', ''a₂'', ..., ''a_M'' and an ''M''-variable function ''g'' such that *

f(\mathbf) = g \left( a_1(\mathbf), a_2(\mathbf), \dots, a_M(\mathbf) \right)

for all x * ''M'' is the smallest number of functions which allows the above transformation A simple example is transforming a 2-variable function ''f'' to

polar coordinates In mathematics, the polar coordinate system specifies a given point (mathematics), point in a plane (mathematics), plane by using a distance and an angle as its two coordinate system, coordinates. These are *the point's distance from a reference ...

f\left(\frac, \frac\right) = g\left(y_1\right)

f(x_1, x_2) = g \left(\sqrt \right)

, ''f'' is i1D and is constant along any circle centered at the origin *

f(x_1, x_2) = g \left(\arctan \left(\frac\right)\right)

, ''f'' is i1D and is constant along all rays from the origin For the general case, a simple description of either the point sets for which ''f'' is constant or its Fourier transform is usually not possible.

Local Intrinsic Dimensionality

Local intrinsic dimensionality (LID) refers to the observation that often data is distributed on a lower-dimensional manifold when only considering a nearby subset of the data. For example the function

f(x,y) = x + \max\

can be considered one-dimensional when ''y'' is close to 0 (with one variable ''x''), two-dimensional when ''y'' is close to 1, and again one-dimensional when ''y'' is positive and much larger than 1 (with variable ''x+y''). Local intrinsic dimensionality is often used with respect to data. It then usually is estimated based on the ''k'' nearest neighbors of a data point, often based on a concept related to the doubling dimension in mathematics. Since the volume of a ''d''-sphere grows exponentially in ''d'', the rate at which new neighbors are found as the search radius is increased can be used to estimate the local intrinsic dimensionality (e.g., GED estimation). However, alternate approaches of estimation have been proposed, for example angle-based estimation.

Intrinsic dimension estimation

Intrinsic dimension of data manifolds can be estimated by many methods, depending on assumptions of the data manifold. A 2016 review is. The two-nearest neighbors (TwoNN) method is a method for estimating the intrinsic dimension of an immersed

Riemannian manifold In differential geometry, a Riemannian manifold is a geometric space on which many geometric notions such as distance, angles, length, volume, and curvature are defined. Euclidean space, the N-sphere, n-sphere, hyperbolic space, and smooth surf ...

. The algorithm is as follows:

Scatter some points on the manifold. Measure $\mu = r_2/r_1$ for many points, where $r_1, r_2$ are the distances to the point's two closest neighbors. Fit the empirical CDF of $\mu$ to $1-\mu^$ . Return $d$ .

History

During the 1950s so called "scaling" methods were developed in the

social sciences Social science (often rendered in the plural as the social sciences) is one of the branches of science, devoted to the study of society, societies and the Social relation, relationships among members within those societies. The term was former ...

to explore and summarize multidimensional data sets. After Shepard introduced non-metric multidimensional scaling in 1962 one of the major research areas within multi-dimensional scaling (MDS) was estimation of the intrinsic dimension. The topic was also studied in

information theory Information theory is the mathematical study of the quantification (science), quantification, Data storage, storage, and telecommunications, communication of information. The field was established and formalized by Claude Shannon in the 1940s, ...

, pioneered by Bennet in 1965 who coined the term "intrinsic dimension" and wrote a computer program to estimate it. During the 1970s intrinsic dimensionality estimation methods were constructed that did not depend on dimensionality reductions such as MDS: based on local eigenvalues., based on distance distributions, and based on other dimension-dependent geometric properties Estimating intrinsic dimension of sets and probability measures has also been extensively studied since around 1980 in the field of dynamical systems, where dimensions of (strange) attractors have been the subject of interest. For strange attractors there is no manifold assumption, and the dimension measured is some version of fractal dimension — which also can be non-integer. However, definitions of fractal dimension yield the manifold dimension for manifolds. In the 2000s the "curse of dimensionality" has been exploited to estimate intrinsic dimension.

Applications

The case of a two-variable signal which is i1D appears frequently in

computer vision Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...

and

image processing An image or picture is a visual representation. An image can be two-dimensional, such as a drawing, painting, or photograph, or three-dimensional, such as a carving or sculpture. Images may be displayed through other media, including a pr ...

and captures the idea of local image regions which contain lines or edges. The analysis of such regions has a long history, but it was not until a more formal and theoretical treatment of such operations began that the concept of intrinsic dimension was established, even though the name has varied. For example, the concept which here is referred to as an ''image neighborhood of intrinsic dimension 1'' or ''i1D neighborhood'' is called ''1-dimensional'' by Knutsson (1982), ''linear symmetric'' by Bigün & Granlund (1987) and ''simple neighborhood'' in Granlund & Knutsson (1995).

References

* {{cite journal , author = Michael Felsberg , author2=Sinan Kalkan , author3=Norbert Krueger , title = Continuous Dimensionality Characterization of Image Structures , journal = Image and Vision Computing , volume = 27 , issue = 6 , pages = 628–636 , year = 2009 , doi=10.1016/j.imavis.2008.06.018, url=http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-18087 , hdl = 11511/36631 , hdl-access = free Computer vision Image processing