In data analysis, the self-similarity matrix is a graphical representation of similar sequences in a data series. Similarity can be expressed by different measures, such as spatial distance (distance matrix), correlation, or comparison of local histograms or spectral properties (e.g. IXEGRAM). The technique is also used to search for a given pattern in a long data series, as in gene matching. A similarity plot can be the starting point for dot plots or recurrence plots.
Definition
To construct a self-similarity matrix, one first transforms a data series into an ordered sequence of feature vectors $V = (v_1, v_2, \ldots, v_n)$, where each vector $v_i$ describes the relevant features of the data series in a given local interval. The self-similarity matrix is then formed by computing the similarity of pairs of feature vectors:

$$S(j,k) = s(v_j, v_k), \quad j,k \in (1, \ldots, n)$$

where $s(v_j, v_k)$ is a function measuring the similarity of the two vectors, for instance the inner product $s(v_j, v_k) = v_j \cdot v_k$. Similar segments of feature vectors then show up as paths of high similarity along diagonals of the matrix.
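The construction described here can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `frame_features` and `self_similarity_matrix`, the use of raw sliding windows as feature vectors, and the toy periodic series are all assumptions made for the example. The similarity function is the plain inner product, so entry `S[j, k]` is the inner product of feature vectors `j` and `k`.

```python
import numpy as np

def frame_features(series, window, hop=1):
    """Cut a 1-D series into overlapping windows; each window is one feature vector.

    Using the raw window as the feature vector is an assumption for this sketch;
    real applications typically use spectra, histograms, or other descriptors.
    """
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - window + 1, hop)
    return np.stack([series[i:i + window] for i in starts])

def self_similarity_matrix(features):
    """Return S with S[j, k] = inner product of feature vectors j and k."""
    features = np.asarray(features, dtype=float)
    return features @ features.T

# Toy series (hypothetical data): a four-sample motif repeated four times.
motif = np.array([0.0, 1.0, 0.0, -1.0])
series = np.tile(motif, 4)

V = frame_features(series, window=4)   # ordered sequence of feature vectors
S = self_similarity_matrix(V)          # square, symmetric similarity matrix
```

Because the toy series repeats every four samples, windows that start four samples apart are identical, so the entries `S[j, j + 4]` are as large as the main diagonal. These entries form the off-diagonal stripes of high similarity that the definition describes.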
Similarity plots are used for action recognition that is invariant to point of view and for audio segmentation using spectral clustering of the self-similarity matrix.
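As a rough illustration of how spectral clustering can operate on a similarity matrix, the sketch below splits frames into two groups using the eigenvector for the second-smallest eigenvalue of the unnormalised graph Laplacian (the Fiedler vector). It is a simplified two-cluster variant chosen for brevity and is not the method of any particular audio segmentation system. The function name `spectral_bipartition` and the six toy feature vectors are assumptions, and the similarities are assumed to be non-negative.

```python
import numpy as np

def spectral_bipartition(S):
    """Split items into two groups from a symmetric, non-negative similarity matrix.

    Builds the unnormalised graph Laplacian L = D - S and thresholds the
    eigenvector of its second-smallest eigenvalue at zero.
    """
    S = np.asarray(S, dtype=float)
    degree = S.sum(axis=1)
    laplacian = np.diag(degree) - S
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Hypothetical toy data: the first three frames resemble one another,
# as do the last three.
features = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [1.0, 0.0],
    [0.1, 1.0],
    [0.2, 0.9],
    [0.0, 1.0],
])
S = features @ features.T            # inner-product self-similarity matrix
labels = spectral_bipartition(S)     # one 0/1 label per frame
```

The sign of an eigenvector is arbitrary, so which group receives label 0 or 1 can differ between runs or platforms; only the grouping itself is meaningful. Practical pipelines often use a normalised Laplacian and cluster several eigenvectors at once to obtain more than two segments.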
Example
See also
* Recurrence plot
* Distance matrix
* Similarity matrix
* Substitution matrix
* Dot plot (bioinformatics)
Further reading
* M. A. Casey (2002). "Sound Classification and Similarity Tools". In B.S. Manjunath; P. Salembier; T. Sikora (eds.). Introduction to MPEG-7: Multimedia Content Description Language. J. Wiley. pp. 309–323. ISBN 978-0471486787.
External links
* http://www.recurrence-plot.tk/related_methods.php