In
information theory
Information theory is the mathematical study of the quantification (science), quantification, Data storage, storage, and telecommunications, communication of information. The field was established and formalized by Claude Shannon in the 1940s, ...
,
linguistics
Linguistics is the scientific study of language. The areas of linguistic analysis are syntax (rules governing the structure of sentences), semantics (meaning), Morphology (linguistics), morphology (structure of words), phonetics (speech sounds ...
, and
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, the Levenshtein distance is a
string metric
In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric (mathematics), metric that measures distance ("inverse similarity") between two string (computer science), tex ...
for measuring the difference between two sequences. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after Soviet mathematician
Vladimir Levenshtein
Vladimir Iosifovich Levenshtein ( rus, Влади́мир Ио́сифович Левенште́йн, p=vlɐˈdʲimʲɪr ɨˈosʲɪfəvʲɪtɕ lʲɪvʲɪnˈʂtʲejn, a=Ru-Vladimir Iosifovich Levenstein.oga; 20 May 1935 – 6 September 2017) was ...
, who defined the metric in 1965.
Levenshtein distance may also be referred to as ''edit distance'', although that term may also denote a larger family of distance metrics known collectively as
edit distance
In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two String (computing), strings (e.g., words) are to one another, that is measured by counting the minimum number of opera ...
.
It is closely related to
pairwise string alignments.
Definition
The Levenshtein distance between two strings
(of length
and
respectively) is given by
where
:
where the
of some string
is a string of all but the first character of
(i.e.
), and
is the first character of
(i.e.
). Either the notation