The Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM).
The algorithm has found universal application in decoding the convolutional codes used in both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space communications, and 802.11 wireless LANs. It is now also commonly used in speech recognition, speech synthesis, diarization, keyword spotting, computational linguistics, and bioinformatics. For example, in speech-to-text (speech recognition), the acoustic signal is treated as the observed sequence of events, and a string of text is considered to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most likely string of text given the acoustic signal.
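In symbols (using the generic notation $X$ for the hidden sequence and $Y$ for the observations, as in the pseudocode section below), the Viterbi path is the maximum a posteriori assignment

$$\hat{X} \;=\; \operatorname*{arg\,max}_{X}\, P(X \mid Y) \;=\; \operatorname*{arg\,max}_{X}\, P(Y \mid X)\,P(X),$$

so in the speech example $Y$ is the acoustic signal and $X$ ranges over candidate strings of text.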
History
The Viterbi algorithm is named after
Andrew Viterbi, who proposed it in 1967 as a decoding algorithm for
convolutional codes over noisy digital communication links. It has, however, a history of
multiple invention, with at least seven independent discoveries, including those by Viterbi,
Needleman and Wunsch, and
Wagner and Fischer.
It was introduced to natural language processing as a method of part-of-speech tagging as early as 1987.
''Viterbi path'' and ''Viterbi algorithm'' have become standard terms for the application of dynamic programming algorithms to maximization problems involving probabilities.
For example, in
statistical parsing a dynamic programming algorithm can be used to discover the single most likely context-free derivation (parse) of a string, which is commonly called the "Viterbi parse". Another application is in
target tracking, where the track is computed that assigns a maximum likelihood to a sequence of observations.
Extensions
A generalization of the Viterbi algorithm, termed the ''max-sum algorithm'' (or ''max-product algorithm''), can be used to find the most likely assignment of all or some subset of latent variables in a large number of graphical models, e.g. Bayesian networks, Markov random fields and conditional random fields. The latent variables need, in general, to be connected in a way somewhat similar to a hidden Markov model (HMM), with a limited number of connections between variables and some type of linear structure among the variables. The general algorithm involves ''message passing'' and is substantially similar to the belief propagation algorithm (which is the generalization of the forward-backward algorithm).
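As a rough illustration of the max-product idea in the simplest (chain-structured) case, the following Python sketch passes max-product messages forward along a chain and then back-tracks the jointly most probable assignment. The function name, array layout, and use of NumPy are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def max_product_chain(node_pot, edge_pot):
    """Max-product message passing on a chain of T discrete variables.

    node_pot : (T, K) array of unary potentials psi_t(x_t)
    edge_pot : (K, K) array of pairwise potentials psi(x_{t-1}, x_t)
    Returns the jointly most probable assignment (x_1, ..., x_T).
    On a hidden Markov model this reduces to the Viterbi algorithm.
    """
    T, K = node_pot.shape
    msg = np.zeros((T, K))              # best score of any prefix ending in each state
    back = np.zeros((T, K), dtype=int)  # argmax over the previous variable
    msg[0] = node_pot[0]
    for t in range(1, T):
        # scores[i, k] = msg_{t-1}(i) * psi(i, k) * psi_t(k)
        scores = msg[t - 1][:, None] * edge_pot * node_pot[t][None, :]
        back[t] = scores.argmax(axis=0)
        msg[t] = scores.max(axis=0)
    # Back-track the maximising assignment from the last variable.
    x = [int(msg[-1].argmax())]
    for t in range(T - 1, 0, -1):
        x.append(int(back[t, x[-1]]))
    return x[::-1]
```

The "max-sum" name refers to the same recursion carried out on log-potentials, where the products become sums.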
With the algorithm called iterative Viterbi decoding, one can find the subsequence of an observation that best matches (on average) a given hidden Markov model. This algorithm was proposed by Qi Wang et al. to deal with turbo codes. Iterative Viterbi decoding works by iteratively invoking a modified Viterbi algorithm, re-estimating the score for a filler until convergence.
An alternative algorithm, the Lazy Viterbi algorithm, has been proposed. For many applications of practical interest, under reasonable noise conditions, the lazy decoder is much faster than the original Viterbi decoder. While the original Viterbi algorithm calculates every node in the trellis of possible outcomes, the Lazy Viterbi algorithm maintains a prioritized list of nodes to evaluate in order, and the number of calculations required is typically fewer (and never more) than for the ordinary Viterbi algorithm for the same result. However, it is not as easy to parallelize in hardware.
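The following Python sketch conveys the prioritized-list idea in its simplest form, as a best-first search over trellis nodes with costs given by negative log-probabilities. It is an illustrative simplification, not the data structures of the published Lazy Viterbi decoder, and all names and the dictionary-based model layout are hypothetical.

```python
import heapq
import itertools

def lazy_decode(obs, states, log_start, log_trans, log_emit):
    """Best-first sketch of 'lazy' decoding: trellis nodes are expanded from a
    priority queue ordered by path cost (negative log-probability), so cells
    that cannot lie on the best path are never evaluated.  The first node
    settled in the last trellis column yields the most likely state sequence.
    """
    T = len(obs)
    tie = itertools.count()   # tie-breaker so the heap never compares states
    # Entries: (cost so far, tie-breaker, time index, state, path of states)
    heap = [(-(log_start[s] + log_emit[s][obs[0]]), next(tie), 0, s, (s,))
            for s in states]
    heapq.heapify(heap)
    settled = set()           # (time, state) nodes already finalised
    while heap:
        cost, _, t, s, path = heapq.heappop(heap)
        if (t, s) in settled:
            continue
        settled.add((t, s))
        if t == T - 1:
            return list(path)  # cheapest way to reach the end of the trellis
        for s2 in states:
            if (t + 1, s2) not in settled:
                c2 = cost - (log_trans[s][s2] + log_emit[s2][obs[t + 1]])
                heapq.heappush(heap, (c2, next(tie), t + 1, s2, path + (s2,)))
    return []
```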
Pseudocode
This algorithm generates a path $X = (x_1, x_2, \ldots, x_T)$, which is a sequence of states $x_j \in S = \{s_1, s_2, \ldots, s_K\}$ that generate the observations $Y = (y_1, y_2, \ldots, y_T)$ with $y_j \in O = \{o_1, o_2, \ldots, o_N\}$, where $N$ is the number of possible observations in the observation space $O$.
Two 2-dimensional tables of size $K \times T$ are constructed:
* Each element $T_1[i, j]$ of $T_1$ stores the probability of the most likely path so far $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_j)$ with $\hat{x}_j = s_i$ that generates $Y = (y_1, y_2, \ldots, y_j)$.
* Each element $T_2[i, j]$ of $T_2$ stores $\hat{x}_{j-1}$ of that most likely path, for $2 \le j \le T$.
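A minimal Python sketch of this table-filling scheme follows (array-based, with illustrative parameter names and an index-valued observation sequence):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence of an HMM via the two tables above.

    obs : length-T list of observation indices in 0..N-1
    pi  : (K,)   initial state probabilities
    A   : (K, K) transition probabilities, A[i, k] = P(next state k | state i)
    B   : (K, N) emission probabilities,   B[k, o] = P(observation o | state k)
    """
    K, T = len(pi), len(obs)
    T1 = np.zeros((K, T))             # probability of the best path ending in state i at step j
    T2 = np.zeros((K, T), dtype=int)  # predecessor state on that best path
    T1[:, 0] = pi * B[:, obs[0]]
    for j in range(1, T):
        # scores[i, k] = T1[i, j-1] * A[i, k] * B[k, obs[j]]
        scores = T1[:, j - 1, None] * A * B[None, :, obs[j]]
        T2[:, j] = scores.argmax(axis=0)
        T1[:, j] = scores.max(axis=0)
    # Back-track from the most probable final state.
    x = np.empty(T, dtype=int)
    x[-1] = T1[:, -1].argmax()
    for j in range(T - 1, 0, -1):
        x[j - 1] = T2[x[j], j]
    return x.tolist()
```

In practice the products are usually replaced by sums of log-probabilities so that long observation sequences do not underflow.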