In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well-known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization.
This realization sequence is often called the ''context''; therefore the VOM models are also called ''context trees''. VOM models are commonly rendered as colorized probabilistic suffix trees (PSTs).
The flexibility in the number of conditioning random variables turns out to be of real advantage for many applications, such as statistical analysis, classification and prediction.
Example
Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet . Specifically, consider the string ''aaabcaaabcaaabcaaabc…aaabc'' constructed from infinite concatenations of the sub-string ''aaabc''.
The VOM model of maximal order 2 can approximate the above string using ''only'' the following five conditional probability components: Pr(''a'' | ''aa'') = 0.5, Pr(''b'' | ''aa'') = 0.5, Pr(''c'' | ''b'') = 1.0, Pr(''a'' | ''c'') = 1.0, Pr(''a'' | ''ca'') = 1.0.
In this example, Pr(''c'' | ''ab'') = Pr(''c'' | ''b'') = 1.0; therefore, the shorter context ''b'' is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.
To construct the Markov chain of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: Pr(''a'' | ''a''), Pr(''a'' | ''b''), Pr(''a'' | ''c''), Pr(''b'' | ''a''), Pr(''b'' | ''b''), Pr(''b'' | ''c''), Pr(''c'' | ''a''), Pr(''c'' | ''b''), Pr(''c'' | ''c''). To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: Pr(''a'' | ''aa''), Pr(''a'' | ''ab''), …, Pr(''c'' | ''cc''). And to construct the Markov chain of order three for the next character one must estimate the following 81 conditional probability components: Pr(''a'' | ''aaa''), Pr(''a'' | ''aab''), …, Pr(''c'' | ''ccc'').
In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases.
The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent of the future states; accordingly, "a great reduction in the number of model parameters can be achieved."
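These parameter counts are easy to check empirically. The following minimal Python sketch (standard library only; the helper name context_counts is an illustrative choice, not from the VOM literature) tallies the next-symbol statistics of every length-2 context in a long finite prefix of the string above. Only four such contexts ever occur, yielding the five non-zero components listed earlier, whereas a full order-2 Markov chain table allows for 27; a variable-order model would additionally replace the context ''ab'' by ''b'' and ''bc'' by ''c'', since the shorter contexts already determine the next character.

```python
from collections import Counter, defaultdict

def context_counts(text, order):
    """Tally, for every context of exactly `order` symbols, how often each
    possible next symbol follows it in `text`."""
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        counts[text[i - order:i]][text[i]] += 1
    return counts

# A long finite prefix of the infinite concatenation of "aaabc".
text = "aaabc" * 1000

# A fixed order-2 Markov chain reserves 3**2 * 3 = 27 components; here only
# four length-2 contexts (five non-zero components) actually occur.
for context, following in sorted(context_counts(text, 2).items()):
    total = sum(following.values())
    print(context, {sym: round(n / total, 2) for sym, n in following.items()})
```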
Definition
Let ''A'' be a state space (finite alphabet) of size |''A''|.
Consider a sequence with the Markov property x₁ⁿ = x₁x₂…xₙ of ''n'' realizations of random variables, where xᵢ ∈ ''A'' is the state (symbol) at position ''i'' (1 ≤ ''i'' ≤ ''n''), and the concatenation of states xᵢ and xᵢ₊₁ is denoted by xᵢxᵢ₊₁.
Given a training set of observed states, x₁ⁿ, the construction algorithm of the VOM models learns a model ''P'' that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states.
Specifically, the learner generates a conditional probability distribution P(xᵢ | ''s'') for a symbol xᵢ ∈ ''A'' given a context ''s'' ∈ ''A''*, where the * sign represents a sequence of states of any length, including the empty context.
VOM models attempt to estimate conditional distributions of the form P(xᵢ | ''s''), where the context length |''s''| ≤ ''D'' varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length |''s''| = ''D'' and, hence, can be considered special cases of the VOM models.
Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order Markov models, which leads to a better variance-bias tradeoff of the learned models.
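The construction algorithms described in the literature grow a probabilistic suffix tree with pruning and smoothing rules; the sketch below is only a simplified stand-in for that idea, assuming a plain "longest suffix with enough observations" rule (the class name SimpleVOM and the parameters max_order and min_count are illustrative choices, not taken from any particular algorithm). It captures the essential point of the definition: the context length used for P(xᵢ | ''s'') adapts to the available statistics instead of being fixed at ''D''.

```python
from collections import Counter, defaultdict

class SimpleVOM:
    """Illustrative variable-order context model: it stores counts for every
    context of length 0..max_order and predicts from the longest suffix of
    the history that has been observed at least min_count times."""

    def __init__(self, max_order=2, min_count=2):
        self.max_order = max_order
        self.min_count = min_count
        self.counts = defaultdict(Counter)  # context string -> Counter of next symbols

    def fit(self, sequence):
        for i in range(len(sequence)):
            for k in range(self.max_order + 1):  # k = 0 is the empty context
                if i - k < 0:
                    break
                self.counts[sequence[i - k:i]][sequence[i]] += 1
        return self

    def predict_proba(self, history):
        """Empirical next-symbol distribution given the recent history."""
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = history[len(history) - k:]
            seen = self.counts.get(context)
            if seen and sum(seen.values()) >= self.min_count:
                total = sum(seen.values())
                return {sym: n / total for sym, n in seen.items()}
        return {}

model = SimpleVOM(max_order=2).fit("aaabc" * 200)
print(model.predict_proba("aab"))  # longest supported context "ab": {'c': 1.0}
print(model.predict_proba("aa"))   # context "aa": {'a': 0.5, 'b': 0.5}
```

Note that this sketch always prefers the longest supported context; a full VOM construction would additionally prune ''ab'' down to ''b'' and ''bc'' down to ''c'', as in the Example section, because the shorter contexts predict equally well.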
Application areas
Various efficient algorithms have been devised for estimating the parameters of the VOM model.
VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression, document compression, classification and identification of DNA and protein sequences, statistical process control, spam filtering, haplotyping, speech recognition, sequence analysis in social sciences, and others.
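For the compression and prediction applications in particular, model quality is commonly scored by the average log-loss, i.e. the average number of bits per symbol the model assigns to a sequence, which closely approximates the code length an arithmetic coder driven by that model would produce. The sketch below is a generic, hedged illustration rather than any specific published scheme (the helper average_log_loss and the uniform baseline are our own names); plugging in a trained variable-order predictor, such as the one sketched in the Definition section, lowers this score toward the entropy rate of the source, which is the sense in which PPM-style variable-length context modeling can outperform fixed low-order models.

```python
import math

def average_log_loss(predict_proba, sequence, max_order):
    """Average number of bits per symbol a next-symbol predictor assigns to
    `sequence`; lower is better, as in prediction-based compression."""
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        history = sequence[max(0, i - max_order):i]
        p = predict_proba(history).get(symbol, 0.0)
        total_bits += -math.log2(max(p, 1e-12))  # floor to avoid log(0)
    return total_bits / len(sequence)

# Memoryless uniform baseline over the ternary alphabet of the Example section.
def uniform(history):
    return {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}

print(average_log_loss(uniform, "aaabc" * 100, max_order=2))  # ~1.585 bits/symbol
```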
See also
* Stochastic chains with memory of variable length
* Examples of Markov chains
* Variable-order Bayesian network
* Markov process
* Markov chain Monte Carlo
* Semi-Markov process
* Artificial intelligence