Variable-order Markov model

In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well-known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization. This realization sequence is often called the ''context''; therefore the VOM models are also called ''context trees''. VOM models are nicely rendered by colorized probabilistic suffix trees (PST). The flexibility in the number of conditioning random variables turns out to be of real advantage for many applications, such as statistical analysis, classification and prediction.


Example

Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet {''a'', ''b'', ''c''}. Specifically, consider the string ''aaabcaaabcaaabcaaabc…aaabc'' constructed from infinite concatenations of the sub-string ''aaabc''.

The VOM model of maximal order 2 can approximate the above string using ''only'' the following five conditional probability components: Pr(''a'' | ''aa'') = 0.5, Pr(''b'' | ''aa'') = 0.5, Pr(''c'' | ''b'') = 1.0, Pr(''a'' | ''c'') = 1.0, Pr(''a'' | ''ca'') = 1.0. In this example, Pr(''c'' | ''ab'') = Pr(''c'' | ''b'') = 1.0; therefore, the shorter context ''b'' is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.

To construct the Markov chain of order 1 for the next character in that string, one must estimate the 9 conditional probability components Pr(''x'' | ''y'') for every pair of symbols ''x'', ''y'' in {''a'', ''b'', ''c''}. To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components Pr(''x'' | ''yz''), and to construct the Markov chain of order 3 one must estimate 81 conditional probability components Pr(''x'' | ''yzw''). In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases.

The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states; accordingly, "a great reduction in the number of model parameters can be achieved."
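The parameter counts above can be checked numerically. The following Python sketch (illustrative, not part of the original article; the helper name fixed_order_counts is an assumption) estimates the empirical conditional probabilities of a fixed-order Markov chain from a long but finite repetition of ''aaabc'', making visible both the growth in the number of contexts with the order and the redundancy that the VOM model exploits.

<syntaxhighlight lang="python">
from collections import Counter, defaultdict

# Illustrative sketch: estimate the empirical conditional probabilities
# P(next symbol | context) of a fixed-order Markov chain from a string.
def fixed_order_counts(text, order):
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        context, symbol = text[i - order:i], text[i]
        counts[context][symbol] += 1
    return {
        ctx: {sym: n / sum(c.values()) for sym, n in c.items()}
        for ctx, c in counts.items()
    }

text = "aaabc" * 1000  # the example string, truncated rather than infinite

# Order 1: up to 3 * 3 = 9 components must be estimated in general.
# Roughly {'a': {'a': 0.67, 'b': 0.33}, 'b': {'c': 1.0}, 'c': {'a': 1.0}}.
print(fixed_order_counts(text, 1))

# Order 2: up to 3^2 * 3 = 27 components in general, but this string only
# produces the contexts 'aa', 'ab', 'bc' and 'ca', and the distributions
# for 'ab' and 'bc' duplicate those of the shorter contexts 'b' and 'c'
# above -- exactly the redundancy the VOM model removes.
print(fixed_order_counts(text, 2))
</syntaxhighlight>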


Definition

Let A be a state space (finite alphabet) of size |A|.

Consider a sequence with the Markov property x_1^n = x_1x_2\dots x_n of n realizations of random variables, where x_i \in A is the state (symbol) at position i (1 \le i \le n), and the concatenation of states x_i and x_{i+1} is denoted by x_ix_{i+1}.

Given a training set of observed states, x_1^n, the construction algorithm of the VOM models learns a model P that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states. Specifically, the learner generates a conditional probability distribution P(x_i \mid s) for a symbol x_i \in A given a context s \in A^*, where the * sign represents a sequence of states of any length, including the empty context.

VOM models attempt to estimate conditional distributions of the form P(x_i \mid s) where the context length |s| \le D varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length |s| = D and, hence, can be considered as special cases of the VOM models. Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order Markov models, which leads to a better variance-bias tradeoff of the learned models.
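As a concrete illustration of this estimation task, the following Python sketch learns P(x \mid s) for all contexts with |s| \le D from a training sequence and keeps a context only when its empirical next-symbol distribution differs noticeably from that of its longest already-kept suffix. The function name learn_vom, the tolerance parameter tol and this crude suffix-comparison pruning rule are illustrative assumptions, not the pruning criteria of any particular published VOM construction algorithm.

<syntaxhighlight lang="python">
from collections import Counter, defaultdict

def learn_vom(training, max_order, alphabet, tol=0.05):
    # Count next-symbol occurrences for every context s with |s| <= max_order.
    counts = defaultdict(Counter)
    for i in range(len(training)):
        for d in range(0, max_order + 1):
            if i - d < 0:
                break
            counts[training[i - d:i]][training[i]] += 1

    def dist(ctx):
        c = counts[ctx]
        total = sum(c.values())
        return {a: c[a] / total for a in alphabet}

    # Keep a context only if it changes the prediction relative to the
    # longest shorter context (suffix) that has already been kept.
    model = {"": dist("")}
    for ctx in sorted(counts, key=len):
        if ctx == "":
            continue
        parent = next(s for s in (ctx[k:] for k in range(1, len(ctx) + 1))
                      if s in model)
        if any(abs(dist(ctx)[a] - model[parent][a]) > tol for a in alphabet):
            model[ctx] = dist(ctx)
    return model

# On the string from the example, only the informative contexts survive:
# '', 'a', 'b', 'c', 'aa' and 'ca'; 'ab' and 'bc' are pruned because the
# shorter contexts 'b' and 'c' already predict the next symbol perfectly.
model = learn_vom("aaabc" * 200, max_order=2, alphabet="abc")
print(sorted(model, key=len))
</syntaxhighlight>

Practical learners based on probabilistic suffix trees replace the fixed tolerance test with statistically motivated pruning criteria and add smoothing so that symbols unseen in a given context still receive non-zero probability.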


Application areas

Various efficient algorithms have been devised for estimating the parameters of the VOM model. VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression, document compression, classification and identification of DNA and protein sequences, statistical process control, spam filtering, haplotyping, speech recognition, sequence analysis in social sciences, and others.


See also

* Stochastic chains with memory of variable length
* Examples of Markov chains
* Variable order Bayesian network
* Markov process
* Markov chain Monte Carlo
* Semi-Markov process
* Artificial intelligence

