Variable-order Markov model

In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well-known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization. This realization sequence is often called the ''context''; therefore the VOM models are also called ''context trees''. VOM models are nicely rendered by colorized probabilistic suffix trees (PST). The flexibility in the number of conditioning random variables turns out to be of real advantage for many applications, such as statistical analysis, classification and prediction.


Example

Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet {''a'', ''b'', ''c''}. Specifically, consider the string ''aaabcaaabcaaabcaaabc…aaabc'' constructed from infinite concatenations of the sub-string ''aaabc''.

The VOM model of maximal order 2 can approximate the above string using ''only'' the following five conditional probability components: Pr(''a'' | ''aa'') = 0.5, Pr(''b'' | ''aa'') = 0.5, Pr(''c'' | ''b'') = 1.0, Pr(''a'' | ''c'') = 1.0, Pr(''a'' | ''ca'') = 1.0. In this example, Pr(''c'' | ''ab'') = Pr(''c'' | ''b'') = 1.0; therefore, the shorter context ''b'' is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.

To construct the Markov chain of order 1 for the next character in that string, one must estimate the 9 conditional probability components Pr(''x'' | ''y'') for every pair of symbols ''x'', ''y'' in {''a'', ''b'', ''c''}. To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components Pr(''x'' | ''yz''), and to construct the Markov chain of order 3 one must estimate 81 conditional probability components Pr(''x'' | ''yzw''). In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases.

The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states; accordingly, "a great reduction in the number of model parameters can be achieved."
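The parameter counts above can be checked numerically. The following Python sketch (illustrative, not part of the original article; the helper name fixed_order_counts is an assumption) estimates the empirical conditional probabilities of a fixed-order Markov chain from a long but finite repetition of ''aaabc'', making visible both the growth in the number of contexts with the order and the redundancy that the VOM model exploits.

<syntaxhighlight lang="python">
from collections import Counter, defaultdict

# Illustrative sketch: estimate the empirical conditional probabilities
# P(next symbol | context) of a fixed-order Markov chain from a string.
def fixed_order_counts(text, order):
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        context, symbol = text[i - order:i], text[i]
        counts[context][symbol] += 1
    return {
        ctx: {sym: n / sum(c.values()) for sym, n in c.items()}
        for ctx, c in counts.items()
    }

text = "aaabc" * 1000  # the example string, truncated rather than infinite

# Order 1: up to 3 * 3 = 9 components must be estimated in general.
# Roughly {'a': {'a': 0.67, 'b': 0.33}, 'b': {'c': 1.0}, 'c': {'a': 1.0}}.
print(fixed_order_counts(text, 1))

# Order 2: up to 3^2 * 3 = 27 components in general, but this string only
# produces the contexts 'aa', 'ab', 'bc' and 'ca', and the distributions
# for 'ab' and 'bc' duplicate those of the shorter contexts 'b' and 'c'
# above -- exactly the redundancy the VOM model removes.
print(fixed_order_counts(text, 2))
</syntaxhighlight>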


Definition

Let A be a state space (finite alphabet) of size |A|.

Consider a sequence with the Markov property x_1^n = x_1x_2\dots x_n of n realizations of random variables, where x_i \in A is the state (symbol) at position i (1 \le i \le n), and the concatenation of states x_i and x_{i+1} is denoted by x_ix_{i+1}.

Given a training set of observed states, x_1^n, the construction algorithm of the VOM models learns a model P that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states. Specifically, the learner generates a conditional probability distribution P(x_i \mid s) for a symbol x_i \in A given a context s \in A^*, where the * sign represents a sequence of states of any length, including the empty context.

VOM models attempt to estimate conditional distributions of the form P(x_i \mid s) where the context length |s| \le D varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length |s| = D and, hence, can be considered as special cases of the VOM models. Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order Markov models, which leads to a better variance-bias tradeoff of the learned models.
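As a concrete illustration of this estimation task, the following Python sketch learns P(x \mid s) for all contexts with |s| \le D from a training sequence and keeps a context only when its empirical next-symbol distribution differs noticeably from that of its longest already-kept suffix. The function name learn_vom, the tolerance parameter tol and this crude suffix-comparison pruning rule are illustrative assumptions, not the pruning criteria of any particular published VOM construction algorithm.

<syntaxhighlight lang="python">
from collections import Counter, defaultdict

def learn_vom(training, max_order, alphabet, tol=0.05):
    # Count next-symbol occurrences for every context s with |s| <= max_order.
    counts = defaultdict(Counter)
    for i in range(len(training)):
        for d in range(0, max_order + 1):
            if i - d < 0:
                break
            counts[training[i - d:i]][training[i]] += 1

    def dist(ctx):
        c = counts[ctx]
        total = sum(c.values())
        return {a: c[a] / total for a in alphabet}

    # Keep a context only if it changes the prediction relative to the
    # longest shorter context (suffix) that has already been kept.
    model = {"": dist("")}
    for ctx in sorted(counts, key=len):
        if ctx == "":
            continue
        parent = next(s for s in (ctx[k:] for k in range(1, len(ctx) + 1))
                      if s in model)
        if any(abs(dist(ctx)[a] - model[parent][a]) > tol for a in alphabet):
            model[ctx] = dist(ctx)
    return model

# On the string from the example, only the informative contexts survive:
# '', 'a', 'b', 'c', 'aa' and 'ca'; 'ab' and 'bc' are pruned because the
# shorter contexts 'b' and 'c' already predict the next symbol perfectly.
model = learn_vom("aaabc" * 200, max_order=2, alphabet="abc")
print(sorted(model, key=len))
</syntaxhighlight>

Practical learners based on probabilistic suffix trees replace the fixed tolerance test with statistically motivated pruning criteria and add smoothing so that symbols unseen in a given context still receive non-zero probability.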


Application areas

Various efficient algorithms have been devised for estimating the parameters of the VOM model. VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression, document compression, classification and identification of DNA and protein sequences, statistical process control, spam filtering, haplotyping, speech recognition, sequence analysis in social sciences, and others.


See also

* Stochastic chains with memory of variable length
* Examples of Markov chains
* Variable order Bayesian network
* Markov process
* Markov chain Monte Carlo
* Semi-Markov process
* Artificial intelligence

