In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA), also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automaton (DFSA), is a finite-state machine that accepts or rejects a given string of symbols by running through a state sequence uniquely determined by the string. ''Deterministic'' refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.

The figure illustrates a deterministic finite automaton using a state diagram. In this example automaton, there are three states: S₀, S₁, and S₂ (denoted graphically by circles). The automaton takes a finite sequence of 0s and 1s as input. For each state, there is a transition arrow leading out to a next state for both 0 and 1. Upon reading a symbol, a DFA jumps ''deterministically'' from one state to another by following the transition arrow. For example, if the automaton is currently in state S₀ and the current input symbol is 1, then it deterministically jumps to state S₁. A DFA has a ''start state'' (denoted graphically by an arrow coming in from nowhere) where computations begin, and a set of ''accept states'' (denoted graphically by a double circle) which help define when a computation is successful.

A DFA is defined as an abstract mathematical concept, but is often implemented in hardware and software for solving various specific problems such as lexical analysis and pattern matching. For example, a DFA can model software that decides whether or not online user input, such as an email address, is syntactically valid.

DFAs have been generalized to ''nondeterministic finite automata (NFA)'', which may have several arrows of the same label starting from a state. Using the powerset construction method, every NFA can be translated to a DFA that recognizes the same language. DFAs, and NFAs as well, recognize exactly the set of regular languages.

Formal definition

A deterministic finite automaton M is a 5-tuple, (Q, \Sigma, \delta, q_0, F), consisting of

* a finite set of states Q
* a finite set of input symbols called the alphabet \Sigma
* a transition function \delta : Q \times \Sigma \rightarrow Q
* an initial (or start) state q_0 \in Q
* a set of accepting (or final) states F \subseteq Q

Let w = a_1 a_2 \ldots a_n be a string over the alphabet \Sigma. The automaton M accepts the string w if a sequence of states, r_0, r_1, \ldots, r_n, exists in Q with the following conditions:

# r_0 = q_0
# r_{i+1} = \delta(r_i, a_{i+1}), for i = 0, \ldots, n-1
# r_n \in F.

In words, the first condition says that the machine starts in the start state q_0. The second condition says that given each character of string w, the machine will transition from state to state according to the transition function \delta. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. Otherwise, it is said that the automaton ''rejects'' the string. The set of strings that M accepts is the language ''recognized'' by M and this language is denoted by L(M).

A deterministic finite automaton without accept states and without a starting state is known as a transition system or semiautomaton.

For a more comprehensive introduction to the formal definition, see automata theory.
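The acceptance condition of the formal definition can be sketched in Python. This is a minimal illustration rather than a reference implementation: the class name `DFA`, its field names, the method `accepts`, and the dictionary encoding of the transition function are all assumptions made for this example, as is the small automaton `ends_in_a`.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A DFA as the 5-tuple (Q, Sigma, delta, q0, F); field names are illustrative."""
    states: frozenset
    alphabet: frozenset
    delta: dict          # maps (state, symbol) -> state and must be total
    start: object
    accepting: frozenset

    def accepts(self, word):
        # Condition 1: r_0 = q_0
        state = self.start
        for symbol in word:
            if symbol not in self.alphabet:
                raise ValueError(f"symbol {symbol!r} is not in the alphabet")
            # Condition 2: r_{i+1} = delta(r_i, a_{i+1})
            state = self.delta[(state, symbol)]
        # Condition 3: accept iff r_n is in F
        return state in self.accepting


# Hypothetical two-state DFA over {a, b} that accepts words ending in "a".
ends_in_a = DFA(
    states=frozenset({"p", "q"}),
    alphabet=frozenset({"a", "b"}),
    delta={("p", "a"): "q", ("p", "b"): "p", ("q", "a"): "q", ("q", "b"): "p"},
    start="p",
    accepting=frozenset({"q"}),
)
```

The loop keeps only the current state, which is why a run needs time linear in the length of the word and a constant amount of memory.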

Example

The following example is of a DFA M, with a binary alphabet, which requires that the input contains an even number of 0s.

M = (Q, \Sigma, \delta, q_0, F)

where

* Q = \{S_1, S_2\}
* \Sigma = \{0, 1\}
* q_0 = S_1
* F = \{S_1\} and
* \delta is defined by the following state transition table:

: State | 0 | 1
: S_1 | S_2 | S_1
: S_2 | S_1 | S_2

The state S_1 represents that there has been an even number of 0s in the input so far, while S_2 signifies an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state will show whether the input contained an even number of 0s or not. If the input did contain an even number of 0s, M will finish in state S_1, an accepting state, so the input string will be accepted.

The language recognized by M is the regular language given by the regular expression (1*) (0 (1*) 0 (1*))*, where * is the Kleene star, e.g., 1* denotes any number (possibly zero) of consecutive ones.
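As a sanity check, the example automaton can be compared against its regular expression on all short inputs. The sketch below assumes the two states are named `S1` (even number of 0s so far, start and accepting) and `S2` (odd number); the helper names `accepts` and `agrees_with_regex` are illustrative.

```python
import re
from itertools import product

# Transition table of the example: a 0 toggles the state, a 1 leaves it unchanged.
DELTA = {
    ("S1", "0"): "S2", ("S1", "1"): "S1",
    ("S2", "0"): "S1", ("S2", "1"): "S2",
}
START = "S1"
ACCEPTING = {"S1"}

# The regular expression from the text, written without spaces.
EVEN_ZEROS = re.compile(r"(1*)(0(1*)0(1*))*")


def accepts(word):
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


def agrees_with_regex(max_length):
    """Compare the DFA and the regex on every binary string up to max_length."""
    for length in range(max_length + 1):
        for letters in product("01", repeat=length):
            word = "".join(letters)
            if accepts(word) != (EVEN_ZEROS.fullmatch(word) is not None):
                return False
    return True
```

An exhaustive comparison up to a fixed length is evidence of agreement, not a proof that the two descriptions define the same language.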

Variations

Complete and incomplete

According to the above definition, deterministic finite automata are always ''complete'': they define from each state a transition for each input symbol. While this is the most common definition, some authors use the term deterministic finite automaton for a slightly different notion: an automaton that defines ''at most'' one transition for each state and each input symbol; the transition function is allowed to be partial. When no transition is defined, such an automaton halts.
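The difference between the two notions can be illustrated with a short Python sketch. It assumes the common convention that an incomplete automaton which halts on a missing transition rejects the input, and it shows the standard way to complete such an automaton by adding a non-accepting sink state. The function names and the sample automaton are hypothetical.

```python
def accepts_partial(delta, start, accepting, word):
    """Run an incomplete DFA; a missing transition halts the run and rejects."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting


def complete(delta, states, alphabet, sink="sink"):
    """Return a total transition function that routes missing moves to a sink."""
    if sink in states:
        raise ValueError("the sink name must not clash with an existing state")
    total = dict(delta)
    for state in list(states) + [sink]:
        for symbol in alphabet:
            # Existing transitions are kept; only gaps are filled.
            total.setdefault((state, symbol), sink)
    return total


# Hypothetical incomplete DFA over {a, b} accepting exactly the word "ab".
partial_delta = {(0, "a"): 1, (1, "b"): 2}
total_delta = complete(partial_delta, states=[0, 1, 2], alphabet="ab")
```

Because the sink is not accepting and can never be left, the completed automaton recognizes the same language as the incomplete one.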

Local automata

A local automaton is a DFA, not necessarily complete, for which all edges with the same label lead to a single vertex. Local automata accept the class of local languages, those for which membership of a word in the language is determined by a "sliding window" of length two on the word. A Myhill graph over an alphabet ''A'' is a directed graph with vertex set ''A'' and subsets of vertices labelled "start" and "finish". The language accepted by a Myhill graph is the set of directed paths from a start vertex to a finish vertex: the graph thus acts as an automaton. The class of languages accepted by Myhill graphs is the class of local languages.
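The "sliding window" characterization translates directly into code. The sketch below treats a Myhill graph as three sets: start letters, finish letters, and allowed adjacent pairs (the edges). The function name, the parameter names, and the choice to exclude the empty word are assumptions of this example.

```python
def in_local_language(word, start, finish, edges):
    """Decide membership by inspecting each window of two adjacent letters."""
    if not word:
        return False  # assumption: a path visits at least one vertex
    if word[0] not in start or word[-1] not in finish:
        return False
    # Every consecutive pair of letters must be an edge of the Myhill graph.
    return all((a, b) in edges for a, b in zip(word, word[1:]))


# Hypothetical Myhill graph over {a, b}: paths start at a, end at b, and alternate.
START = {"a"}
FINISH = {"b"}
EDGES = {("a", "b"), ("b", "a")}
```

With these sets the accepted words are ab, abab, ababab, and so on.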

Randomness

When the start state and accept states are ignored, a DFA of n states and an alphabet of size k can be seen as a digraph of n vertices in which all vertices have k out-arcs labeled 1, \ldots, k (a k-out digraph). It is known that when k \geq 2 is a fixed integer, with high probability, the largest strongly connected component (SCC) in such a k-out digraph chosen uniformly at random is of linear size and it can be reached by all vertices. It has also been proven that if k is allowed to increase as n increases, then the whole digraph has a phase transition for strong connectivity similar to the Erdős–Rényi model for connectivity.

In a random DFA, the maximum number of vertices reachable from one vertex is very close to the number of vertices in the largest SCC with high probability. This is also true for the largest induced sub-digraph of minimum in-degree one, which can be seen as a directed version of the 1-core.
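These quantities are easy to explore experimentally. The following sketch draws a random out-regular digraph by choosing every transition target independently and uniformly, then measures reachability and the largest SCC with a simple quadratic-time method. All function names are illustrative, and the code is meant for small experiments, not as an efficient SCC algorithm.

```python
import random


def random_dfa_digraph(n, k, seed=None):
    """Vertex v gets k out-arcs (labels 0..k-1) with uniformly random targets."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(k)] for _ in range(n)]


def reachable_from(graph, source):
    """Set of vertices reachable from source, including source itself."""
    seen = {source}
    stack = [source]
    while stack:
        vertex = stack.pop()
        for target in graph[vertex]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def largest_scc_size(graph):
    """Size of the largest strongly connected component (quadratic time)."""
    reach = [reachable_from(graph, v) for v in range(len(graph))]
    best = 0
    for v in range(len(graph)):
        # u is in the SCC of v iff each of the two reaches the other.
        component = sum(1 for u in reach[v] if v in reach[u])
        best = max(best, component)
    return best
```

Comparing `largest_scc_size(g)` with `max(len(reachable_from(g, v)) for v in range(n))` over many samples gives an empirical view of how close the two quantities are.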

Closure properties

If applying an operation to DFA-recognizable languages always yields a language that is again recognized by some DFA, then DFAs are said to be closed under that operation. DFAs are closed under operations including union, intersection, complement, concatenation, Kleene star, and reversal. For each operation, an optimal construction with respect to the number of states has been determined in state complexity research. Since DFAs are equivalent to nondeterministic finite automata (NFA), these closures may also be proved using closure properties of NFA.
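Closure under complement, intersection, and union can be made concrete with two textbook constructions: swapping accepting and non-accepting states, and the product automaton. The sketch below represents a DFA as a plain tuple `(states, alphabet, delta, start, accepting)`; this encoding, the function names, and the two sample automata are assumptions of the example.

```python
from itertools import product


def complement(dfa):
    """Swap accepting and non-accepting states of a complete DFA."""
    states, alphabet, delta, start, accepting = dfa
    return states, alphabet, delta, start, states - accepting


def product_dfa(left, right, combine):
    """Product construction; combine(in_left, in_right) picks the accepting pairs."""
    l_states, alphabet, l_delta, l_start, l_acc = left
    r_states, r_alphabet, r_delta, r_start, r_acc = right
    if alphabet != r_alphabet:
        raise ValueError("both automata must share one alphabet")
    states = frozenset(product(l_states, r_states))
    # Both components read the same symbol and move in lockstep.
    delta = {
        ((p, q), a): (l_delta[(p, a)], r_delta[(q, a)])
        for (p, q) in states
        for a in alphabet
    }
    accepting = frozenset(
        (p, q) for (p, q) in states if combine(p in l_acc, q in r_acc)
    )
    return states, alphabet, delta, (l_start, r_start), accepting


def accepts(dfa, word):
    _, _, delta, state, accepting = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


AB = frozenset("01")
# Hypothetical inputs: "even number of 0s" and "ends in 1".
even_zeros = (frozenset({"e", "o"}), AB,
              {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"},
              "e", frozenset({"e"}))
ends_in_one = (frozenset({"n", "y"}), AB,
               {("n", "0"): "n", ("n", "1"): "y", ("y", "0"): "n", ("y", "1"): "y"},
               "n", frozenset({"y"}))

both = product_dfa(even_zeros, ends_in_one, lambda x, y: x and y)
either = product_dfa(even_zeros, ends_in_one, lambda x, y: x or y)
```

The product has as many states as the two inputs multiplied together; it is not necessarily the smallest automaton for the combined language.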

As a transition monoid

A run of a given DFA can be seen as a sequence of compositions of a very general formulation of the transition function with itself. Here we construct that function. For a given input symbol

a \in \Sigma

, one may construct a transition function

\delta_a : Q \rightarrow Q

by defining

\delta_a(q) = \delta(q,a)

for all

q \in Q

. (This trick is called currying.) From this perspective,

\delta_a

"acts" on a state in Q to yield another state. One may then consider the result of function composition repeatedly applied to the various functions

\delta_a, \delta_b

, and so on. Given a pair of letters

a, b \in \Sigma

, one may define a new function

\widehat\delta_{ab} = \delta_b \circ \delta_a

, where

\circ

denotes function composition, so that \delta_a is applied first and \delta_b second, matching the order in which the letters are read. Clearly, this process may be recursively continued, giving the following recursive definition of

\widehat\delta : Q \times \Sigma^* \rightarrow Q

:

\widehat\delta ( q, \epsilon ) = q

, where

\epsilon

is the empty string, and

\widehat\delta ( q, wa ) = \delta_a(\widehat\delta ( q, w ))

, where

w \in \Sigma ^*, a \in \Sigma

and

q \in Q

. The function

\widehat\delta

is defined for all words

w\in\Sigma^*

. A run of the DFA is a sequence of compositions of

\widehat\delta

with itself. Repeated function composition forms a monoid. For the transition functions, this monoid is known as the transition monoid, or sometimes the ''transformation semigroup''. The construction can also be reversed: given a

\widehat\delta

, one can reconstruct a

\delta

, and so the two descriptions are equivalent.
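For a small automaton the transition monoid can be enumerated explicitly by closing the letter maps under composition. In the sketch below each function from Q to Q is stored as a tuple of images, and each monoid element is paired with one word that induces it. The function name `transition_monoid` and this encoding are assumptions of the example.

```python
def transition_monoid(states, alphabet, delta):
    """Close the letter maps delta_a under composition, starting from the identity."""
    order = sorted(states)
    index = {q: i for i, q in enumerate(order)}
    # A function Q -> Q is a tuple: position i holds the image of order[i].
    identity = tuple(order)
    letters = {a: tuple(delta[(q, a)] for q in order) for a in alphabet}

    monoid = {identity: ""}      # function -> one word that induces it
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for a, step in letters.items():
            # Reading a word w and then the letter a: apply f first, then delta_a.
            g = tuple(step[index[f[i]]] for i in range(len(order)))
            if g not in monoid:
                monoid[g] = monoid[f] + a
                frontier.append(g)
    return monoid


# The even-number-of-0s automaton, with assumed state names S1 and S2.
EVEN_DELTA = {
    ("S1", "0"): "S2", ("S1", "1"): "S1",
    ("S2", "0"): "S1", ("S2", "1"): "S2",
}
even_monoid = transition_monoid(["S1", "S2"], "01", EVEN_DELTA)
```

For this automaton the monoid has two elements, the identity (induced by the empty word and by 1) and the map that swaps the two states (induced by 0). In general the monoid can have up to n^n elements for n states, so enumeration is practical only for small automata.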

Advantages and disadvantages

DFAs are one of the most practical models of computation, since there is a trivial linear-time, constant-space, online algorithm to simulate a DFA on a stream of input. Also, there are efficient algorithms to find a DFA recognizing:

* the complement of the language recognized by a given DFA.
* the union/intersection of the languages recognized by two given DFAs.

Because DFAs can be reduced to a ''canonical form'' (minimal DFAs), there are also efficient algorithms to determine:

* whether a DFA accepts any strings (Emptiness Problem)
* whether a DFA accepts all strings (Universality Problem)
* whether two DFAs recognize the same language (Equality Problem)
* whether the language recognized by a DFA is included in the language recognized by a second DFA (Inclusion Problem)
* the DFA with a minimum number of states for a particular regular language (Minimization Problem)

DFAs are equivalent in computing power to nondeterministic finite automata (NFAs). First, any DFA is also an NFA, so an NFA can do what a DFA can do. Second, given an NFA, the powerset construction builds a DFA that recognizes the same language, although the DFA may have exponentially more states than the NFA. However, even though NFAs are computationally equivalent to DFAs, the above-mentioned problems are not necessarily solved efficiently for NFAs. The non-universality problem for NFAs is PSPACE-complete, since there are small NFAs whose shortest rejecting word has exponential length. A DFA is universal if and only if all of its reachable states are final states, but this does not hold for NFAs. The Equality, Inclusion and Minimization Problems are also PSPACE-complete, since they require forming the complement of an NFA, which results in an exponential blow-up in size.

On the other hand, finite-state automata are of strictly limited power in the languages they can recognize; many simple languages, including any problem that requires more than constant space to solve, cannot be recognized by a DFA. The classic example of a simply described language that no DFA can recognize is the bracket or Dyck language, i.e., the language that consists of properly paired brackets, such as the word "(()())". Intuitively, no DFA can recognize the Dyck language because DFAs are not capable of counting: a DFA-like automaton needs to have a state to represent any possible number of "currently open" parentheses, meaning it would need an unbounded number of states. Another, simpler example is the language consisting of strings of the form ''aⁿbⁿ'' for some finite but arbitrary number of ''a''s, followed by an equal number of ''b''s.
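Several of the decision problems listed above reduce to graph search over states. The sketch below does not go through minimal DFAs; it uses plain reachability instead, which is a different but also efficient route: emptiness and universality inspect the reachable states, and equality explores reachable pairs of states of the two automata. Function names and the sample automata are illustrative.

```python
from collections import deque


def reachable(alphabet, delta, start):
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for a in alphabet:
            nxt = delta[(state, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_empty(alphabet, delta, start, accepting):
    # Empty language: no accepting state is reachable.
    return not (reachable(alphabet, delta, start) & set(accepting))


def is_universal(alphabet, delta, start, accepting):
    # Universal language: every reachable state is accepting.
    return reachable(alphabet, delta, start) <= set(accepting)


def equivalent(alphabet, d1, s1, f1, d2, s2, f2):
    """Explore reachable state pairs; a pair that disagrees on acceptance
    witnesses a word accepted by exactly one of the two automata."""
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False
        for a in alphabet:
            pair = (d1[(p, a)], d2[(q, a)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# Hypothetical inputs: parity of 0s with two states, and a redundant
# four-state counter of 0s modulo 4.
D_EVEN = {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"}
D_MOD4 = {(i, a): ((i + 1) % 4 if a == "0" else i) for i in range(4) for a in "01"}
```

Each search visits a state (or pair of states) at most once, so the running time is polynomial in the sizes of the automata.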

DFA identification from labeled words

Given a set of ''positive'' words

S^+ \subset \Sigma^*

and a set of ''negative'' words

S^- \subset \Sigma^*

one can construct a DFA that accepts all words from

S^+

and rejects all words from

S^-

: this problem is called ''DFA identification'' (synthesis, learning). While ''some'' DFA can be constructed in linear time, the problem of identifying a DFA with the minimal number of states is NP-complete. The first algorithm for minimal DFA identification was proposed by Trakhtenbrot and Barzdin and is called the ''TB-algorithm''. However, the TB-algorithm assumes that all words over

\Sigma

up to a given length are contained in

S^+ \cup S^-

. Later, K. Lang proposed an extension of the TB-algorithm that does not use any assumptions about

S^+

and

S^-

, the ''Traxbar'' algorithm. However, Traxbar does not guarantee the minimality of the constructed DFA. In his work E.M. Gold also proposed a heuristic algorithm for minimal DFA identification. Gold's algorithm assumes that

S^+

and

S^-

contain a ''characteristic set'' of the regular language; otherwise, the constructed DFA will be inconsistent with either

S^+

or

S^-

. Other notable DFA identification algorithms include the RPNI algorithm, the Blue-Fringe evidence-driven state-merging algorithm, and Windowed-EDSM. Another research direction is the application of evolutionary algorithms: the smart state labeling evolutionary algorithm made it possible to solve a modified DFA identification problem in which the training data (sets

S^+

and

S^-

) is ''noisy'' in the sense that some words are attributed to wrong classes. Yet another step forward is due to the application of SAT solvers by Marijn J. H. Heule and S. Verwer: the minimal DFA identification problem is reduced to deciding the satisfiability of a Boolean formula. The main idea is to build an augmented prefix-tree acceptor (a trie containing all input words with corresponding labels) based on the input sets and reduce the problem of finding a DFA with

C

states to ''coloring'' the tree vertices with

C

colors in such a way that when vertices with one color are merged into one state, the generated automaton is deterministic and complies with

S^+

and

S^-

. Though this approach allows finding the minimal DFA, it suffers from exponential blow-up of execution time when the size of input data increases. Therefore, Heule and Verwer's initial algorithm was later augmented by running several steps of the EDSM algorithm prior to SAT solver execution: the DFASAT algorithm. This reduces the search space of the problem, but loses the minimality guarantee. Another way of reducing the search space was proposed by Ulyantsev et al. by means of new symmetry-breaking predicates based on the breadth-first search algorithm: the sought DFA's states are constrained to be numbered according to the BFS algorithm launched from the initial state. This approach reduces the search space by a factor of

C!

by eliminating isomorphic automata.
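The augmented prefix-tree acceptor that serves as the starting point for state-merging and SAT-based methods can be sketched as a labeled trie. The code below only builds the tree and answers queries on it; it does not merge states or call a SAT solver. The names `build_apta` and `classify` and the list-based node encoding are assumptions of this example.

```python
def build_apta(positive, negative):
    """Build an augmented prefix-tree acceptor: a trie with accept/reject labels."""
    children = [{}]     # children[node][symbol] -> node; node 0 is the root
    label = [None]      # True = accepting, False = rejecting, None = unlabeled
    for words, verdict in ((positive, True), (negative, False)):
        for word in words:
            node = 0
            for symbol in word:
                if symbol not in children[node]:
                    children.append({})
                    label.append(None)
                    children[node][symbol] = len(children) - 1
                node = children[node][symbol]
            if label[node] is not None and label[node] != verdict:
                raise ValueError(f"{word!r} is labeled both positive and negative")
            label[node] = verdict
    return children, label


def classify(children, label, word):
    """True/False for words the tree decides, None when the sample is silent."""
    node = 0
    for symbol in word:
        if symbol not in children[node]:
            return None
        node = children[node][symbol]
    return label[node]


# Hypothetical sample: positive words {a, abb}, negative words {"", ab}.
apta_children, apta_label = build_apta(["a", "abb"], ["", "ab"])
```

The tree is itself a (partial) deterministic automaton consistent with the sample and is built in time linear in the total length of the words, which illustrates why finding ''some'' consistent DFA is easy while finding a minimal one is hard.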

Equivalent models

Read-only right-moving Turing machines

Read-only right-moving Turing machines are a particular type of Turing machine that only moves right; these are almost exactly equivalent to DFAs. The definition based on a singly infinite tape is a 7-tuple

M = \langle Q, \Gamma, b, \Sigma, \delta, q_0, F \rangle,

where

* Q is a finite set of ''states'';
* \Gamma is a finite set of the ''tape alphabet/symbols'';
* b \in \Gamma is the ''blank symbol'' (the only symbol allowed to occur on the tape infinitely often at any step during the computation);
* \Sigma, a subset of \Gamma not including b, is the set of ''input symbols'';
* \delta: Q \times \Gamma \to Q \times \Gamma \times \{R\} is a function called the ''transition function'', where R is a right movement (a right shift);
* q_0 \in Q is the ''initial state'';
* F \subseteq Q is the set of ''final'' or ''accepting states''.

The machine always accepts a regular language. There must exist at least one element of the set F (a HALT state) for the language to be nonempty.

Example of a 3-state, 2-symbol read-only Turing machine

* Q = \{A, B, C, \text{HALT}\};
* \Gamma = \{0, 1\};
* b = 0, "blank";
* \Sigma = \varnothing, empty set;
* \delta = see state-table above;
* q_0 = A, initial state;
* F = the one element set of final states: \{\text{HALT}\}
