In theoretical computer science, in particular in formal language theory, Kleene's algorithm transforms a given nondeterministic finite automaton (NFA) into a regular expression. Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages. Alternative presentations of the same method include the "elimination method" attributed to Brzozowski and McCluskey, the algorithm of McNaughton and Yamada, and the use of Arden's lemma.

Algorithm description

According to Gross and Yellen (2004, sect. 2.1, remark R13 on p. 65), the algorithm can be traced back to Kleene (1956). A presentation of the algorithm in the case of deterministic finite automata (DFAs) is given in Hopcroft and Ullman (1979). The presentation of the algorithm for NFAs below follows Gross and Yellen (2004).

Given a nondeterministic finite automaton ''M'' = (''Q'', Σ, δ, ''q''₀, ''F''), with ''Q'' = { ''q''₀, ..., ''q''_''n'' } its set of states, the algorithm computes
:the sets ''R''^''k''_''ij'' of all strings that take ''M'' from state ''q''_''i'' to ''q''_''j'' without going through any state numbered higher than ''k''.
Here, "going through a state" means entering ''and'' leaving it, so both ''i'' and ''j'' may be higher than ''k'', but no intermediate state may. Each set ''R''^''k''_''ij'' is represented by a regular expression; the algorithm computes them step by step for ''k'' = -1, 0, ..., ''n''. Since there is no state numbered higher than ''n'', the regular expression ''R''^''n''_0''j'' represents the set of all strings that take ''M'' from its start state ''q''₀ to ''q''_''j''. If ''F'' = { ''q''₁, ..., ''q''_''f'' } is the set of accept states, the regular expression ''R''^''n''_01 | ... | ''R''^''n''_0''f'' represents the language accepted by ''M''.

The initial regular expressions, for ''k'' = -1, are computed as follows for ''i''≠''j'':
:''R''^{-1}_''ij'' = ''a''₁ | ... | ''a''_''m'' where ''q''_''j'' ∈ δ(''q''_''i'',''a''₁), ..., ''q''_''j'' ∈ δ(''q''_''i'',''a''_''m'')
and as follows for ''i''=''j'':
:''R''^{-1}_''ii'' = ''a''₁ | ... | ''a''_''m'' | ε where ''q''_''i'' ∈ δ(''q''_''i'',''a''₁), ..., ''q''_''i'' ∈ δ(''q''_''i'',''a''_''m'')
In other words, ''R''^{-1}_''ij'' mentions all letters that label a transition from ''i'' to ''j'', and ε is also included in the case where ''i''=''j''. If no letter labels a transition from ''i'' to ''j''≠''i'', the expression is the empty set ∅.

After that, in each step the expressions ''R''^''k''_''ij'' are computed from the previous ones by
:''R''^''k''_''ij'' = ''R''^{''k''-1}_''ik'' (''R''^{''k''-1}_''kk'')^* ''R''^{''k''-1}_''kj'' | ''R''^{''k''-1}_''ij''

Another way to understand the operation of the algorithm is as an "elimination method", where the states from 0 to ''n'' are successively removed: when state ''k'' is removed, the regular expression ''R''^{''k''-1}_''ij'', which describes the words that label a path from state ''i''>''k'' to state ''j''>''k'', is rewritten into ''R''^''k''_''ij'' so as to take into account the possibility of going via the "eliminated" state ''k''.

By induction on ''k'', it can be shown that the length of each expression ''R''^''k''_''ij'' is at most (4^{''k''+1}(6''s''+7) - 4)/3 symbols, where ''s'' denotes the number of characters in Σ: an initial expression has at most 2''s''+1 symbols, and each step combines four expressions of the previous step with four additional symbols. Therefore, the length of the regular expression representing the language accepted by ''M'' is at most (4^{''n''+1}(6''s''+7)''f'' - ''f'' - 3)/3 symbols, where ''f'' denotes the number of final states. This exponential blowup is inevitable, because there exist families of DFAs for which any equivalent regular expression must be of exponential size.

In practice, the size of the regular expression obtained by running the algorithm can be very different depending on the order in which the states are considered by the procedure, i.e., the order in which they are numbered from 0 to ''n''.
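The recurrence above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the tuple encoding of regular expressions, the helper names (`alt`, `cat`, `star`, `kleene`, `ends`, `matches`, `accepts`, `agrees`, `show`) and the dictionary format for δ are all hypothetical choices made for this example. States are the integers 0, ..., ''n'' with 0 as the start state, and only the trivial identities for ∅ and ε are applied, so the output is usually far from minimal.

```python
from itertools import product

# Regular expressions as nested tuples (an illustrative encoding, not from the text).
EMPTY = ("empty",)  # the empty set
EPS = ("eps",)      # the empty word

def sym(a):
    return ("sym", a)

def alt(x, y):
    """Union x | y, dropping the empty set and identical operands."""
    if x == EMPTY:
        return y
    if y == EMPTY or x == y:
        return x
    return ("alt", x, y)

def cat(x, y):
    """Concatenation; the empty set absorbs, the empty word is neutral."""
    if EMPTY in (x, y):
        return EMPTY
    if x == EPS:
        return y
    if y == EPS:
        return x
    return ("cat", x, y)

def star(x):
    """Kleene star; the star of the empty set or the empty word is the empty word."""
    return EPS if x in (EMPTY, EPS) else ("star", x)

def kleene(n_states, delta, accepting):
    """delta maps (state, letter) to a set of successor states; state 0 is the start state."""
    n = n_states
    # k = -1: letters labelling direct transitions, plus the empty word on the diagonal.
    R = [[EMPTY] * n for _ in range(n)]
    for (i, a), targets in sorted(delta.items()):
        for j in targets:
            R[i][j] = alt(R[i][j], sym(a))
    for i in range(n):
        R[i][i] = alt(R[i][i], EPS)
    # k = 0, ..., n-1: additionally allow paths through state k.
    for k in range(n):
        loop = star(R[k][k])
        R = [[alt(cat(R[i][k], cat(loop, R[k][j])), R[i][j]) for j in range(n)]
             for i in range(n)]
    result = EMPTY
    for j in sorted(accepting):
        result = alt(result, R[0][j])
    return result

def ends(e, s, i):
    """Set of positions j such that e matches s[i:j] (letters are single characters)."""
    tag = e[0]
    if tag == "empty":
        return set()
    if tag == "eps":
        return {i}
    if tag == "sym":
        return {i + 1} if s[i:i + 1] == e[1] else set()
    if tag == "alt":
        return ends(e[1], s, i) | ends(e[2], s, i)
    if tag == "cat":
        return {j for m in ends(e[1], s, i) for j in ends(e[2], s, m)}
    seen, todo = {i}, [i]  # star: closure under repeated matching
    while todo:
        m = todo.pop()
        for j in ends(e[1], s, m) - seen:
            seen.add(j)
            todo.append(j)
    return seen

def matches(e, s):
    return len(s) in ends(e, s, 0)

def accepts(delta, accepting, s):
    """Simulate the automaton directly on the word s."""
    current = {0}
    for a in s:
        current = {q for p in current for q in delta.get((p, a), ())}
    return bool(current & accepting)

def agrees(expr, delta, accepting, alphabet, max_len):
    """Compare expression and automaton on every word up to max_len (a sanity check, not a proof)."""
    words = ("".join(w) for n in range(max_len + 1) for w in product(alphabet, repeat=n))
    return all(matches(expr, w) == accepts(delta, accepting, w) for w in words)

def show(e):
    """Render an expression as text, parenthesising only where precedence requires it."""
    tag = e[0]
    if tag in ("empty", "eps"):
        return "∅" if tag == "empty" else "ε"
    if tag == "sym":
        return e[1]
    if tag == "alt":
        return f"{show(e[1])}|{show(e[2])}"
    if tag == "cat":
        return "".join(f"({show(x)})" if x[0] == "alt" else show(x) for x in e[1:])
    return f"{show(e[1])}*" if e[1][0] == "sym" else f"({show(e[1])})*"

# A small three-state automaton over {a, b}, chosen only for illustration.
delta = {(0, "a"): {0}, (0, "b"): {1}, (1, "a"): {2},
         (1, "b"): {1}, (2, "a"): {1}, (2, "b"): {1}}
expr = kleene(3, delta, {1})
print(show(expr))
print(agrees(expr, delta, {1}, "ab", 6))
```

Because the sketch applies almost no simplification, the printed expression grows quickly with the number of states, in line with the length bound stated above. Renumbering the states changes the output, which is one way to observe the order dependence mentioned in the text.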

Example

The automaton shown in the picture can be described as ''M'' = (''Q'', Σ, δ, ''q''₀, ''F'') with
* the set of states ''Q'' = { ''q''₀, ''q''₁, ''q''₂ },
* the input alphabet Σ = { ''a'', ''b'' },
* the transition function δ with δ(''q''₀,''a'')=''q''₀, δ(''q''₀,''b'')=''q''₁, δ(''q''₁,''a'')=''q''₂, δ(''q''₁,''b'')=''q''₁, δ(''q''₂,''a'')=''q''₁, and δ(''q''₂,''b'')=''q''₁,
* the start state ''q''₀, and
* the set of accept states ''F'' = { ''q''₁ }.

Kleene's algorithm computes the initial regular expressions as
:''R''^{-1}_00 = ''a'' | ε, ''R''^{-1}_01 = ''b'', ''R''^{-1}_02 = ∅,
:''R''^{-1}_10 = ∅, ''R''^{-1}_11 = ''b'' | ε, ''R''^{-1}_12 = ''a'',
:''R''^{-1}_20 = ∅, ''R''^{-1}_21 = ''a'' | ''b'', ''R''^{-1}_22 = ε.

After that, the ''R''^''k''_''ij'' are computed from the ''R''^{''k''-1}_''ij'' step by step for ''k'' = 0, 1, 2. Kleene algebra equalities are used to simplify the regular expressions as much as possible.

Since ''q''₀ is the start state and ''q''₁ is the only accept state, the regular expression ''R''^2_01 denotes the set of all strings accepted by the automaton.
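As a hedged worked sketch, the entries needed for ''R''^2_01 can be traced through the recurrence as follows. The starting values are read off δ, and the simplifications shown are one possible choice rather than the only one. The entries ''R''^0_11 = ''b'' | ε, ''R''^0_12 = ''a'', ''R''^0_21 = ''a'' | ''b'', ''R''^0_22 = ε and ''R''^0_02 = ∅ coincide with their ''k'' = -1 values, because ''R''^{-1}_10 = ''R''^{-1}_20 = ''R''^{-1}_02 = ∅.

```latex
\begin{align*}
R^{0}_{01} &= R^{-1}_{00}\,(R^{-1}_{00})^{*}\,R^{-1}_{01} \mid R^{-1}_{01}
            = (a \mid \varepsilon)(a \mid \varepsilon)^{*}\,b \mid b
            = a^{*}b \\
R^{1}_{01} &= R^{0}_{01}\,(R^{0}_{11})^{*}\,R^{0}_{11} \mid R^{0}_{01}
            = a^{*}b\,(b \mid \varepsilon)^{*}(b \mid \varepsilon) \mid a^{*}b
            = a^{*}b^{*}b \\
R^{1}_{02} &= R^{0}_{01}\,(R^{0}_{11})^{*}\,R^{0}_{12} \mid R^{0}_{02}
            = a^{*}b\,(b \mid \varepsilon)^{*}\,a \mid \emptyset
            = a^{*}b^{*}ba \\
R^{1}_{21} &= R^{0}_{21}\,(R^{0}_{11})^{*}\,R^{0}_{11} \mid R^{0}_{21}
            = (a \mid b)(b \mid \varepsilon)^{*}(b \mid \varepsilon) \mid (a \mid b)
            = (a \mid b)\,b^{*} \\
R^{1}_{22} &= R^{0}_{21}\,(R^{0}_{11})^{*}\,R^{0}_{12} \mid R^{0}_{22}
            = (a \mid b)(b \mid \varepsilon)^{*}\,a \mid \varepsilon
            = (a \mid b)\,b^{*}a \mid \varepsilon \\
R^{2}_{01} &= R^{1}_{02}\,(R^{1}_{22})^{*}\,R^{1}_{21} \mid R^{1}_{01}
            = a^{*}b^{*}ba\,\bigl((a \mid b)\,b^{*}a \mid \varepsilon\bigr)^{*}\,(a \mid b)\,b^{*} \mid a^{*}b^{*}b
\end{align*}
```

The last expression denotes the same language as the shorter ''a''^*''b''(''b'' | ''a''(''a'' | ''b''))^*, which can also be read off the automaton directly: loop on ''a'' in ''q''₀, move to ''q''₁ with ''b'', and from ''q''₁ either stay with ''b'' or leave with ''a'' and return with any letter.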

