In theoretical computer science, in particular in formal language theory, Kleene's algorithm transforms a given nondeterministic finite automaton (NFA) into a regular expression.
Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages. Alternative presentations of the same method include the "elimination method" attributed to Brzozowski and McCluskey, the algorithm of McNaughton and Yamada, and the use of Arden's lemma.
Algorithm description
According to Gross and Yellen (2004, sect. 2.1, remark R13 on p. 65), the algorithm can be traced back to Kleene (1956). A presentation of the algorithm in the case of deterministic finite automata (DFAs) is given in Hopcroft and Ullman (1979). The presentation of the algorithm for NFAs below follows Gross and Yellen (2004).
Given a nondeterministic finite automaton <math>M = (Q, \Sigma, \delta, q_0, F)</math>, with <math>Q = \{q_0, \ldots, q_n\}</math> its set of states, the algorithm computes
:the sets <math>R^k_{ij}</math> of all strings that take ''M'' from state <math>q_i</math> to <math>q_j</math> without going through any state numbered higher than ''k''.
Here, "going through a state" means entering ''and'' leaving it, so both ''i'' and ''j'' may be higher than ''k'', but no intermediate state may.
Each set <math>R^k_{ij}</math> is represented by a regular expression; the algorithm computes them step by step for ''k'' = -1, 0, ..., ''n''. Since there is no state numbered higher than ''n'', the regular expression <math>R^n_{0j}</math> represents the set of all strings that take ''M'' from its start state <math>q_0</math> to <math>q_j</math>. If <math>F = \{q_1, \ldots, q_f\}</math> is the set of accept states, the regular expression <math>R^n_{01} \mid \cdots \mid R^n_{0f}</math> represents the language accepted by ''M''.
The initial regular expressions, for ''k'' = -1, are computed as follows for ''i''≠''j'':
:<math>R^{-1}_{ij} = a_1 \mid \cdots \mid a_m</math> where <math>q_j \in \delta(q_i, a_1), \ldots, q_j \in \delta(q_i, a_m)</math>
and as follows for ''i''=''j'':
:<math>R^{-1}_{ii} = a_1 \mid \cdots \mid a_m \mid \varepsilon</math> where <math>q_i \in \delta(q_i, a_1), \ldots, q_i \in \delta(q_i, a_m)</math>
In other words, <math>R^{-1}_{ij}</math> mentions all letters that label a transition from ''i'' to ''j'', and ε is also included in the case where ''i''=''j''.
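The base case can be sketched in a few lines of Python. This is only an illustration: the encoding of δ as a dictionary from (state, letter) pairs to sets of successor states, the names `initial_expressions`, `show` and `SAMPLE_DELTA`, and the sample automaton itself are assumptions made for this sketch. A pair of states with no connecting letter gets the empty language, written ∅ here.

```python
def initial_expressions(n_states, delta):
    """Collect, for k = -1, the alternatives of each expression R[i][j].

    `delta` maps (state, letter) to the set of successor states; this
    encoding and all names here are illustrative assumptions.
    """
    R = [[set() for _ in range(n_states)] for _ in range(n_states)]
    for (i, letter), targets in delta.items():
        for j in targets:
            R[i][j].add(letter)   # every letter labelling a transition i -> j
    for i in range(n_states):
        R[i][i].add("ε")          # the empty word leads from i to i
    return R


def show(alternatives):
    # No alternatives at all means the empty language, written ∅ here.
    return " | ".join(sorted(alternatives)) if alternatives else "∅"


# A sample three-state automaton over {a, b}, states numbered 0..2.
SAMPLE_DELTA = {
    (0, "a"): {0}, (0, "b"): {1},
    (1, "a"): {2}, (1, "b"): {1},
    (2, "a"): {1}, (2, "b"): {1},
}

if __name__ == "__main__":
    R = initial_expressions(3, SAMPLE_DELTA)
    for i in range(3):
        print([show(R[i][j]) for j in range(3)])
```

Keeping the alternatives as a set until they are printed avoids duplicate letters when several transitions carry the same label.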
After that, in each step the expressions <math>R^k_{ij}</math> are computed from the previous ones by
:<math>R^k_{ij} = R^{k-1}_{ik} \left(R^{k-1}_{kk}\right)^* R^{k-1}_{kj} \mid R^{k-1}_{ij}</math>
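The whole procedure, base case plus the step for ''k'' = 0, ..., ''n'', might be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: expressions are stored as nested tuples, ∅ is represented by `None` and ε by the empty string, and `union`, `concat` and `star` apply only the trivial simplifications involving ∅ and ε (which the recurrence itself does not prescribe). The helpers `ends` and `matches` are hypothetical and exist only so the result can be checked against an automaton.

```python
EMPTY = None   # ∅, the empty language (representation is an assumption)
EPS = ""       # ε, the empty word; letters are one-character strings


def union(x, y):
    if x is EMPTY:
        return y
    if y is EMPTY:
        return x
    return x if x == y else ("|", x, y)


def concat(x, y):
    if x is EMPTY or y is EMPTY:
        return EMPTY
    if x == EPS:
        return y
    if y == EPS:
        return x
    return (".", x, y)


def star(x):
    return EPS if x in (EMPTY, EPS) else ("*", x)


def kleene(n_states, delta, start, accepting):
    """Build an expression for the language of an NFA with states 0..n_states-1.

    `delta` maps (state, letter) to a set of successor states.
    """
    # Base case k = -1: letters on direct transitions, plus ε on the diagonal.
    R = [[EMPTY] * n_states for _ in range(n_states)]
    for (i, letter), targets in delta.items():
        for j in targets:
            R[i][j] = union(R[i][j], letter)
    for i in range(n_states):
        R[i][i] = union(R[i][i], EPS)
    # Step k: additionally allow state k as an intermediate state.
    for k in range(n_states):
        R = [[union(concat(concat(R[i][k], star(R[k][k])), R[k][j]), R[i][j])
              for j in range(n_states)] for i in range(n_states)]
    result = EMPTY
    for j in sorted(accepting):
        result = union(result, R[start][j])
    return result


def ends(r, word, starts):
    """Positions in `word` reachable by matching `r` from a position in `starts`."""
    if r is EMPTY:
        return set()
    if isinstance(r, str):
        if r == EPS:
            return set(starts)
        return {i + 1 for i in starts if word[i:i + 1] == r}
    if r[0] == "|":
        return ends(r[1], word, starts) | ends(r[2], word, starts)
    if r[0] == ".":
        return ends(r[2], word, ends(r[1], word, starts))
    # Star: repeat the body until no new position is reached.
    reached, frontier = set(starts), set(starts)
    while frontier:
        frontier = ends(r[1], word, frontier) - reached
        reached |= frontier
    return reached


def matches(r, word):
    return len(word) in ends(r, word, {0})
```

A call such as `kleene(3, delta, 0, {1})` returns a nested tuple; without further simplification its size grows quickly with the number of states, which is why it is checked here with `matches` rather than printed.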
Another way to understand the operation of the algorithm is as an "elimination method", where the states from 0 to ''n'' are successively removed: when state ''k'' is removed, the regular expression <math>R^{k-1}_{ij}</math>, which describes the words that label a path from state ''i''>''k'' to state ''j''>''k'', is rewritten into <math>R^k_{ij}</math> so as to take into account the possibility of going via the "eliminated" state ''k''.
By induction on ''k'', it can be shown that the length of each expression <math>R^k_{ij}</math> is at most <math>\tfrac{1}{3}\left(4^{k+1}(6s+7) - 4\right)</math> symbols, where ''s'' denotes the number of characters in Σ.
Therefore, the length of the regular expression representing the language accepted by ''M'' is at most <math>\tfrac{1}{3}\left(4^{n+1}(6s+7)f - f - 3\right)</math> symbols, where ''f'' denotes the number of final states.
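The induction can be sketched as follows, assuming that every letter, ε, alternation bar, star and parenthesis counts as one symbol (a counting convention adopted here for illustration). Write <math>L_k</math> for the maximum length of an expression <math>R^k_{ij}</math>. The base case has at most ''s'' letters, one ε and ''s'' separating bars; each step combines four earlier expressions with two parentheses, one star and one bar; the final expression joins ''f'' expressions with ''f'' - 1 bars.

```latex
\begin{align*}
L_{-1} &\le s + 1 + s = 2s + 1 = \tfrac{1}{3}\bigl(4^{0}(6s+7) - 4\bigr), \\
L_{k}  &\le 4\,L_{k-1} + 4
        \le \tfrac{4}{3}\bigl(4^{k}(6s+7) - 4\bigr) + 4
        = \tfrac{1}{3}\bigl(4^{k+1}(6s+7) - 4\bigr), \\
f\,L_{n} + (f - 1)
       &\le \tfrac{1}{3}\bigl(4^{n+1}(6s+7)f - 4f\bigr) + f - 1
        = \tfrac{1}{3}\bigl(4^{n+1}(6s+7)f - f - 3\bigr).
\end{align*}
```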
This exponential blowup is inevitable, because there exist families of DFAs for which any equivalent regular expression must be of exponential size.
[. Theorem 16.]
In practice, the size of the regular expression obtained by running the algorithm can be very different depending on the order in which the states are considered by the procedure, i.e., the order in which they are numbered from 0 to ''n''.
Example

The example automaton can be described as <math>M = (Q, \Sigma, \delta, q_0, F)</math> with
* the set of states <math>Q = \{q_0, q_1, q_2\}</math>,
* the input alphabet <math>\Sigma = \{a, b\}</math>,
* the transition function δ with <math>\delta(q_0,a)=q_0</math>, <math>\delta(q_0,b)=q_1</math>, <math>\delta(q_1,a)=q_2</math>, <math>\delta(q_1,b)=q_1</math>, <math>\delta(q_2,a)=q_1</math>, and <math>\delta(q_2,b)=q_1</math>,
* the start state <math>q_0</math>, and
* the set of accept states <math>F = \{q_1\}</math>.
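Kleene's algorithm computes the initial regular expressions as
:<math>R^{-1}_{00} = a \mid \varepsilon, \quad R^{-1}_{01} = b, \quad R^{-1}_{02} = \emptyset</math>
:<math>R^{-1}_{10} = \emptyset, \quad R^{-1}_{11} = b \mid \varepsilon, \quad R^{-1}_{12} = a</math>
:<math>R^{-1}_{20} = \emptyset, \quad R^{-1}_{21} = a \mid b, \quad R^{-1}_{22} = \varepsilon</math>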
After that, the <math>R^k_{ij}</math> are computed from the <math>R^{k-1}_{ij}</math> step by step for ''k'' = 0, 1, 2.
Kleene algebra equalities are used to simplify the regular expressions as much as possible.
; Step 0
:
; Step 1
:
; Step 2
:
Since <math>q_0</math> is the start state and <math>q_1</math> is the only accept state, the regular expression <math>R^2_{01}</math> denotes the set of all strings accepted by the automaton.
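As a sanity check on the example, the automaton can be simulated directly and compared with a candidate expression. The candidate used below, `a*b(b|a(a|b))*`, is one simplified expression for this language read off the transition function; it is supplied here as an assumption for illustration and is not quoted from the steps above. The names `DELTA`, `dfa_accepts`, `CANDIDATE` and `agree_up_to` are likewise hypothetical.

```python
import re
from itertools import product

# Transition function of the example automaton, states numbered 0..2.
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}
START, ACCEPT = 0, {1}


def dfa_accepts(word):
    state = START
    for letter in word:
        state = DELTA[(state, letter)]
    return state in ACCEPT


# Candidate expression (an assumption of this sketch, see the text above).
CANDIDATE = re.compile(r"a*b(b|a(a|b))*")


def agree_up_to(max_len):
    """Compare automaton and expression on every word of length <= max_len."""
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            word = "".join(letters)
            if dfa_accepts(word) != bool(CANDIDATE.fullmatch(word)):
                return False
    return True
```

An exhaustive comparison on short words does not prove equivalence, but it catches most mistakes made while simplifying by hand.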
See also
* Floyd–Warshall algorithm: an algorithm on weighted graphs that can be implemented by Kleene's algorithm using a particular Kleene algebra
* Star height problem: what is the minimum nesting depth of stars over all regular expressions corresponding to a given DFA?
* Generalized star height problem: if a complement operator is additionally allowed in regular expressions, can the nesting depth of stars in the output of Kleene's algorithm be limited to a fixed bound?
* Thompson's construction algorithm: transforms a regular expression into a finite automaton