The star height problem in formal language theory is the question whether all regular languages can be expressed using regular expressions of limited star height, i.e. with a limited nesting depth of Kleene stars. Specifically, is a nesting depth of one always sufficient? If not, is there an algorithm
to determine how many are required? The problem was raised by Eggan (1963).
Families of regular languages with unbounded star height
The first question was answered in the negative when, in 1963, Eggan gave examples of regular languages of star height ''n'' for every ''n''. Here, the star height ''h''(''L'') of a regular language ''L'' is defined as the minimum star height among all regular expressions representing ''L''. The first few languages found by Eggan (1963) are described in the following, by means of giving a regular expression for each language:
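:<math>e_1 = a_1^*</math>
:<math>e_2 = \left(a_1^* a_2^* a_3\right)^*</math>
:<math>e_3 = \left(\left(a_1^* a_2^* a_3\right)^* \left(a_4^* a_5^* a_6\right)^* a_7\right)^*</math>
:<math>e_4 = \left(\left(\left(a_1^* a_2^* a_3\right)^* \left(a_4^* a_5^* a_6\right)^* a_7\right)^* \left(\left(a_8^* a_9^* a_{10}\right)^* \left(a_{11}^* a_{12}^* a_{13}\right)^* a_{14}\right)^* a_{15}\right)^*</math>
The construction principle for these expressions is that expression <math>e_{n+1}</math> is obtained by concatenating two copies of <math>e_n</math>, appropriately renaming the letters of the second copy using fresh alphabet symbols, concatenating the result with another fresh alphabet symbol, and then surrounding the resulting expression with a Kleene star. The remaining, more difficult part is to prove that for <math>e_n</math> there is no equivalent regular expression of star height less than ''n''; a proof is given by Eggan (1963).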
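The construction principle can be sketched in a few lines of Python. This is a minimal illustration, not Eggan's own formulation: the class names `Sym`, `Cat`, `Star` and the helpers `star_height`, `letters`, `shift`, `eggan` and `show` are hypothetical, and the base case (a single starred letter) is an assumption chosen to match the alphabet size 2<sup>''n''</sup> − 1. The code only measures the star height of the ''expression''; it says nothing about the harder lower bound for the ''language''.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    index: int      # the letter a_index


@dataclass(frozen=True)
class Cat:
    parts: tuple    # concatenation of sub-expressions


@dataclass(frozen=True)
class Star:
    body: object    # Kleene star applied to a sub-expression


def star_height(e):
    """Maximum nesting depth of Kleene stars in the expression e."""
    if isinstance(e, Sym):
        return 0
    if isinstance(e, Cat):
        return max((star_height(p) for p in e.parts), default=0)
    return 1 + star_height(e.body)


def letters(e):
    """Set of letter indices occurring in e."""
    if isinstance(e, Sym):
        return {e.index}
    if isinstance(e, Cat):
        return set().union(*(letters(p) for p in e.parts))
    return letters(e.body)


def shift(e, k):
    """Rename every letter a_i to a_(i+k), giving a copy over fresh symbols."""
    if isinstance(e, Sym):
        return Sym(e.index + k)
    if isinstance(e, Cat):
        return Cat(tuple(shift(p, k) for p in e.parts))
    return Star(shift(e.body, k))


def eggan(n):
    """Build the n-th expression: (copy . renamed copy . fresh letter)*."""
    if n == 1:
        return Star(Sym(1))          # assumed base case: a_1*
    prev = eggan(n - 1)
    k = len(letters(prev))           # letters already in use
    return Star(Cat((prev, shift(prev, k), Sym(2 * k + 1))))


def show(e):
    """Render an expression as text, e.g. (a1* a2* a3)*."""
    if isinstance(e, Sym):
        return f"a{e.index}"
    if isinstance(e, Cat):
        return " ".join(show(p) for p in e.parts)
    inner = show(e.body)
    return inner + "*" if isinstance(e.body, Sym) else "(" + inner + ")*"
```

Calling `show(eggan(n))` for small ''n'' renders the expressions as text, and `star_height(eggan(n))` and `len(letters(eggan(n)))` give the nesting depth and the alphabet size of each.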
However, Eggan's examples use a large alphabet, of size 2<sup>''n''</sup> − 1 for the language with star height ''n''. He thus asked whether examples can also be found over binary alphabets. This was proved to be true shortly afterwards by Dejean and Schützenberger (1966).
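Their examples can be described by an inductively defined family of regular expressions over the binary alphabet <math>\{a, b\}</math> as follows (cf. Salomaa 1981):
:<math>e_1 = (ab)^*</math>
:<math>e_2 = \left(aa(ab)^*bb\right)^*</math>
:<math>e_3 = \left(aaaa\left(aa(ab)^*bb\right)^*bbbb\right)^*</math>
:<math>e_{n+1} = \left(a^{2^n} \cdot e_n \cdot b^{2^n}\right)^*</math>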
Again, a rigorous proof is needed for the fact that <math>e_n</math> does not admit an equivalent regular expression of lower star height. Proofs are given by Dejean and Schützenberger (1966) and by Salomaa (1981).
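The binary family can also be generated mechanically. The sketch below assumes the family is given by the base case (''ab'')* and the step "wrap the previous expression in 2<sup>''k''</sup> copies of ''a'' on the left and 2<sup>''k''</sup> copies of ''b'' on the right, then star it"; that recurrence is an assumption of this example, and the function names `binary_family` and `star_height_of_string` are hypothetical. As before, only the star height of the written expression is checked, not the lower bound for the language.

```python
def binary_family(n):
    """Return the n-th expression as a string over the letters a and b.

    Assumed recurrence: e_1 = (ab)*, e_(k+1) = (a^(2^k) e_k b^(2^k))*.
    """
    expr = "(ab)*"
    for k in range(1, n):
        block = 2 ** k
        expr = "(" + "a" * block + expr + "b" * block + ")*"
    return expr


def star_height_of_string(expr):
    """Star height of an expression written with letters, '(', ')' and '*'.

    stack[-1] holds the largest star height seen so far inside the
    innermost parenthesis that is still open.
    """
    stack = [0]
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "(":
            stack.append(0)
        elif ch == ")":
            height = stack.pop()
            # A star directly after ')' applies to the whole group.
            if i + 1 < len(expr) and expr[i + 1] == "*":
                height += 1
                i += 1
            stack[-1] = max(stack[-1], height)
        elif ch == "*":
            # A star directly after a letter has height 1.
            stack[-1] = max(stack[-1], 1)
        i += 1
    return stack[0]
```

The string scan is used here instead of a syntax tree only to keep the example short; it relies on the expressions being fully parenthesised.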
Computing the star height of regular languages
In contrast, the second question turned out to be much more difficult, and it remained a famous open problem in formal language theory for over two decades. For years, there was little progress. The pure-group languages were the first interesting family of regular languages for which the star height problem was proved to be decidable. But the general problem remained open for more than 25 years until it was settled by Hashiguchi, who in 1988 published an algorithm to determine the star height of any regular language. The algorithm was not at all practical, being of non-elementary complexity. To illustrate the immense resource consumption of that algorithm, Lombardy and Sakarovitch (2002) give some actual numbers:
Notice that the number <math>10^{10^{10}}</math> alone has 10 billion zeros when written down in decimal notation, and is already ''by far'' larger than the number of atoms in the observable universe.
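The size comparison can be checked with exponent arithmetic alone. The sketch below reads the number in question as 10 raised to the power 10<sup>10</sup>, which is what "10 billion zeros" implies, and takes 10<sup>80</sup> as a rough, commonly quoted order of magnitude for the number of atoms in the observable universe; that figure is an assumption of this example, not something stated in the text. The names `zeros_when_written_out`, `ZEROS` and `ATOMS_LOG10` are illustrative.

```python
def zeros_when_written_out(k):
    """Write 10**(10**k) in decimal and count its zeros.

    Only feasible for small k; the number has 10**k + 1 digits.
    """
    return str(10 ** (10 ** k)).count("0")


# 10**m is a 1 followed by m zeros, so the number with exponent 10**10
# has this many zeros in decimal notation:
ZEROS = 10 ** 10

# Assumed rough order of magnitude for the number of atoms in the
# observable universe (10**80); an estimate used for illustration only.
ATOMS_LOG10 = 80

# Difference in orders of magnitude between the two quantities.
orders_of_magnitude_gap = ZEROS - ATOMS_LOG10
```

Comparing the exponents rather than the numbers themselves avoids ever materialising a ten-billion-digit integer.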
A much more efficient algorithm than Hashiguchi's procedure was devised by Kirsten in 2005. This algorithm runs, for a given nondeterministic finite automaton as input, within double-exponential space. Yet the resource requirements of this algorithm still greatly exceed the margins of what is considered practically feasible.
This algorithm was optimized and generalized to trees by Colcombet and Löding in 2008, as part of the theory of regular cost functions.
It was implemented in 2017 in the tool suite Stamina.
[Nathanaël Fijalkow, Hugo Gimbert, Edon Kelmendi, Denis Kuperberg: ''Stamina: Stabilisation Monoids in Automata Theory''. CIAA 2017: 101–112. Tool available at https://github.com/nathanael-fijalkow/stamina/]
See also
* Generalized star height problem
* Kleene's algorithm: computes a regular expression (usually of non-minimal star height) for a language given by a deterministic finite automaton
Further reading
* {{cite book , last=Sakarovitch , first=Jacques , title=Elements of automata theory , others=Translated from the French by Reuben Thomas , location=Cambridge , publisher=Cambridge University Press , year=2009 , isbn=978-0-521-84425-3 , zbl=1188.68177 }}