
In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense used in theoretical computer science (as opposed to many modern regular expression engines, which are augmented with features that allow recognition of non-regular languages). Alternatively, a regular language can be defined as a language recognized by a finite automaton. The equivalence of regular expressions and finite automata is known as Kleene's theorem (after American mathematician Stephen Cole Kleene). In the Chomsky hierarchy, regular languages are the languages generated by Type-3 grammars.


Formal definition

The collection of regular languages over an alphabet Σ is defined recursively as follows:
* The empty language Ø is a regular language.
* For each ''a'' ∈ Σ (''a'' belongs to Σ), the singleton language {''a''} is a regular language.
* If ''A'' is a regular language, ''A''* (Kleene star) is a regular language. Due to this, the empty string language {ε} is also regular.
* If ''A'' and ''B'' are regular languages, then ''A'' ∪ ''B'' (union) and ''A'' • ''B'' (concatenation) are regular languages.
* No other languages over Σ are regular.

See regular expression for syntax and semantics of regular expressions.
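
The recursive definition can be mirrored directly as a small data type. The following is a minimal sketch, not a reference implementation: the class names (`Empty`, `Sym`, `Star`, `Union`, `Concat`) and the functions `nullable`, `derive` and `member` are illustrative names chosen here, and membership is decided with Brzozowski derivatives, a technique that the text above does not itself describe.

```python
from dataclasses import dataclass


class Lang:
    """Base class for the syntax tree of a regular language."""


@dataclass(frozen=True)
class Empty(Lang):      # the empty language Ø
    pass


@dataclass(frozen=True)
class Sym(Lang):        # the singleton language {a}
    a: str


@dataclass(frozen=True)
class Star(Lang):       # Kleene star A*
    inner: Lang


@dataclass(frozen=True)
class Union(Lang):      # A ∪ B
    left: Lang
    right: Lang


@dataclass(frozen=True)
class Concat(Lang):     # A • B
    left: Lang
    right: Lang


def nullable(r):
    """Return True iff the empty string belongs to the language of r."""
    if isinstance(r, Star):
        return True
    if isinstance(r, Union):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Concat):
        return nullable(r.left) and nullable(r.right)
    return False  # Empty and Sym never contain the empty string


def derive(r, c):
    """Return the language of all w such that c + w is in the language of r."""
    if isinstance(r, Empty):
        return Empty()
    if isinstance(r, Sym):
        # Ø* is the empty string language {ε}
        return Star(Empty()) if r.a == c else Empty()
    if isinstance(r, Star):
        return Concat(derive(r.inner, c), r)
    if isinstance(r, Union):
        return Union(derive(r.left, c), derive(r.right, c))
    # Remaining case: Concat
    first = Concat(derive(r.left, c), r.right)
    return Union(first, derive(r.right, c)) if nullable(r.left) else first


def member(r, word):
    """Decide membership by deriving once per input symbol."""
    for c in word:
        r = derive(r, c)
    return nullable(r)


# Several a's followed by several b's, read here as a*b*.
A_STAR_B_STAR = Concat(Star(Sym("a")), Star(Sym("b")))
```

With these definitions, `member(A_STAR_B_STAR, "aabbb")` holds while `member(A_STAR_B_STAR, "aba")` does not. The sketch performs no simplification, so the derived expressions grow with the input; it is meant to illustrate the definition rather than to be efficient.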


Examples

All finite languages are regular; in particular the empty string language {ε} = Ø* is regular. Other typical examples include the language consisting of all strings over the alphabet {''a'', ''b''} which contain an even number of ''a''s, or the language consisting of all strings of the form: several ''a''s followed by several ''b''s. A simple example of a language that is not regular is the set of strings { ''a''ⁿ''b''ⁿ | ''n'' ≥ 0 }. Intuitively, it cannot be recognized with a finite automaton, since a finite automaton has finite memory and cannot remember the exact number of ''a''s. Techniques to prove this fact rigorously are given below.
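
The two regular examples can be recognized by very small deterministic finite automata. The sketch below is illustrative only: the state names, the table layout and the helper `run_dfa` are assumptions made for this example, and "several" is read as "zero or more".

```python
def run_dfa(transitions, start, accepting, word):
    """Run a DFA given as a dict (state, symbol) -> state.

    A missing transition is treated as a move to a rejecting dead state.
    """
    state = start
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting


# Strings over {a, b} with an even number of a's: two states track the parity.
EVEN_A = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}


def has_even_number_of_as(word):
    return run_dfa(EVEN_A, "even", {"even"}, word)


# Several a's followed by several b's: once a b has been read, no a may follow.
AS_THEN_BS = {
    ("in_as", "a"): "in_as", ("in_as", "b"): "in_bs",
    ("in_bs", "b"): "in_bs",
}


def is_as_then_bs(word):
    return run_dfa(AS_THEN_BS, "in_as", {"in_as", "in_bs"}, word)
```

Both automata use a fixed number of states regardless of the input length. No such fixed table can exist for the non-regular example, since matching the number of ''a''s against the number of ''b''s would need one state per possible count.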


Equivalent formalisms

A regular language satisfies the following equivalent properties:
# it is the language of a regular expression (by the above definition)
# it is the language accepted by a nondeterministic finite automaton (NFA) (1. ⇒ 2. by Thompson's construction algorithm; 2. ⇒ 1. by Kleene's algorithm or using Arden's lemma)
# it is the language accepted by a deterministic finite automaton (DFA) (2. ⇒ 3. by the powerset construction; 3. ⇒ 2. since the former definition is stronger than the latter, every DFA being in particular an NFA)

For regular expressions, the universality problem (deciding whether an expression describes every string over its alphabet) is NP-complete already for a singleton alphabet. For larger alphabets, that problem is PSPACE-complete. If regular expressions are extended to allow also a ''squaring operator'', with "''A''²" denoting the same as "''AA''", still just regular languages can be described, but the universality problem has an exponential space lower bound, and is in fact complete for exponential space with respect to polynomial-time reduction.

For a fixed finite alphabet, the theory of the set of all languages (together with strings, membership of a string in a language, and, for each character, a function to append the character to a string, and no other operations) is decidable, and its minimal elementary substructure consists precisely of regular languages. For a binary alphabet, the theory is called S2S.
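
The step from an NFA to a DFA by the powerset construction, listed among the equivalences above, can be sketched as follows. This is a minimal illustration under stated assumptions: the NFA has no ε-moves, it is encoded as a dict from (state, symbol) to a set of states, and the function names and the example automaton are made up for this sketch.

```python
from collections import deque


def powerset_construction(nfa, start, accepting, alphabet):
    """Convert an NFA without ε-moves into an equivalent DFA.

    nfa maps (state, symbol) to a set of successor states.
    DFA states are frozensets of NFA states; only reachable ones are built.
    """
    start_set = frozenset([start])
    dfa = {}
    dfa_accepting = set()
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        # A subset is accepting if it contains at least one accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, start_set, dfa_accepting


def dfa_accepts(dfa, start, accepting, word):
    """Run the constructed DFA; symbols must come from the given alphabet."""
    state = start
    for symbol in word:
        state = dfa[(state, symbol)]
    return state in accepting


# Example NFA: strings over {a, b} whose second-to-last symbol is a.
NFA = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "a"): {2}, (1, "b"): {2},
}
DFA, DFA_START, DFA_ACCEPTING = powerset_construction(NFA, 0, {2}, "ab")
```

The resulting DFA may have up to 2ᵏ states for an NFA with ''k'' states, although building only the reachable subsets, as done here, often yields far fewer.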


Complexity results

In computational complexity theory, the complexity class of all regular languages is sometimes referred to as REGULAR or REG and equals DSPACE(O(1)), the decision problems that can be solved in constant space (the space used is independent of the input size). REGULAR ≠ AC⁰, since it (trivially) contains the parity problem of determining whether the number of 1 bits in the input is even or odd and this problem is not in AC⁰. On the other hand, REGULAR does not contain AC⁰, because the nonregular language of palindromes and the nonregular language \{0^n 1^n : n \in \mathbb N\} can both be recognized in AC⁰.

If a language is ''not'' regular, it requires a machine with at least Ω(log log ''n'') space to recognize (where ''n'' is the input size). In other words, DSPACE(o(log log ''n'')) equals the class of regular languages. In practice, most nonregular problems are solved by machines taking at least logarithmic space.


Location in the Chomsky hierarchy

To locate the regular languages in the Chomsky hierarchy, one notices that every regular language is context-free. The converse is not true: for example, the language consisting of all strings having the same number of ''a''s as ''b''s is context-free but not regular. To prove that a language is not regular, one often uses the Myhill–Nerode theorem or the pumping lemma. Other approaches include using the closure properties of regular languages or quantifying Kolmogorov complexity.

Important subclasses of regular languages include
* Finite languages, those containing only a finite number of words. These are regular languages, as one can create a regular expression that is the union of every word in the language.
* Star-free languages, those that can be described by a regular expression constructed from the empty symbol, letters, concatenation and all boolean operators (see algebra of sets) including complementation but not the Kleene star: this class includes all finite languages.
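
As a worked illustration of the pumping lemma mentioned earlier in this section, the following sketch shows why the language of strings with equally many ''a''s and ''b''s is not regular. The symbols p, x, y, z and k are the usual names from the statement of the lemma and are assumptions of this sketch rather than notation taken from the text.

```latex
% Pumping-lemma argument for L = { w in {a,b}^* : #_a(w) = #_b(w) }.
\begin{align*}
&\text{Assume } L \text{ is regular, with pumping length } p \ge 1. \\
&\text{Take } w = a^p b^p \in L, \text{ so } |w| \ge p. \\
&\text{The lemma gives } w = xyz \text{ with } |xy| \le p,\ |y| \ge 1,
  \text{ and } x y^i z \in L \text{ for all } i \ge 0. \\
&\text{Since } |xy| \le p, \text{ the factor } y \text{ lies in the leading block of } a\text{'s: }
  y = a^k \text{ with } k \ge 1. \\
&\text{Then } x y^2 z = a^{p+k} b^p \text{ has } p + k > p \text{ letters } a
  \text{ but only } p \text{ letters } b, \text{ so } x y^2 z \notin L. \\
&\text{This contradicts the lemma, hence } L \text{ is not regular.}
\end{align*}
```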


The number of words in a regular language

Let s_L(n) denote the number of words of length n in L. The ordinary generating function for ''L'' is the formal power series
:S_L(z) = \sum_{n \ge 0} s_L(n) z^n \ .
The generating function of a language ''L'' is a rational function if ''L'' is regular. Hence for every regular language L the sequence s_L(n)_{n \geq 0} is constant-recursive; that is, there exist an integer constant n_0, complex constants \lambda_1,\,\ldots,\,\lambda_k and complex polynomials p_1(x),\,\ldots,\,p_k(x) such that for every n \geq n_0 the number s_L(n) of words of length n in L is
:s_L(n)=p_1(n)\lambda_1^n+\dotsb+p_k(n)\lambda_k^n.
Thus, non-regularity of certain languages L' can be proved by counting the words of a given length in L'. Consider, for example, the Dyck language of strings of balanced parentheses. The number of words of length 2n in the Dyck language is equal to the Catalan number C_n\sim\frac{4^n}{n^{3/2}\sqrt{\pi}}, which is not of the form p(n)\lambda^n, witnessing the non-regularity of the Dyck language.

Care must be taken since some of the eigenvalues \lambda_i could have the same magnitude. For example, the number of words of length n in the language of all binary words of even length is not of the form p(n)\lambda^n, but the numbers of words of even length and of odd length are each of this form; the corresponding eigenvalues are 2 and -2. In general, for every regular language there exists a constant d such that for all a, the number of words of length dm+a is asymptotically C_a m^{p_a} \lambda_a^m.

The ''zeta function'' of a language ''L'' is
:\zeta_L(z) = \exp \left( \sum_{n \ge 1} s_L(n) \frac{z^n}{n} \right) \ .
The zeta function of a regular language is not in general rational, but that of an arbitrary cyclic language is.
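
The counting sequence s_L(n) of a regular language can be computed from any DFA for it by tracking how many words of each length lead to each state. The following is a minimal sketch; the function name `count_words`, the dict encoding of the automaton and the two example automata are assumptions made for illustration.

```python
def count_words(transitions, start, accepting, max_len):
    """Return [s_L(0), ..., s_L(max_len)] for the language of a DFA.

    transitions maps (state, symbol) -> state.
    """
    # counts[q] = number of words of the current length that lead from start to q
    counts = {start: 1}
    result = []
    for _ in range(max_len + 1):
        result.append(sum(n for state, n in counts.items() if state in accepting))
        following = {}
        for (state, _symbol), target in transitions.items():
            if state in counts:
                following[target] = following.get(target, 0) + counts[state]
        counts = following
    return result


# Strings over {a, b} with an even number of a's.
EVEN_A = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}

# Binary words of even length.
EVEN_LENGTH = {
    ("e", "0"): "o", ("e", "1"): "o",
    ("o", "0"): "e", ("o", "1"): "e",
}

print(count_words(EVEN_A, "even", {"even"}, 5))    # [1, 1, 2, 4, 8, 16]
print(count_words(EVEN_LENGTH, "e", {"e"}, 5))     # [1, 0, 4, 0, 16, 0]
```

The first sequence equals 2^{n-1} for n ≥ 1, which has the form p(n)\lambda^n with a single eigenvalue 2. The second equals (2^n + (-2)^n)/2, which needs both eigenvalues 2 and -2, matching the remark about eigenvalues of equal magnitude.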


Generalizations

The notion of a regular language has been generalized to infinite words (see ω-automata) and to trees (see tree automaton).

Rational set generalizes the notion (of regular/rational language) to monoids that are not necessarily free. Likewise, the notion of a recognizable language (by a finite automaton) has a namesake in the recognizable set over a monoid that is not necessarily free. Howard Straubing notes in relation to these facts that “The term "regular language" is a bit unfortunate. Papers influenced by Eilenberg's monograph [in two volumes, "A" (1974) and "B" (1976), the latter with two chapters by Bret Tilson] often use either the term "recognizable language", which refers to the behavior of automata, or "rational language", which refers to important analogies between regular expressions and rational power series. (In fact, Eilenberg defines rational and recognizable subsets of arbitrary monoids; the two notions do not, in general, coincide.) This terminology, while better motivated, never really caught on, and "regular language" is used almost universally.”

Rational series is another generalization, this time in the context of a formal power series over a semiring. This approach gives rise to weighted rational expressions and weighted automata. In this algebraic context, the regular languages (corresponding to Boolean-weighted rational expressions) are usually called ''rational languages''. Also in this context, Kleene's theorem finds a generalization called the Kleene–Schützenberger theorem.


Learning from examples



References

* Chapter 1: Regular Languages, pp. 31–90. Subsection "Decidable Problems Concerning Regular Languages" of section 4.1: Decidable Languages, pp. 152–155.
* Philippe Flajolet and Robert Sedgewick, ''Analytic Combinatorics'': Symbolic Combinatorics. Online book, 2002.


Further reading

* Kleene, S.C.: Representation of events in nerve nets and finite automata. In: Shannon, C.E., McCarthy, J. (eds.) Automata Studies, pp. 3–41. Princeton University Press, Princeton (1956); it is a slightly modified version of his 1951 RAND Corporation report of the same title, RM704.

