Chomsky–Schützenberger Representation Theorem
   HOME

TheInfoList



OR:

In
formal language theory In logic, mathematics, computer science, and linguistics, a formal language is a set of string (computer science), strings whose symbols are taken from a set called "#Definition, alphabet". The alphabet of a formal language consists of symbol ...
, the Chomsky–Schützenberger representation theorem is a theorem derived by
Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American professor and public intellectual known for his work in linguistics, political activism, and social criticism. Sometimes called "the father of modern linguistics", Chomsky is also a ...
and
Marcel-Paul Schützenberger Marcel-Paul "Marco" Schützenberger (24 October 1920 – 29 July 1996) was a French mathematician and Doctor of Medicine. He worked in the fields of formal language, combinatorics, and information theory.Herbert Wilf, Dominique Foata, ''et al.'', ...
in 1959 about representing a given
context-free language In formal language theory, a context-free language (CFL), also called a Chomsky type-2 language, is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, mos ...
in terms of two simpler languages. These two simpler languages, namely a
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
and a
Dyck language In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of brackets. The set of Dyck words forms a Dyck language. The simplest, Dyck-1, uses just two matching brackets, e.g. ( and ). ...
, are combined by means of an
intersection In mathematics, the intersection of two or more objects is another object consisting of everything that is contained in all of the objects simultaneously. For example, in Euclidean geometry, when two lines in a plane are not parallel, their ...
and a
homomorphism In algebra, a homomorphism is a morphism, structure-preserving map (mathematics), map between two algebraic structures of the same type (such as two group (mathematics), groups, two ring (mathematics), rings, or two vector spaces). The word ''homo ...
. The theorem Proofs of this theorem are found in several textbooks, e.g. or .


Mathematics


Notation

A few notions from formal language theory are in order. A context-free language is '' regular'', if it can be described by a
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
, or, equivalently, if it is accepted by a finite automaton. A homomorphism is based on a function h which maps symbols from an alphabet \Gamma to words over another alphabet \Sigma; If the domain of this function is extended to words over \Gamma in the natural way, by letting h(xy)=h(x)h(y) for all words x and y, this yields a ''
homomorphism In algebra, a homomorphism is a morphism, structure-preserving map (mathematics), map between two algebraic structures of the same type (such as two group (mathematics), groups, two ring (mathematics), rings, or two vector spaces). The word ''homo ...
'' h:\Gamma^*\to \Sigma^*. A ''matched alphabet'' T \cup \overline T is an alphabet with two equal-sized sets; it is convenient to think of it as a set of parentheses types, where T contains the opening parenthesis symbols, whereas the symbols in \overline T contains the closing parenthesis symbols. For a matched alphabet T \cup \overline T, the '' typed Dyck language'' D_T is given by :D_T = \. For example, the following is a valid sentence in the 3-typed Dyck language:
( [ ">[_.html" ;"title="[ ">[ ( ) )


Theorem

A language ''L'' over the alphabet \Sigma is context-free if and only if there exists :* a matched alphabet T \cup \overline T :* a regular language R over T \cup \overline T, :* and a homomorphism h : (T \cup \overline T)^* \to \Sigma^* :such that L = h(D_T \cap R). We can interpret this as saying that any CFG language can be generated by first generating a typed Dyck language, filtering it by a regular grammar, and finally converting each bracket into a word in the CFG language.


References

* * {{DEFAULTSORT:Chomsky-Schutzenberger representation theorem Noam Chomsky Formal languages Theorems in discrete mathematics