In formal language theory, a context-free language (CFL), also called a Chomsky type-2 language, is a language generated by a context-free grammar (CFG).
Context-free languages have many applications in programming languages; in particular, most arithmetic expressions are generated by context-free grammars.
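As an illustration, here is a minimal recursive-descent evaluator for a small arithmetic-expression grammar of our own choosing. The grammar, class and function names are illustrative and are not taken from any particular programming language. Each nonterminal becomes one method, and the rule factor → ( expr ) supplies the unbounded nesting of parentheses that a regular language cannot describe.

```python
import re

# Hypothetical grammar, written in EBNF for brevity (still context-free):
#   expr   -> term   (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NUMBER | "(" expr ")"


class ExprParser:
    """Recursive-descent evaluator: one method per nonterminal."""

    def __init__(self, text):
        # Numbers become single tokens; every other non-space character is a token.
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected token {tok!r} at index {self.pos}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):      # loop keeps "-" left-associative
            op, rhs = self.eat(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.eat(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()               # nesting: factor -> "(" expr ")"
            self.eat(")")
            return value
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


def evaluate(text):
    parser = ExprParser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at index {parser.pos}")
    return value
```

The while loops stand in for left-recursive rules such as expr → expr + term, on which a naive recursive-descent parser would recurse forever; iterating instead keeps the operators left-associative.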
Background
Context-free grammar
Different context-free grammars can generate the same context-free language. Intrinsic properties of the language can be distinguished from extrinsic properties of a particular grammar by comparing multiple grammars that describe the language.
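As an illustration, the two grammars below both generate the language of properly matched parentheses. Generating that language is an intrinsic fact; ambiguity is extrinsic: $S \to SS \mid (S) \mid \varepsilon$ gives ()()() two parse trees, whereas $S \to (S)S \mid \varepsilon$ is unambiguous. The sketch (the helper name and grammar encoding are our own) compares the strings each grammar derives up to a length bound. Such a bounded check can expose a difference but cannot prove equivalence, which is undecidable for context-free grammars in general.

```python
def strings_up_to(grammar, start, max_len):
    """All strings of length <= max_len derivable from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key is a terminal, and () is epsilon.
    Computed as a least fixpoint, so left recursion and epsilon are fine.
    """
    derived = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    options = derived[sym] if sym in grammar else {sym}
                    partial = {p + o for p in partial for o in options
                               if len(p) + len(o) <= max_len}
                if not partial <= derived[nt]:
                    derived[nt] |= partial
                    changed = True
    return derived[start]


# Two grammars for properly matched parentheses.
AMBIGUOUS = {"S": [("S", "S"), ("(", "S", ")"), ()]}    # S -> SS | (S) | eps
UNAMBIGUOUS = {"S": [("(", "S", ")", "S"), ()]}         # S -> (S)S | eps

for n in range(0, 9, 2):
    same = strings_up_to(AMBIGUOUS, "S", n) == strings_up_to(UNAMBIGUOUS, "S", n)
    print(n, same)
```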
Automata
The set of all context-free languages is identical to the set of languages accepted by pushdown automata, which makes these languages amenable to parsing. Further, for a given CFG, there is a direct way to produce a pushdown automaton for the grammar (and thereby the corresponding language), though going the other way (producing a grammar given an automaton) is not as direct.
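The direct grammar-to-automaton direction can be sketched with the usual textbook single-state construction. The stack starts with the start symbol. A nonterminal on top is replaced by the right-hand side of one of its productions, and a terminal on top is popped when it matches the next input symbol. The word is accepted when input and stack are both exhausted. The Python below simulates this nondeterministic PDA by depth-first search over configurations. It assumes a grammar without ε-productions, so every stack symbol must consume at least one input symbol, which bounds the search. The function name and grammar encoding are hypothetical.

```python
def pda_accepts(grammar, start, word):
    """Simulate the one-state PDA built from an epsilon-free grammar.

    A configuration is (input position, stack), with the stack as a tuple whose
    first element is the top. Nondeterminism is explored by depth-first search.
    """
    todo = [(0, (start,))]
    seen = set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        pos, stack = config
        if not stack:
            if pos == len(word):
                return True              # input consumed, stack empty: accept
            continue
        if len(stack) > len(word) - pos:
            continue                     # each symbol needs >= 1 input symbol
        top, rest = stack[0], stack[1:]
        if top in grammar:
            for body in grammar[top]:    # expand: guess a production for top
                todo.append((pos, tuple(body) + rest))
        elif word[pos] == top:           # match: pop terminal equal to input
            todo.append((pos + 1, rest))
    return False


# Example epsilon-free grammar for a^n b^n (n >= 1): S -> aSb | ab
ANBN = {"S": [("a", "S", "b"), ("a", "b")]}
```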
Examples
An example context-free language is $L = \{a^n b^n : n \geq 1\}$, the language of all non-empty even-length strings, the entire first halves of which are $a$'s, and the entire second halves of which are $b$'s. $L$ is generated by the grammar $S \to aSb ~|~ ab$.
This language is not regular.
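It is accepted by the pushdown automaton $M = (\{q_0, q_1, q_f\}, \{a, b\}, \{a, z\}, \delta, q_0, z, \{q_f\})$ where $\delta$ is defined as follows:
[meaning of $\delta$'s arguments and results: $\delta(\mathrm{state}_1, \mathrm{read}, \mathrm{pop}) = (\mathrm{state}_2, \mathrm{push})$]
$$\begin{aligned}
\delta(q_0, a, z) &= (q_0, az)\\
\delta(q_0, a, a) &= (q_0, aa)\\
\delta(q_0, b, a) &= (q_1, \varepsilon)\\
\delta(q_1, b, a) &= (q_1, \varepsilon)\\
\delta(q_1, \varepsilon, z) &= (q_f, \varepsilon)
\end{aligned}$$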
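The following sketch simulates a deterministic pushdown automaton for $\{a^n b^n : n \geq 1\}$ with states $q_0, q_1, q_f$, bottom-of-stack marker $z$ and acceptance by final state. It pushes an $a$ for each $a$ read, pops one $a$ per $b$, and moves to $q_f$ when it uncovers $z$ after at least one $b$. The dictionary encoding (keys are state, input symbol or "" for an ε-move, and popped symbol) and the function name are our own. Since at most one transition applies in any configuration, a plain loop suffices.

```python
# Hypothetical encoding: (state, input symbol or "" for epsilon, popped symbol)
#   -> (next state, pushed string); the leftmost pushed symbol ends up on top.
DELTA = {
    ("q0", "a", "z"): ("q0", "az"),
    ("q0", "a", "a"): ("q0", "aa"),
    ("q0", "b", "a"): ("q1", ""),
    ("q1", "b", "a"): ("q1", ""),
    ("q1", "", "z"): ("qf", ""),
}
ACCEPTING = {"qf"}


def dpda_accepts(word):
    state, stack, pos = "q0", ["z"], 0       # stack top is stack[-1]
    while True:
        top = stack[-1] if stack else None
        if pos < len(word) and (state, word[pos], top) in DELTA:
            state, push = DELTA[(state, word[pos], top)]
            pos += 1                         # input move consumes one symbol
        elif (state, "", top) in DELTA:
            state, push = DELTA[(state, "", top)]
        else:
            break                            # no transition applies: halt
        stack.pop()
        stack.extend(reversed(push))         # leftmost pushed symbol on top
    return pos == len(word) and state in ACCEPTING
```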
Unambiguous CFLs are a proper subset of all CFLs: there are inherently ambiguous CFLs. An example of an inherently ambiguous CFL is the union of $\{a^n b^m c^m d^n \mid n, m > 0\}$ with $\{a^n b^n c^m d^m \mid n, m > 0\}$. This set is context-free, since the union of two context-free languages is always context-free. But there is no way to unambiguously parse strings in the (non-context-free) subset $\{a^n b^n c^n d^n \mid n > 0\}$, which is the intersection of these two languages.
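Where the overlap lies can be checked mechanically. In a grammar that simply combines grammars for the two parts of the union, every string in the intersection has one derivation from each part; inherent ambiguity is the stronger claim that every grammar for the union has such ambiguous strings. The helpers below (names are our own) only test membership in each part and prove nothing about all grammars.

```python
import re


def block_lengths(word):
    """Exponents (i, j, k, l) if word == a^i b^j c^k d^l with all > 0, else None."""
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", word)
    return tuple(len(g) for g in match.groups()) if match else None


def in_first(word):
    """Membership in {a^n b^m c^m d^n | n, m > 0}."""
    e = block_lengths(word)
    return e is not None and e[0] == e[3] and e[1] == e[2]


def in_second(word):
    """Membership in {a^n b^n c^m d^m | n, m > 0}."""
    e = block_lengths(word)
    return e is not None and e[0] == e[1] and e[2] == e[3]


for w in ["abbccd", "aabbcd", "aabbccdd"]:
    # Only strings with all four exponents equal are in both parts.
    print(w, in_first(w), in_second(w))
```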
Dyck language
The language of all properly matched parentheses is generated by the grammar $S \to SS ~|~ (S) ~|~ \varepsilon$.
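Recognizing this language needs only a stack on which a single kind of symbol is ever pushed, so a pushdown automaton for it can be simulated with a Python list. A minimal sketch (the function name is ours):

```python
def is_dyck(word):
    """Recognize properly matched parentheses with an explicit stack."""
    stack = []
    for ch in word:
        if ch == "(":
            stack.append(ch)       # push on an opening parenthesis
        elif ch == ")":
            if not stack:
                return False       # closing parenthesis with nothing to match
            stack.pop()            # pop the matching opening parenthesis
        else:
            return False           # symbol outside the alphabet {(, )}
    return not stack               # accept iff every "(" was matched
```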
Properties
Context-free parsing
The context-free nature of the language makes it simple to parse with a pushdown automaton.
Determining an instance of the membership problem, i.e. given a string $w$, determining whether $w \in L(G)$ where $L(G)$ is the language generated by a given grammar $G$, is also known as ''recognition''. Context-free recognition for Chomsky normal form grammars was shown by Leslie G. Valiant to be reducible to Boolean matrix multiplication, thus inheriting its complexity upper bound of $O(n^{2.3728596})$.
[In Valiant's paper, $O(n^{2.81})$ was the then-best known upper bound. See Matrix multiplication § Computational complexity for bound improvements since then.]
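Conversely, Lillian Lee has shown $O(n^{3-\varepsilon})$ Boolean matrix multiplication to be reducible to $O(n^{3-3\varepsilon})$ CFG parsing, thus establishing a kind of lower bound for the latter: a substantially subcubic general CFG parser would yield a correspondingly faster Boolean matrix multiplication algorithm.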
Practical uses of context-free languages also require producing a derivation tree that exhibits the structure the grammar associates with the given string. The process of producing this tree is called ''parsing''. The classical general-purpose parsing algorithms have a worst-case time complexity that is cubic in the length of the string being parsed.
Formally, the set of all context-free languages is identical to the set of languages accepted by pushdown automata (PDA). Parsing algorithms for context-free languages include the CYK algorithm and Earley's algorithm.
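The CYK algorithm named above can be sketched in a few lines, assuming the grammar is already in Chomsky normal form (rules $A \to BC$ and $A \to a$ only). The example grammar, our own, is a CNF grammar for $\{a^n b^n : n \geq 1\}$: $S \to AT \mid AB$, $T \to SB$, $A \to a$, $B \to b$. The three nested loops over span length, start position and split point give the cubic running time.

```python
def cyk(word, start, unary, binary):
    """CYK recognition for a grammar in Chomsky normal form.

    unary:  (A, a) pairs for rules A -> a
    binary: (A, B, C) triples for rules A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # this sketch ignores the optional rule S -> epsilon
    # table[i][l - 1] = nonterminals deriving the span word[i : i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {a for a, t in unary if t == ch}
    for length in range(2, n + 1):              # span length
        for i in range(n - length + 1):         # span start
            for split in range(1, length):      # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for a, b, c in binary:
                    if b in left and c in right:
                        table[i][length - 1].add(a)
    return start in table[0][n - 1]


# Example CNF grammar for {a^n b^n : n >= 1}: S -> AT | AB, T -> SB, A -> a, B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")]
```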
A special subclass of context-free languages is the deterministic context-free languages, defined as the languages accepted by a deterministic pushdown automaton; they can be parsed by an LR(k) parser.
See also parsing expression grammars, an alternative approach to grammars and parsing.
Closure properties
The class of context-free languages is closed under the following operations. That is, if ''L'' and ''P'' are context-free languages, the following languages are context-free as well:
*the union of ''L'' and ''P''
*the reversal of ''L''
*the concatenation of ''L'' and ''P''
*the Kleene star of ''L''
*the image $\varphi(L)$ of ''L'' under a homomorphism $\varphi$
*the image $\varphi^{-1}(L)$ of ''L'' under an inverse homomorphism $\varphi^{-1}$
*the circular shift of ''L'' (the language $\{vu : uv \in L\}$)
*the prefix closure of ''L'' (the set of all prefixes of strings from ''L'')
*the quotient ''L''/''R'' of ''L'' by a regular language ''R''
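Several of these closure properties come with direct constructions on grammars. The Python sketch below implements union, concatenation, Kleene star and reversal under our own conventions. A grammar is a dict from nonterminals to lists of right-hand sides (tuples of symbols, with () for ε), and any symbol that is not a key is a terminal. The names UNION, CAT and STAR and the prefixes L_ and P_ are assumed not to clash with terminals. A bounded enumerator checks the results on short strings; it illustrates the constructions rather than proving them.

```python
def strings_up_to(grammar, start, max_len):
    """All strings of length <= max_len derivable from `start` (least fixpoint)."""
    derived = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    options = derived[sym] if sym in grammar else {sym}
                    partial = {p + o for p in partial for o in options
                               if len(p) + len(o) <= max_len}
                if not partial <= derived[nt]:
                    derived[nt] |= partial
                    changed = True
    return derived[start]


def rename(grammar, start, prefix):
    """Prefix every nonterminal so two grammars share no nonterminals."""
    def fix(sym):
        return prefix + sym if sym in grammar else sym
    renamed = {fix(nt): [tuple(fix(s) for s in body) for body in bodies]
               for nt, bodies in grammar.items()}
    return renamed, fix(start)


def union(g1, s1, g2, s2):
    g1, s1 = rename(g1, s1, "L_")
    g2, s2 = rename(g2, s2, "P_")
    return {**g1, **g2, "UNION": [(s1,), (s2,)]}, "UNION"   # UNION -> S1 | S2


def concat(g1, s1, g2, s2):
    g1, s1 = rename(g1, s1, "L_")
    g2, s2 = rename(g2, s2, "P_")
    return {**g1, **g2, "CAT": [(s1, s2)]}, "CAT"            # CAT -> S1 S2


def star(g, s):
    g, s = rename(g, s, "L_")
    return {**g, "STAR": [(s, "STAR"), ()]}, "STAR"           # STAR -> S1 STAR | eps


def reverse(g, s):
    # Reversing every right-hand side reverses every derived string.
    return {nt: [tuple(reversed(b)) for b in bodies] for nt, bodies in g.items()}, s


L = {"S": [("a", "S", "b"), ()]}     # a^n b^n, n >= 0
P = {"S": [("c", "S"), ("c",)]}      # c^m, m >= 1
```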
Nonclosure under intersection, complement, and difference
The context-free languages are not closed under intersection. This can be seen by taking the languages $A = \{a^n b^n c^m \mid m, n \geq 0\}$ and $B = \{a^m b^n c^n \mid m, n \geq 0\}$, which are both context-free.
[A context-free grammar for the language ''A'' is given by the following production rules, taking ''S'' as the start symbol: ''S'' → ''Sc'' | ''aTb'' | ''ε''; ''T'' → ''aTb'' | ''ε''. The grammar for ''B'' is analogous.] Their intersection is $A \cap B = \{a^n b^n c^n \mid n \geq 0\}$, which can be shown to be non-context-free by the pumping lemma for context-free languages. As a consequence, context-free languages cannot be closed under complementation: they are closed under union, and for any languages ''A'' and ''B'', their intersection can be expressed by union and complement: $A \cap B = \overline{\overline{A} \cup \overline{B}}$. In particular, context-free languages cannot be closed under difference either, since complement can be expressed by difference, $\overline{L} = \Sigma^* \setminus L$, and $\Sigma^*$ is itself context-free.
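The counterexample can be checked on short strings. The sketch below uses the grammar for ''A'' from the note above and, for ''B'', a mirrored grammar of our own ($S \to aS \mid bTc \mid \varepsilon$, $T \to bTc \mid \varepsilon$). It enumerates both languages up to a length bound with a small fixpoint helper (name hypothetical) and compares their intersection with $\{a^n b^n c^n\}$. Agreement up to the bound is only a sanity check; non-context-freeness of the intersection still needs the pumping lemma.

```python
def strings_up_to(grammar, start, max_len):
    """All strings of length <= max_len derivable from `start` (least fixpoint).

    Handles the left-recursive rule S -> Sc and epsilon bodies ().
    """
    derived = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    options = derived[sym] if sym in grammar else {sym}
                    partial = {p + o for p in partial for o in options
                               if len(p) + len(o) <= max_len}
                if not partial <= derived[nt]:
                    derived[nt] |= partial
                    changed = True
    return derived[start]


# A = {a^n b^n c^m}: S -> Sc | aTb | eps, T -> aTb | eps
A = {"S": [("S", "c"), ("a", "T", "b"), ()], "T": [("a", "T", "b"), ()]}
# B = {a^m b^n c^n}: S -> aS | bTc | eps, T -> bTc | eps (our own mirror image)
B = {"S": [("a", "S"), ("b", "T", "c"), ()], "T": [("b", "T", "c"), ()]}

MAX = 9
common = strings_up_to(A, "S", MAX) & strings_up_to(B, "S", MAX)
expected = {"a" * n + "b" * n + "c" * n for n in range(MAX // 3 + 1)}
print(sorted(common, key=len))
```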
However, if ''L'' is a context-free language and ''D'' is a regular language, then both their intersection $L \cap D$ and their difference $L \setminus D$ are context-free languages.
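The proof for the regular case is constructive: pairing every nonterminal with a start state and an end state of a DFA for ''D'' (the classical triple construction) yields a grammar for $L \cap D$. The Python sketch below uses the same idea only to decide whether $L \cap D$ is empty. For each nonterminal it computes the DFA state pairs (p, q) such that some string derived from it drives the DFA from p to q. The function name and encodings are hypothetical; the DFA may be partial, with a missing transition meaning rejection.

```python
def intersects_regular(grammar, start, delta, q0, accepting, states):
    """Decide whether L(grammar) and the DFA's language share a string.

    grammar: dict nonterminal -> list of right-hand-side tuples (() = epsilon)
    delta:   dict (state, terminal) -> state; missing entries reject
    """
    def pairs_of(sym, reach):
        if sym in grammar:
            return reach[sym]
        return {(p, delta[(p, sym)]) for p in states if (p, sym) in delta}

    # reach[A] = {(p, q) : some string derived from A drives the DFA p -> q}
    reach = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                rel = {(p, p) for p in states}        # empty prefix: stay put
                for sym in body:
                    step = pairs_of(sym, reach)
                    rel = {(p, r) for (p, q) in rel for (q2, r) in step if q == q2}
                if not rel <= reach[nt]:
                    reach[nt] |= rel
                    changed = True
    return any((q0, f) in reach[start] for f in accepting)


# Example: G generates a^n b^n (n >= 1); PARITY tracks string length mod 2.
G = {"S": [("a", "S", "b"), ("a", "b")]}
PARITY = {(q, ch): 1 - q for q in (0, 1) for ch in "ab"}
print(intersects_regular(G, "S", PARITY, 0, {1}, {0, 1}))  # False: lengths are even
```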
Decidability
In formal language theory, questions about regular languages are usually decidable, but ones about context-free languages are often not. It is decidable whether such a language is finite, but not whether it contains every possible string, is regular, is unambiguous, or is equivalent to a language with a different grammar.
The following problems are undecidable for arbitrarily given context-free grammars ''A'' and ''B'':
*Equivalence: is $L(A) = L(B)$?
*Disjointness: is $L(A) \cap L(B) = \emptyset$? However, the intersection of a context-free language and a ''regular'' language is context-free, hence the variant of the problem where ''B'' is a regular grammar is decidable (see "Emptiness" below).
*Containment: is $L(A) \subseteq L(B)$? Again, the variant of the problem where ''B'' is a regular grammar is decidable, while that where ''A'' is regular is generally not.
*Universality: is $L(A) = \Sigma^*$?
*Regularity: is $L(A)$ a regular language?
*Ambiguity: is every grammar for $L(A)$ ambiguous?
The following problems are ''decidable'' for arbitrary context-free languages:
*Emptiness: Given a context-free grammar ''A'', is $L(A) = \emptyset$?
*Finiteness: Given a context-free grammar ''A'', is $L(A)$ finite?
*Membership: Given a context-free grammar ''G'' and a word $w$, does $w \in L(G)$? Efficient polynomial-time algorithms for the membership problem are the CYK algorithm and Earley's algorithm.
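Emptiness is decided by a simple fixpoint: a nonterminal is productive once one of its right-hand sides consists only of terminals and already-productive nonterminals, and $L(A)$ is empty exactly when the start symbol never becomes productive. A minimal Python sketch, with a grammar encoded (our own convention) as a dict from nonterminals to lists of right-hand-side tuples:

```python
def is_empty(grammar, start):
    """Return True iff the grammar generates no string at all.

    A nonterminal is productive if it derives some terminal string; the
    language is empty exactly when the start symbol is not productive.
    Symbols that are not keys of `grammar` are terminals; () is epsilon.
    """
    productive = set()
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            if nt in productive:
                continue
            for body in bodies:
                # Terminals are always fine; nonterminals must be productive.
                if all(sym not in grammar or sym in productive for sym in body):
                    productive.add(nt)
                    changed = True
                    break
    return start not in productive
```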
According to Hopcroft, Motwani, and Ullman (2003), many of the fundamental closure and (un)decidability properties of context-free languages were shown in the 1961 paper of Bar-Hillel, Perles, and Shamir.
Languages that are not context-free
The set $\{a^n b^n c^n d^n \mid n > 0\}$ is a context-sensitive language, but there does not exist a context-free grammar generating this language. So there exist context-sensitive languages which are not context-free. To prove that a given language is not context-free, one may employ the pumping lemma for context-free languages or a number of other methods, such as Ogden's lemma or Parikh's theorem.
Notes
References
Works cited
*
*