In
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a production or production rule is a
rewrite rule
In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a well-formed formula, formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewr ...
that replaces some symbols with other symbols. A finite set of productions
is the main component in the specification of a
formal grammar
A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
(specifically a
generative grammar
Generative grammar is a research tradition in linguistics that aims to explain the cognitive basis of language by formulating and testing explicit models of humans' subconscious grammatical knowledge. Generative linguists, or generativists (), ...
). The other components are a finite set
of
nonterminal symbol
In formal languages, terminal and nonterminal symbols are parts of the ''vocabulary'' under a formal grammar. ''Vocabulary'' is a finite, nonempty set of symbols. ''Terminal symbols'' are symbols that cannot be replaced by other symbols of the v ...
s, a finite set (known as an alphabet)
of
terminal symbol
In formal languages, terminal and nonterminal symbols are parts of the ''vocabulary'' under a formal grammar. ''Vocabulary'' is a finite, nonempty set of symbols. ''Terminal symbols'' are symbols that cannot be replaced by other symbols of the v ...
s that is
disjoint from
and a distinguished symbol
that is the ''start symbol''.
In an
unrestricted grammar
In automata theory, the class of unrestricted grammars (also called semi-Thue, type-0 or phrase structure grammars) is the most general class of grammars in the Chomsky hierarchy. No restrictions are made on the productions of an unrestricted gramm ...
, a production is of the form
, where
and
are arbitrary strings of terminals and nonterminals, and
may not be the
empty string
In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero.
Formal theory
Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
. If
is the empty string, this is denoted by the symbol
, or
(rather than leaving the right-hand side blank). So productions are members of the
cartesian product
In mathematics, specifically set theory, the Cartesian product of two sets and , denoted , is the set of all ordered pairs where is an element of and is an element of . In terms of set-builder notation, that is
A\times B = \.
A table c ...
:
,
where
is the ''vocabulary'',
is the
Kleene star
In mathematical logic and theoretical computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation on a Set (mathematics), set to generate a set of all finite-length strings that are composed of zero or more repe ...
operator,
indicates
concatenation
In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalizations of concatenati ...
,
denotes
set union
In set theory, the union (denoted by ∪) of a collection of sets is the set of all elements in the collection. It is one of the fundamental operations through which sets can be combined and related to each other.
A refers to a union of ze ...
, and
denotes
set minus or set difference. If we do not allow the start symbol to occur in
(the word on the right side), we have to replace
by
on the right side of the cartesian product symbol.
[See Klaus Reinhardt]
Prioritatszahlerautomaten und die Synchronisation von Halbspursprachen
; Fakultät Informatik der Universität Stuttgart; 1994 (German)
The other types of formal grammar in the
Chomsky hierarchy
The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a formal language's alphabet that are v ...
impose additional restrictions on what constitutes a production. Notably in a
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
, the left-hand side of a production must be a single nonterminal symbol. So productions are of the form:
:
Grammar generation
To generate a string in the language, one begins with a string consisting of only a single ''start symbol'', and then successively applies the rules (any number of times, in any order) to rewrite this string. This stops when a string containing only terminals is obtained. The language consists of all the strings that can be generated in this manner. Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language. If there are multiple different ways of generating this single string, then the grammar is said to be
ambiguous
Ambiguity is the type of meaning in which a phrase, statement, or resolution is not explicitly defined, making for several interpretations; others describe it as a concept or statement that has no real reference. A common aspect of ambiguit ...
.
For example, assume the alphabet consists of
and
, with the start symbol
, and we have the following rules:
: 1.
: 2.
then we start with
, and can choose a rule to apply to it. If we choose rule 1, we replace
with
and obtain the string
. If we choose rule 1 again, we replace
with
and obtain the string
. This process is repeated until we only have symbols from the alphabet (i.e.,
and
). If we now choose rule 2, we replace
with
and obtain the string
, and are done. We can write this series of choices more briefly, using symbols:
. The language of the grammar is the set of all the strings that can be generated using this process:
.
See also
*
Formal grammar
A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
*
Finite automata
A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number ...
*
Generative grammar
Generative grammar is a research tradition in linguistics that aims to explain the cognitive basis of language by formulating and testing explicit models of humans' subconscious grammatical knowledge. Generative linguists, or generativists (), ...
*
L-system
An L-system or Lindenmayer system is a parallel rewriting system and a type of formal grammar. An L-system consists of an alphabet of symbols that can be used to make strings, a collection of production rules that expand each symbol into some ...
*
Rewrite rule
In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a well-formed formula, formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewr ...
*
Backus–Naur form
In computer science, Backus–Naur form (BNF, pronounced ), also known as Backus normal form, is a notation system for defining the Syntax (programming languages), syntax of Programming language, programming languages and other Formal language, for ...
(A compact form for writing the productions of a context-free grammar.)
*
Phrase structure rule
Phrase structure rules are a type of rewrite rule used to describe a given language's syntax and are closely associated with the early stages of transformational grammar, proposed by Noam Chomsky in 1957. They are used to break down a natural langu ...
*
Post canonical system
A Post canonical system, also known as a Post production system, as created by Emil Post, is a string-manipulation system that starts with finitely many strings and repeatedly transforms them by applying a finite set j of specified rules of a cert ...
(Emil Post's production systems- a model of computation.)
References
Grammar
Natural language processing
Formal languages