
In
formal languages
In logic, mathematics, computer science, and linguistics, a formal language is a set of string (computer science), strings whose symbols are taken from a set called "#Definition, alphabet".
The alphabet of a formal language consists of symbol ...
, terminal and nonterminal symbols are
parts of the ''
vocabulary
A vocabulary (also known as a lexicon) is a set of words, typically the set in a language or the set known to an individual. The word ''vocabulary'' originated from the Latin , meaning "a word, name". It forms an essential component of languag ...
'' under a
formal grammar
A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
. ''Vocabulary'' is a finite, nonempty set of symbols. ''Terminal symbols'' are symbols that cannot be replaced by other symbols of the vocabulary. ''Nonterminal symbols'' are symbols that can be replaced by other symbols of the vocabulary by the
production rules under the same formal grammar.
A formal grammar defines a formal language over the vocabulary of the grammar.
In the context of
formal language
In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet".
The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
, the term ''vocabulary'' is more commonly known as
''alphabet''. Nonterminal symbols are also called ''syntactic variables''.
Terminal symbols
Terminal symbols are those symbols that can appear in the formal language defined by a formal grammar. The process of applying the production rules successively to a
start symbol might not terminate, but if it terminates when there is no more production rule can be applied, the output string will consist only of terminal symbols.
For example, consider a grammar defined by two rules. In this grammar, the symbol
Б
is a terminal symbol and
Ψ
is both a nonterminal symbol and the start symbol. The production rules for creating strings are as follows:
# The symbol
Ψ
can become
Б
Ψ
# The symbol
Ψ
can become
Б
Here
Б
is a terminal symbol because no rule exists to replace it with other symbols. On the other hand,
Ψ
has two rules that can change it, thus it is nonterminal. The rules define a formal language that contains
countably infinite
In mathematics, a set is countable if either it is finite or it can be made in one to one correspondence with the set of natural numbers. Equivalently, a set is ''countable'' if there exists an injective function from it into the natural numbe ...
many finite-length
words by the fact that we can apply the first rule any countable times as we wish. Diagram 1 illustrates a string that can be produced with this grammar.
Nonterminal symbols
Nonterminal symbols are those symbols that cannot appear in the formal language defined by a formal grammar. A formal grammar includes a ''start symbol'', which is a designated member of the set of nonterminal symbols. We can derive a set of strings of only terminal symbols by successively applying the production rules. The generated set is a formal language over the set of terminal symbols.
Context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
s are those grammars in which the left-hand side of each production rule consists of only a single nonterminal symbol. This restriction is non-trivial; not all languages can be generated by context-free grammars. Those that can are called
context-free languages. These are exactly the languages that can be recognized by a non-deterministic
push down automaton. Context-free languages are the theoretical basis for the syntax of most
programming languages.
Production rules
A grammar is defined by
production rules (or just 'productions') that specify which symbols can replace which other symbols; these rules can be used to generate strings, or to parse them. Each such rule has a ''head'', or left-hand side, which consists of the string that can be replaced, and a ''body'', or right-hand side, which consists of a string that can replace it. Rules are often written in the form ''head'' → ''body''; e.g., the rule ''a'' → ''b'' specifies that ''a'' can be replaced by ''b''.
In the classic formalization of generative grammars first proposed by
Noam Chomsky
Avram Noam Chomsky (born December 7, 1928) is an American professor and public intellectual known for his work in linguistics, political activism, and social criticism. Sometimes called "the father of modern linguistics", Chomsky is also a ...
in the 1950s,
a grammar ''G'' consists of the following components:
* A finite set of ''nonterminal symbols''.
* A finite set of ''terminal symbols'' that is
disjoint from .
* A finite set of ''production rules'', each rule of the form
::
:where
denotes the set of all possible finite-length strings over the vocabulary
using
Kleene star
In mathematical logic and theoretical computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation on a Set (mathematics), set to generate a set of all finite-length strings that are composed of zero or more repe ...
. That is, each production rule replaces one string of symbols that contains at least one nonterminal symbol with another. In the case that the body consists solely of the
empty string
In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero.
Formal theory
Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
, it can be denoted with a special notation (often , or ) to avoid confusion.
* A distinguished symbol
that is the ''start symbol''.
A grammar is formally defined as the ordered quadruple
. Such a formal grammar is often called a
rewriting system or a
phrase structure grammar
The term phrase structure grammar was originally introduced by Noam Chomsky as the term for grammar studied previously by Emil Post and Axel Thue ( Post canonical systems). Some authors, however, reserve the term for more restricted grammars in t ...
in the literature.
Example
Backus–Naur form
In computer science, Backus–Naur form (BNF, pronounced ), also known as Backus normal form, is a notation system for defining the Syntax (programming languages), syntax of Programming language, programming languages and other Formal language, for ...
is a notation for expressing certain grammars. For instance, the following production rules in Backus-Naur form are used to represent an integer (which can be signed):
::= '0' , '1' , '2' , '3' , '4' , '5' , '6' , '7' , '8' , '9'
::= -'
In this example, terminal symbols are
, and nonterminal symbols are
.
Another example is:
:
S -> cAd
:
A -> a , ab
In this example, terminal symbols are
, and nonterminal symbols are
.
See also
*
Alphabet (formal languages)
*
Chomsky Hierarchy
The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a formal language's alphabet that are v ...
*
Recursive grammar
Notes
References
{{reflist
Formal languages
Pattern matching