In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on the right). For instance,
1+2+3 can be recognized as a sum because it can be broken into 1+2, also a sum, and +3, a suitable suffix.
In terms of context-free grammar, a nonterminal is left-recursive if the leftmost symbol in one of its productions is itself (in the case of direct left recursion) or can be made itself by some sequence of substitutions (in the case of indirect left recursion).
Definition
A grammar is left-recursive if and only if there exists a nonterminal symbol A that can derive to a sentential form with itself as the leftmost symbol.[James Power, Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland. JPR02] Symbolically,
: A ⇒⁺ Aα,
where ⇒⁺ indicates the operation of making one or more substitutions, and α is any sequence of terminal and nonterminal symbols.
Direct left recursion
Direct left recursion occurs when the definition can be satisfied with only one substitution. It requires a rule of the form
: A → Aα
where α is a sequence of nonterminals and terminals. For example, the rule
: Expression → Expression + Term
is directly left-recursive. A left-to-right recursive descent parser for this rule might look like

 void Expression() {
   Expression();
   match('+');
   Term();
 }

and such code would fall into infinite recursion when executed.
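The failure can be reproduced directly. Below is a minimal, runnable Python stand-in for the C-style fragment above (the token stream and helper names are illustrative, not part of the original): the procedure for Expression re-enters itself before consuming any input, so no progress is ever made.

```python
tokens = ["1", "+", "2"]   # illustrative input
pos = 0                    # current position in the token stream
calls = 0                  # how many times expression() was entered

def match(expected):
    """Consume the next token if it equals `expected`."""
    global pos
    if pos < len(tokens) and tokens[pos] == expected:
        pos += 1

def term():
    """Accept a single token as a Term."""
    global pos
    pos += 1

def expression():
    # To parse an Expression we must first parse an Expression:
    # the function recurses before consuming any input.
    global calls
    calls += 1
    expression()           # left-recursive call: infinite regress
    match("+")             # never reached
    term()                 # never reached

try:
    expression()
except RecursionError:
    print(f"infinite regress after {calls} nested calls, 0 tokens consumed")
```

Note that the recursion depth grows without any token being consumed, which is exactly why no lookahead strategy can save a naive parser here.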
Indirect left recursion
Indirect left recursion occurs when the definition of left recursion is satisfied via several substitutions. It entails a set of rules following the pattern
: A₀ → β₀A₁α₀
: A₁ → β₁A₂α₁
: ⋯
: Aₙ → βₙA₀αₙ
where β₀, β₁, …, βₙ are sequences that can each yield the empty string, while α₀, α₁, …, αₙ may be any sequences of terminal and nonterminal symbols at all. Note that these sequences may be empty. The derivation
: A₀ ⇒ β₀A₁α₀ ⇒⁺ A₁α₀ ⇒ β₁A₂α₁α₀ ⇒⁺ ⋯ ⇒⁺ A₀αₙ⋯α₁α₀
then gives A₀ as leftmost in its final sentential form.
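To make the pattern concrete, here is a small Python sketch (the grammar and symbol names are hypothetical) that performs leftmost substitutions for the case n = 1 with every βᵢ empty; after one pass around the cycle, A0 is leftmost again.

```python
# Rules: A0 -> A1 "a0" and A1 -> A0 "a1" (terminals in quotes).
rules = {"A0": ["A1", "a0"], "A1": ["A0", "a1"]}

form = ["A0"]                  # start from the sentential form A0
for _ in range(2):             # two substitutions: A0 -> A1 a0 -> A0 a1 a0
    head, rest = form[0], form[1:]
    form = rules[head] + rest  # expand the leftmost nonterminal
print(form)                    # ['A0', 'a1', 'a0'] -- A0 is leftmost again
```

Each round of the loop expands only the leftmost symbol, mirroring the derivation above: the cycle returns A0 to the leftmost position with the α-suffixes accumulating on the right.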
Uses
Left recursion is commonly used as an idiom for making operations left-associative: so that an expression a+b-c-d+e is evaluated as (((a+b)-c)-d)+e. In this case, that evaluation order could be achieved as a matter of syntax via the three grammatical rules
: Expression → Expression + Term
: Expression → Expression - Term
: Expression → Term
These only allow parsing the a+b-c-d+e as consisting of the Expression a+b-c-d and Term e, where a+b-c-d in turn consists of the Expression a+b-c and Term d, while a+b-c consists of the Expression a+b and Term c, etc.
Removing left recursion
Left recursion often poses problems for parsers, either because it leads them into infinite recursion (as in the case of most
top-down parsers) or because they expect rules in a normal form that forbids it (as in the case of many
bottom-up parsers). Therefore, a grammar is often preprocessed to eliminate the left recursion.
Removing direct left recursion
The general algorithm to remove direct left recursion follows. Several improvements to this method have been made.
For a left-recursive nonterminal A, discard any rules of the form A → A and consider those that remain:
: A → Aα₁ | ⋯ | Aαₙ | β₁ | ⋯ | βₘ
where:
* each α is a nonempty sequence of nonterminals and terminals, and
* each β is a sequence of nonterminals and terminals that does not start with A.
Replace these with two sets of productions, one set for A:
: A → β₁A′ | ⋯ | βₘA′
and another set for the fresh nonterminal A′ (often called the "tail" or the "rest"):
: A′ → α₁A′ | ⋯ | αₙA′ | ε
Repeat this process until no direct left recursion remains.
As an example, consider the rule set
: Expression → Expression + Term | Expression - Term | Term
This could be rewritten to avoid left recursion as
: Expression → Term Expression′
: Expression′ → + Term Expression′ | - Term Expression′ | ε
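A sketch of this transformation in Python (the grammar encoding and the helper name are assumptions, not a standard API): a grammar is stored as a dict from nonterminal to a list of right-hand sides, each right-hand side a list of symbols, with the empty list standing for ε.

```python
def remove_direct_left_recursion(grammar, a):
    """Rewrite nonterminal `a` to eliminate its direct left recursion."""
    recursive = []  # the alpha_i from rules A -> A alpha_i
    others = []     # the beta_j: right-hand sides not starting with A
    for rhs in grammar[a]:
        if rhs == [a]:
            continue                  # discard useless rules A -> A
        if rhs and rhs[0] == a:
            recursive.append(rhs[1:])  # strip the leading A
        else:
            others.append(rhs)
    if not recursive:
        return grammar                # nothing to rewrite
    tail = a + "'"                    # fresh "tail" nonterminal A'
    grammar[a] = [beta + [tail] for beta in others]
    grammar[tail] = [alpha + [tail] for alpha in recursive] + [[]]  # [] = epsilon
    return grammar

g = {"Expression": [["Expression", "+", "Term"],
                    ["Expression", "-", "Term"],
                    ["Term"]]}
g = remove_direct_left_recursion(g, "Expression")
# Expression  -> Term Expression'
# Expression' -> + Term Expression' | - Term Expression' | epsilon
```

Running this on the example rule set above reproduces the rewritten grammar: Expression gains the single production Term Expression′, and the tail nonterminal carries the operators.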
Removing all left recursion
The above process can be extended to eliminate all left recursion, by first converting indirect left recursion to direct left recursion on the highest numbered nonterminal in a cycle.
:Inputs ''A grammar: a set of nonterminals A₁, …, Aₙ and their productions''
:Output ''A modified grammar generating the same language but without left recursion''
:# ''For each nonterminal Aᵢ:''
:## ''Repeat until an iteration leaves the grammar unchanged:''
:### ''For each rule Aᵢ → αᵢ, αᵢ being a sequence of terminals and nonterminals:''
:#### ''If αᵢ begins with a nonterminal Aⱼ and