In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on the right). For instance, 1+2+3 can be recognized as a sum because it can be broken into 1+2, also a sum, and +3, a suitable suffix. In terms of context-free grammar, a nonterminal is left-recursive if the leftmost symbol in one of its productions is itself (in the case of direct left recursion) or can be made itself by some sequence of substitutions (in the case of indirect left recursion).

Definition

A grammar is left-recursive if and only if there exists a nonterminal symbol A that can derive a sentential form with itself as the leftmost symbol (James Power, Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland; JPR02). Symbolically,

A \Rightarrow^+ A\alpha

where \Rightarrow^+ indicates the operation of making one or more substitutions, and \alpha is any sequence of terminal and nonterminal symbols.

Direct left recursion

Direct left recursion occurs when the definition can be satisfied with only one substitution. It requires a rule of the form

A \to A\alpha

where \alpha is a sequence of nonterminals and terminals. For example, the rule

\mathit{Expression} \to \mathit{Expression} + \mathit{Term}

is directly left-recursive. A left-to-right recursive descent parser for this rule might look like a procedure void Expression() whose first action is to call Expression() again, before any input has been consumed, and such code would fall into infinite recursion when executed.
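The failure can be reproduced in a few lines. The following Python sketch is an illustration only: the names NaiveParser, match, term and parse_is_unbounded are hypothetical, Term is simplified to a single digit, and Python's recursion limit turns the unbounded recursion into a RecursionError.

```python
class NaiveParser:
    """Naive recursive descent for: Expression -> Expression + Term."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def match(self, expected):
        # Consume one token if it equals `expected`.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == expected:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {expected!r} at position {self.pos}")

    def term(self):
        # Simplified Term (an assumption of this sketch): one digit token.
        if self.pos < len(self.tokens) and self.tokens[self.pos].isdigit():
            self.pos += 1
        else:
            raise SyntaxError(f"expected a digit at position {self.pos}")

    def expression(self):
        # The first step re-enters expression() without consuming input,
        # so the calls nest until the interpreter gives up.
        self.expression()
        self.match("+")
        self.term()


def parse_is_unbounded(tokens):
    """Return True if parsing recurses until Python's limit is hit."""
    try:
        NaiveParser(tokens).expression()
    except RecursionError:
        return True
    return False
```

The input never matters here: the parser position is still 0 when the recursion limit is reached, which is exactly why no amount of lookahead rescues this procedure.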

Indirect left recursion

Indirect left recursion occurs when the definition of left recursion is satisfied via several substitutions. It entails a set of rules following the pattern

A_0 \to \beta_0A_1\alpha_0

A_1 \to \beta_1A_2\alpha_1

\cdots

A_n \to \beta_nA_0\alpha_n

where \beta_0, \beta_1, \ldots, \beta_n are sequences that can each yield the empty string, while \alpha_0, \alpha_1, \ldots, \alpha_n may be any sequences of terminal and nonterminal symbols at all. Note that these sequences may be empty. The derivation

A_0\Rightarrow\beta_0A_1\alpha_0\Rightarrow^+ A_1\alpha_0\Rightarrow\beta_1A_2\alpha_1\alpha_0\Rightarrow^+\cdots\Rightarrow^+ A_0\alpha_n\dots\alpha_1\alpha_0

then gives A_0 as leftmost in its final sentential form.
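Whether a grammar is left-recursive in this sense, directly or indirectly, can be checked mechanically. The Python sketch below is one possible approach, not a method given by the source. It assumes a grammar is a dict from each nonterminal to a list of productions, each production being a list of symbols with [] standing for the empty string; the function names are hypothetical.

```python
def nullable_nonterminals(grammar):
    """Return the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body is nullable when every symbol in it is nullable;
            # the empty body [] is trivially nullable.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def left_recursive_nonterminals(grammar):
    """Return the nonterminals A for which A =>+ A alpha."""
    nullable = nullable_nonterminals(grammar)

    # left_corners[A]: nonterminals that can become leftmost after one
    # substitution, looking past any nullable prefix (the beta sequences).
    left_corners = {head: set() for head in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            for sym in body:
                if sym in grammar:
                    left_corners[head].add(sym)
                if sym not in nullable:
                    break

    result = set()
    for start in grammar:
        # Depth-first search for a path start -> ... -> start.
        stack, seen = list(left_corners[start]), set()
        while stack:
            sym = stack.pop()
            if sym == start:
                result.add(start)
                break
            if sym not in seen:
                seen.add(sym)
                stack.extend(left_corners[sym])
    return result
```

For example, with A -> B a, B -> C A b | c and C -> d | (empty), the call reports both A and B, since C can vanish and expose A at the left of B's first production.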

Uses

Left recursion is commonly used as an idiom for making operations left-associative, so that an expression a+b-c-d+e is evaluated as (((a+b)-c)-d)+e. In this case, that evaluation order could be achieved as a matter of syntax via the three grammatical rules

\mathit{Expression} \to \mathit{Term}

\mathit{Expression} \to \mathit{Expression} + \mathit{Term}

\mathit{Expression} \to \mathit{Expression} - \mathit{Term}

These only allow parsing the \mathit{Expression} a+b-c-d+e as consisting of the \mathit{Expression} a+b-c-d and \mathit{Term} e, where a+b-c-d in turn consists of the \mathit{Expression} a+b-c and \mathit{Term} d, while a+b-c consists of the \mathit{Expression} a+b and \mathit{Term} c, etc.
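The grouping these rules impose can be made concrete with a small Python sketch. It is not a parser for the grammar; it only builds the nested structure that the three rules force, under the assumption that the input is already a flat list alternating single-token terms and the operators + and -. The function name is hypothetical.

```python
def left_assoc_tree(tokens):
    """Group operands the way Expression -> Expression op Term does."""
    if len(tokens) % 2 == 0:
        raise ValueError("expected: term (operator term)*")
    tree = tokens[0]  # Expression -> Term
    for i in range(1, len(tokens), 2):
        operator, term = tokens[i], tokens[i + 1]
        if operator not in ("+", "-"):
            raise ValueError(f"unexpected operator {operator!r}")
        # Expression -> Expression operator Term: everything built so far
        # becomes the left child of the new node.
        tree = (tree, operator, term)
    return tree
```

Each pass through the loop corresponds to one application of a left-recursive rule, which is why the earliest operator ends up deepest in the tree.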

Removing left recursion

Left recursion often poses problems for parsers, either because it leads them into infinite recursion (as in the case of most top-down parsers) or because they expect rules in a normal form that forbids it (as in the case of many bottom-up parsers). Therefore, a grammar is often preprocessed to eliminate the left recursion.

Removing direct left recursion

The general algorithm to remove direct left recursion follows. Several improvements to this method have been made. For a left-recursive nonterminal A, discard any rules of the form A \rightarrow A and consider those that remain:

A \rightarrow A\alpha_1 \mid \ldots \mid A\alpha_n \mid \beta_1 \mid \ldots \mid \beta_m

where:
* each \alpha is a nonempty sequence of nonterminals and terminals, and
* each \beta is a sequence of nonterminals and terminals that does not start with A.

Replace these with two sets of productions, one set for A:

A \rightarrow \beta_1A^\prime \mid \ldots \mid \beta_mA^\prime

and another set for the fresh nonterminal A^\prime (often called the "tail" or the "rest"):

A^\prime \rightarrow \alpha_1A^\prime \mid \ldots \mid \alpha_nA^\prime \mid \epsilon

Repeat this process until no direct left recursion remains. As an example, consider the rule set

\mathit{Expression} \rightarrow \mathit{Expression}+\mathit{Term} \mid \mathit{Integer} \mid \mathit{String}

This could be rewritten to avoid left recursion as

\mathit{Expression} \rightarrow \mathit{Integer}\,\mathit{Expression}' \mid \mathit{String}\,\mathit{Expression}'

\mathit{Expression}' \rightarrow +\mathit{Term}\,\mathit{Expression}' \mid \epsilon
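The rewriting step for a single nonterminal can be sketched in Python as follows. This is a minimal sketch under assumptions not in the source: productions are lists of symbols with [] standing for \epsilon, the function name is hypothetical, and the fresh nonterminal is formed by appending an apostrophe, which is assumed not to clash with an existing name.

```python
def remove_direct_left_recursion(head, bodies):
    """Rewrite the rules for `head` so that none starts with `head`.

    `bodies` is a list of productions, each a list of symbols; the empty
    list stands for epsilon. Returns a dict holding the new rules.
    """
    # Discard A -> A, then split the rest into "A alpha" and "beta" forms.
    bodies = [body for body in bodies if body != [head]]
    alphas = [body[1:] for body in bodies if body[:1] == [head]]
    betas = [body for body in bodies if body[:1] != [head]]

    if not alphas:
        return {head: bodies}  # no direct left recursion to remove

    tail = head + "'"  # assumed not to clash with an existing nonterminal
    return {
        head: [beta + [tail] for beta in betas],
        tail: [alpha + [tail] for alpha in alphas] + [[]],
    }
```

Applied to the Expression rule set above, the function returns the same two rule sets as the rewritten grammar, with the final [] playing the role of \epsilon.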

Removing all left recursion

The above process can be extended to eliminate all left recursion, by first converting indirect left recursion to direct left recursion on the highest numbered nonterminal in a cycle.

Inputs: A grammar: a set of nonterminals A_1, \ldots, A_n and their productions
Output: A modified grammar generating the same language but without left recursion

1. For each nonterminal A_i:
   1.1. Repeat until an iteration leaves the grammar unchanged:
      1.1.1. For each rule A_i \rightarrow \alpha_i, \alpha_i being a sequence of terminals and nonterminals:
         1.1.1.1. If \alpha_i begins with a nonterminal A_j and j<i:
            1.1.1.1.1. Let \beta_i be \alpha_i without its leading A_j.
            1.1.1.1.2. Remove the rule A_i \rightarrow \alpha_i.
            1.1.1.1.3. For each rule A_j \rightarrow \alpha_j:
               1.1.1.1.3.1. Add the rule A_i \rightarrow \alpha_j\beta_i.
   1.2. Remove direct left recursion for A_i as described above.

Step 1.1.1 amounts to expanding the initial nonterminal A_j in the right hand side of some rule A_i \to A_j \beta, but only if j<i. If A_i \to A_j \beta was one step in a cycle of productions giving rise to a left recursion, then this has shortened that cycle by one step, but often at the price of increasing the number of rules.

The algorithm may be viewed as establishing a topological ordering on nonterminals: afterwards there can only be a rule A_i \to A_j \beta if j>i .
Note that this algorithm is highly sensitive to the nonterminal ordering; optimizations often focus on choosing this ordering well.
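The whole procedure can be sketched in Python as below. This is an illustrative sketch, not a reference implementation. It assumes the grammar is a dict from nonterminal to a list of productions (lists of symbols, [] for the empty string), that the hypothetical parameter order lists every nonterminal as A_1, ..., A_n, that fresh tail names formed with an apostrophe do not clash, and that the input has no empty productions or cycles, since left recursion hidden behind a nullable prefix is not handled here.

```python
def eliminate_left_recursion(grammar, order):
    """Return a copy of `grammar` without left recursion.

    `grammar` maps each nonterminal to a list of productions (lists of
    symbols); `order` lists the nonterminals as A_1, ..., A_n.
    """
    rules = {head: [list(body) for body in grammar[head]] for head in order}

    for i, a_i in enumerate(order):
        earlier = set(order[:i])
        changed = True
        while changed:  # step 1.1: repeat until nothing changes
            changed = False
            new_bodies = []
            for body in rules[a_i]:
                if body and body[0] in earlier:
                    # Expand the leading A_j (j < i) with each of its rules.
                    a_j, beta = body[0], body[1:]
                    new_bodies.extend(alt + beta for alt in rules[a_j])
                    changed = True
                else:
                    new_bodies.append(body)
            rules[a_i] = new_bodies

        # Step 1.2: remove direct left recursion for A_i.
        bodies = [b for b in rules[a_i] if b != [a_i]]
        alphas = [b[1:] for b in bodies if b[:1] == [a_i]]
        betas = [b for b in bodies if b[:1] != [a_i]]
        if alphas:
            tail = a_i + "'"  # assumed fresh
            rules[a_i] = [b + [tail] for b in betas]
            rules[tail] = [a + [tail] for a in alphas] + [[]]
        else:
            rules[a_i] = bodies
    return rules
```

With S -> A a | b and A -> A c | S d | e in the order S, A, the rule A -> S d is first expanded to A -> A a d | b d, after which A has only direct left recursion and is rewritten with a tail A'. Passing the order A, S instead yields a different, equally valid result, which illustrates the sensitivity to ordering noted above.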

Pitfalls

Although the above transformations preserve the language generated by a grammar, they may change the parse trees that witness strings' recognition. With suitable bookkeeping, tree rewriting can recover the originals, but if this step is omitted, the differences may change the semantics of a parse. Associativity is particularly vulnerable; left-associative operators typically appear in right-associative-like arrangements under the new grammar. For example, starting with this grammar:

\mathit{Expression} \rightarrow \mathit{Expression}\,-\,\mathit{Term} \mid \mathit{Term}

\mathit{Term} \rightarrow \mathit{Term}\,*\,\mathit{Factor} \mid \mathit{Factor}

\mathit{Factor} \rightarrow (\mathit{Expression}) \mid \mathit{Integer}

the standard transformations to remove left recursion yield the following:

\mathit{Expression} \rightarrow \mathit{Term}\ \mathit{Expression}'

\mathit{Expression}' \rightarrow  - \mathit{Term}\ \mathit{Expression}' \mid \epsilon

\mathit{Term} \rightarrow \mathit{Factor}\ \mathit{Term}'

\mathit{Term}' \rightarrow  * \mathit{Factor}\ \mathit{Term}' \mid \epsilon

\mathit{Factor} \rightarrow (\mathit{Expression}) \mid \mathit{Integer}

Parsing the string "1 - 2 - 3" with the first grammar in an LALR parser (which can handle left-recursive grammars) would have resulted in the parse tree:

[Figure: left-recursive parse of a double subtraction]

This parse tree groups the terms on the left, giving the correct semantics (1 - 2) - 3. Parsing with the second grammar gives

[Figure: right-recursive parse of a double subtraction]

which, properly interpreted, signifies 1 + (-2 + (-3)), also correct, but less faithful to the input and much harder to implement for some operators. Notice how terms to the right appear deeper in the tree, much as a right-recursive grammar would arrange them for 1 - (2 - 3).
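One common form of the bookkeeping mentioned earlier is to pass the tree built so far into the procedure for the tail nonterminal, so that a parser following the second grammar still produces left-grouped trees. The Python sketch below shows the idea. It is an assumption-laden illustration rather than the source's method: the class and function names are hypothetical, trees are nested tuples of the form (operator, left, right), and the tokenizer silently skips characters it does not recognize.

```python
import re


def tokenize(text):
    # Integers, the two operators and parentheses; anything else is skipped.
    return re.findall(r"\d+|[-*()]", text)


class Parser:
    """Recursive descent for the transformed (right-recursive) grammar."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"unexpected token {token!r}")
        self.pos += 1
        return token

    def expression(self):
        # Expression -> Term Expression'
        return self.expression_tail(self.term())

    def expression_tail(self, left):
        # Expression' -> - Term Expression' | epsilon
        # `left` carries the tree built so far, restoring left grouping.
        if self.peek() == "-":
            self.take("-")
            return self.expression_tail(("-", left, self.term()))
        return left

    def term(self):
        # Term -> Factor Term'
        return self.term_tail(self.factor())

    def term_tail(self, left):
        # Term' -> * Factor Term' | epsilon
        if self.peek() == "*":
            self.take("*")
            return self.term_tail(("*", left, self.factor()))
        return left

    def factor(self):
        # Factor -> (Expression) | Integer
        if self.peek() == "(":
            self.take("(")
            inner = self.expression()
            self.take(")")
            return inner
        token = self.take()
        if not token.isdigit():
            raise SyntaxError(f"expected an integer, got {token!r}")
        return int(token)


def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Here parse("1 - 2 - 3") yields a tree whose left child is the subtraction of 1 and 2, matching the grouping (1 - 2) - 3 of the original grammar even though the procedures mirror the transformed one.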

Accommodating left recursion in top-down parsing

A formal grammar that contains left recursion cannot be parsed by an LL(k) parser or other naive recursive descent parser unless it is converted to a weakly equivalent right-recursive form. In contrast, left recursion is preferred for LALR parsers because it results in lower stack usage than right recursion. However, more sophisticated top-down parsers can implement general context-free grammars by use of curtailment. In 2006, Frost and Hafiz described an algorithm which accommodates ambiguous grammars with direct left-recursive production rules (available from the author at http://hafiz.myweb.cs.uwindsor.ca/pub/p46-frost.pdf). That algorithm was extended by Frost, Hafiz and Callaghan in 2007 to a complete parsing algorithm that accommodates indirect as well as direct left recursion in polynomial time, and that generates compact polynomial-size representations of the potentially exponential number of parse trees for highly ambiguous grammars. The authors then implemented the algorithm as a set of parser combinators written in the Haskell programming language (Frost, R.; Hafiz, R.; Callaghan, P. (January 2008). "Parser Combinators for Ambiguous Left-Recursive Grammars". Practical Aspects of Declarative Languages. Lecture Notes in Computer Science. Vol. 4902. pp. 167–181. doi:10.1007/978-3-540-77442-6_12. ISBN 978-3-540-77441-9. https://cs.uwindsor.ca/~richard/PUBLICATIONS/PADL_08.pdf).
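The idea of curtailment can be illustrated with a toy recognizer in Python. This is not the algorithm of Frost and Hafiz, which also memoizes results and builds parse trees. It is only a sketch of the depth-bounding idea, under stated assumptions: the grammar is a dict from nonterminal to a list of productions (lists of symbols), the bound used is "remaining tokens plus one" nested calls of the same nonterminal at the same position, there is no memoization (so it is only suitable for tiny inputs), and the function name is hypothetical.

```python
def recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`."""
    active = {}  # (nonterminal, position) -> calls currently on the stack

    def ends(symbol, pos):
        # Positions at which a match of `symbol` starting at `pos` can end.
        if symbol not in grammar:  # terminal symbol
            if pos < len(tokens) and tokens[pos] == symbol:
                return {pos + 1}
            return set()
        key = (symbol, pos)
        # Curtailment: give up once the nesting of this nonterminal at this
        # position exceeds the number of remaining tokens plus one.
        if active.get(key, 0) > len(tokens) - pos + 1:
            return set()
        active[key] = active.get(key, 0) + 1
        try:
            result = set()
            for body in grammar[symbol]:
                positions = {pos}
                for sym in body:
                    positions = {e for p in positions for e in ends(sym, p)}
                result |= positions
            return result
        finally:
            active[key] -= 1

    return len(tokens) in ends(start, 0)
```

Because the left-recursive call is eventually cut off instead of forbidden, the directly left-recursive grammar E -> E - T | T, T -> n can be used as written, without first rewriting it into right-recursive form.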

External links

Practical Considerations for LALR(1) Grammars