In
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a recursive descent parser is a kind of
top-down parser built from a set of
mutually recursive procedures (or a non-recursive equivalent) where each such
procedure implements one of the
nonterminals of the
grammar
In linguistics, grammar is the set of rules for how a natural language is structured, as demonstrated by its speakers or writers. Grammar rules may concern the use of clauses, phrases, and words. The term may also refer to the study of such rul ...
. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.
A ''predictive parser'' is a recursive descent parser that does not require
backtracking
Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it de ...
.
Predictive parsing is possible only for the class of
LL(''k'') grammars, which are the
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
s for which there exists some positive integer ''k'' that allows a recursive descent parser to decide which production to use by examining only the next ''k'' tokens of input. The LL(''k'') grammars therefore exclude all
ambiguous grammars, as well as all grammars that contain
left recursion. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(''k'') grammar. A predictive parser runs in
linear time
In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...
.
Recursive descent with backtracking is a technique that determines which
production to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(''k'') grammars, but is not guaranteed to terminate unless the grammar is LL(''k''). Even when they terminate, parsers that use recursive descent with backtracking may require
exponential time.
Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a
parser generator
In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.
The most common type of compiler- ...
, either for an LL(''k'') language or using an alternative parser, such as
LALR or
LR. This is particularly the case if a grammar is not in
LL(''k'') form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like
ANTLR.
Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.
Example parser
The following
EBNF-like
grammar
In linguistics, grammar is the set of rules for how a natural language is structured, as demonstrated by its speakers or writers. Grammar rules may concern the use of clauses, phrases, and words. The term may also refer to the study of such rul ...
(for
Niklaus Wirth
Niklaus Emil Wirth ( IPA: ) (15 February 1934 – 1 January 2024) was a Swiss computer scientist. He designed several programming languages, including Pascal, and pioneered several classic topics in software engineering. In 1984, he won the Tu ...
's
PL/0 programming language, from ''
Algorithms + Data Structures = Programs'') is in
LL(1) form:
program = block "." .
block =
const" ident "=" number ";" var" ident ";" statement .
statement =
ident ":=" expression
, "call" ident
, "begin" statement "end"
, "if" condition "then" statement
, "while" condition "do" statement .
condition =
"odd" expression
, expression ("=", "#", "<", "<=", ">", ">=") expression .
expression = "-"term .
term = factor .
factor =
ident
, number
, "(" expression ")" .
Terminals are expressed in quotes. Each
nonterminal is defined by a rule in the grammar, except for ''ident'' and ''number'', which are assumed to be implicitly defined.
C implementation
What follows is an implementation of a recursive descent parser for the above language in
C. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the current symbol from the input, and the function ''nextsym'', which updates ''sym'' when called.
The implementations of the functions ''nextsym'' and ''error'' are omitted for simplicity.
typedef enum Symbol;
Symbol sym;
void nextsym(void);
void error(const char msg[]);
int accept(Symbol s)
int expect(Symbol s)
void factor(void)
void term(void)
void expression(void)
void condition(void)
void statement(void)
void block(void)
void program(void)
Examples
Some recursive descent parser generators:
*
TMG – an early compiler-compiler used in the 1960s and early 1970s
*
JavaCC
*
Coco/R
*
ANTLR
*
Spirit Parser Framework – a C++ recursive descent parser generator framework requiring no pre-compile step
*
parboiled (Java)
parboiled is an open-source Java (programming language), Java library released under an Apache License. It provides support for defining Parsing expression grammar, PEG parsers directly in Java source code.
parboiled is commonly used as an altern ...
– a recursive descent
PEG parsing library for
Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
See also
*
Parser combinator – a higher-order function used in combinatory parsing, a method of factoring recursive descent parser designs
*
Parsing expression grammar
In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 20 ...
– another form representing recursive descent grammar
*
Recursive ascent parser
*
Tail recursive parser – a variant of the recursive descent parser
References
General references
* ''
Compilers: Principles, Techniques, and Tools'', first edition, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4.
* ''Modern Compiler Implementation in Java, Second Edition'', Andrew Appel, 2002, .
* ''Recursive Programming Techniques'', W.H. Burge, 1975,
* ''Crafting a Compiler with C'', Charles N Fischer and Richard J LeBlanc, Jr, 1991, .
* ''Compiling with C# and Java'', Pat Terry, 2005, , 624
* ''Algorithms + Data Structures = Programs'', Niklaus Wirth, 1975,
* ''Compiler Construction'', Niklaus Wirth, 1996,
External links
Jack W. Crenshaw: ''Let's Build A Compiler'' (1988-1995) in
Pascal, with
assembly language
In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...
output, using a "keep it simple" approach
{{Parsers
Parsing algorithms
Articles with example C code