Predictive Parser
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a recursive descent parser is a kind of
top-down parser Top-down parsing in computer science is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top- ...
built from a set of
mutually recursive In mathematics and computer science, mutual recursion is a form of recursion where two mathematical or computational objects, such as functions or datatypes, are defined in terms of each other. Mutual recursion is very common in functional program ...
procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the
grammar In linguistics, grammar is the set of rules for how a natural language is structured, as demonstrated by its speakers or writers. Grammar rules may concern the use of clauses, phrases, and words. The term may also refer to the study of such rul ...
. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes. A ''predictive parser'' is a recursive descent parser that does not require
backtracking Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it de ...
. Predictive parsing is possible only for the class of LL(''k'') grammars, which are the
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
s for which there exists some positive integer ''k'' that allows a recursive descent parser to decide which production to use by examining only the next ''k'' tokens of input. The LL(''k'') grammars therefore exclude all
ambiguous grammar In computer science, an ambiguous grammar is a context-free grammar for which there exists a string (computer science), string that can have more than one leftmost derivation or parse tree. Every non-empty context-free language admits an ambiguous ...
s, as well as all grammars that contain
left recursion In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on ...
. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(''k'') grammar. A predictive parser runs in
linear time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...
. Recursive descent with backtracking is a technique that determines which
production Production may refer to: Economics and business * Production (economics) * Production, the act of manufacturing goods * Production, in the outline of industrial organization, the act of making products (goods and services) * Production as a stat ...
to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(''k'') grammars, but is not guaranteed to terminate unless the grammar is LL(''k''). Even when they terminate, parsers that use recursive descent with backtracking may require
exponential time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations p ...
. Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
, either for an LL(''k'') language or using an alternative parser, such as
LALR In computer science, an LALR parser (look-ahead, left-to-right, rightmost derivation parser) is part of the compiling process where human readable text is converted into a structured representation to be read by computers. An LALR parser is a soft ...
or LR. This is particularly the case if a grammar is not in LL(''k'') form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses a LL(*) algorithm for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set ...
. Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.


Example parser

The following EBNF-like
grammar In linguistics, grammar is the set of rules for how a natural language is structured, as demonstrated by its speakers or writers. Grammar rules may concern the use of clauses, phrases, and words. The term may also refer to the study of such rul ...
(for
Niklaus Wirth Niklaus Emil Wirth ( IPA: ) (15 February 1934 – 1 January 2024) was a Swiss computer scientist. He designed several programming languages, including Pascal, and pioneered several classic topics in software engineering. In 1984, he won the Tu ...
's
PL/0 PL/0 is a programming language, intended as an educational programming language, that is similar to but much simpler than Pascal, a general-purpose programming language. It serves as an example of how to construct a compiler. It was originally in ...
programming language, from ''
Algorithms + Data Structures = Programs In mathematics and computer science, an algorithm () is a finite sequence of mathematically rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for perf ...
'') is in LL(1) form: program = block "." . block = const" ident "=" number ";" var" ident ";" statement . statement = ident ":=" expression , "call" ident , "begin" statement "end" , "if" condition "then" statement , "while" condition "do" statement . condition = "odd" expression , expression ("=", "#", "<", "<=", ">", ">=") expression . expression = "-"term . term = factor . factor = ident , number , "(" expression ")" . Terminals are expressed in quotes. Each
nonterminal In formal languages, terminal and nonterminal symbols are parts of the ''vocabulary'' under a formal grammar. ''Vocabulary'' is a finite, nonempty set of symbols. ''Terminal symbols'' are symbols that cannot be replaced by other symbols of the v ...
is defined by a rule in the grammar, except for ''ident'' and ''number'', which are assumed to be implicitly defined.


C implementation

What follows is an implementation of a recursive descent parser for the above language in C. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly. Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the current symbol from the input, and the function ''nextsym'', which updates ''sym'' when called. The implementations of the functions ''nextsym'' and ''error'' are omitted for simplicity. typedef enum Symbol; Symbol sym; void nextsym(void); void error(const char msg[]); int accept(Symbol s) int expect(Symbol s) void factor(void) void term(void) void expression(void) void condition(void) void statement(void) void block(void) void program(void)


Examples

Some recursive descent parser generators: * TMG – an early compiler-compiler used in the 1960s and early 1970s * JavaCC *
Coco/R Coco/R is a compiler generator that takes wirth syntax notationIn the manual, however, it is referred as L-attributed Extended Backus–Naur Form syntax (EBNF). grammars of a source language and generates a scanner and a parser for that lan ...
*
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses a LL(*) algorithm for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set ...
*
Spirit Parser Framework The Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of extended Backus–Naur form (EBNF) c ...
– a C++ recursive descent parser generator framework requiring no pre-compile step *
parboiled (Java) parboiled is an open-source Java (programming language), Java library released under an Apache License. It provides support for defining Parsing expression grammar, PEG parsers directly in Java source code. parboiled is commonly used as an altern ...
– a recursive descent PEG parsing library for
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...


See also

*
Parser combinator In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as ...
– a higher-order function used in combinatory parsing, a method of factoring recursive descent parser designs *
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 20 ...
– another form representing recursive descent grammar *
Recursive ascent parser In computer science, recursive ascent parsing is a technique for implementing an LR parser which uses mutually-recursive functions rather than tables. Thus, the parser is ''directly encoded'' in the host language similar to recursive descent. Dire ...
* Tail recursive parser – a variant of the recursive descent parser


References


General references

* '' Compilers: Principles, Techniques, and Tools'', first edition, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4. * ''Modern Compiler Implementation in Java, Second Edition'', Andrew Appel, 2002, . * ''Recursive Programming Techniques'', W.H. Burge, 1975, * ''Crafting a Compiler with C'', Charles N Fischer and Richard J LeBlanc, Jr, 1991, . * ''Compiling with C# and Java'', Pat Terry, 2005, , 624 * ''Algorithms + Data Structures = Programs'', Niklaus Wirth, 1975, * ''Compiler Construction'', Niklaus Wirth, 1996,


External links


Jack W. Crenshaw: ''Let's Build A Compiler'' (1988-1995)
in Pascal, with
assembly language In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...
output, using a "keep it simple" approach {{Parsers Parsing algorithms Articles with example C code