HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, recursive ascent parsing is a technique for implementing an
LALR parser In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
which uses mutually-recursive functions rather than tables. Thus, the parser is ''directly encoded'' in the host language similar to recursive descent. Direct encoding usually yields a parser which is faster than its table-driven equivalent for the same reason that compilation is faster than interpretation. It is also (nominally) possible to hand edit a recursive ascent parser, whereas a tabular implementation is nigh unreadable to the average human. Recursive ascent was first described by Thomas Pennello in his article in 1986. He was not intending to create a hand-editable implementation of an LR parser, but rather a maintainable and efficient parser implemented in assembly language. The technique was later expounded upon by G.H. Roberts in 1988 as well as in an article by Leermakers, Augusteijn, Kruseman Aretz in 1992 in the journal ''Theoretical Computer Science''. An extremely readable description of the technique was written by Morell and Middleton in 2003. A good exposition can also be found in a TOPLAS article by Sperber and Thiemann. Recursive ascent has also been merged with recursive descent, yielding a technique known as
recursive ascent/descent Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
. This implementation technique is arguably easier to hand-edit due to the reduction in states and fact that some of these states are more intuitively top-down rather than bottom up. It can also yield some minimal performance improvements over conventional recursive ascent.


Summary

Intuitively, recursive ascent is a literal implementation of the LR parsing concept. Each function in the parser represents a single LR
automaton An automaton (; plural: automata or automatons) is a relatively self-operating machine, or control mechanism designed to automatically follow a sequence of operations, or respond to predetermined instructions.Automaton – Definition and More ...
state. Within each function, a multi-branch statement is used to select the appropriate action based on the current token popped off the input stack. Once the token has been identified, action is taken based on the state being encoded. There are two different fundamental actions which may be taken based on the token in question: * Shift - Encoded as a function call, effectively jumping to a new automaton state. * Reduce - Encoded differently according to the
semantic action routine In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
for the relevant production. The result of this routine is wrapped in an ADT which is returned to the caller. The reduce action must also record the number of tokens which were shifted ''prior'' to the reduce, passing this value back to the caller along with the reduce value. This shift counter determines at which point up the call stack the reduce should be handled. There is also a third LR automaton action which may be taken in a given state, but only after a reduce where the shift counter has decremented to zero (indicating that the current state should handle the result). This is the goto action, which is essentially a special case of shift designed to handle non-terminals in a production. This action must be handled ''after'' the multi-branch statement, since this is where any reduction results will "resurface" from farther down the call stack.


Example

Consider the following grammar in
bison Bison are large bovines in the genus ''Bison'' (Greek: "wild ox" (bison)) within the tribe Bovini. Two extant and numerous extinct species are recognised. Of the two surviving species, the American bison, ''B. bison'', found only in North ...
syntax:
expr : expr '+' term   
     ,  expr '-' term   
     ,  term            
     ;

term : '(' expr ')'    
     ,  num             
     ;

num : '0'              
    ,  '1'              
    ;
This grammar is LR(0) in that it is left-recursive (in the expr non-terminal) but does not require any lookahead. Recursive ascent is also capable of handling grammars which are LALR(1) in much the same way that table-driven parsers handle such cases (by pre-computing conflict resolutions based on possible lookahead). The following is a Scala implementation of a recursive ascent parser based on the above grammar: object ExprParser The following is a
Prolog Prolog is a logic programming language associated with artificial intelligence and computational linguistics. Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is intended primarily a ...
implementation of a recursive ascent parser based on the above grammar: state(S), --> state(S0, S), --> 0 /* 0. S --> E$ 1. E --> E + T 2. E --> E - T 3. E --> T 4. T --> (E) 5. T --> N 6. N --> 0 7. N --> 1 */ accept --> state(s([], [e(_)])). r(1) --> state(s(Ts, [t(A1), '+', e(A0), Ss]), s(Ts, [e(A0+A1), Ss])). r(2) --> state(s(Ts, [t(A1), '-', e(A0), Ss]), s(Ts, [e(A0-A1), Ss])). r(3) --> state(s(Ts, Ss, s(Ts, [e(A), Ss])). r(4) --> state(s(Ts, Ss, s(Ts, Ss). r(5) --> state(s(Ts, Ss, s(Ts, Ss). r(6) --> state(s(Ts, Ss, s(Ts, Ss). r(7) --> state(s(Ts, Ss, s(Ts, Ss). t(T) --> state(s( Ts Ss), s(Ts, Ss). /* S --> .E$ E --> .E + T E --> .E - T E --> .T T --> .(E) T --> .N N --> .0 N --> .1 */ s0 --> t('('), s3, s2, s1. s0 --> t('0'), s11, s10, s2, s1. s0 --> t('1'), s12, s10, s2, s1. /* S --> E.$ E --> E. + T E --> E. - T */ s1 --> accept. s1 --> t('+'), s7, s1. s1 --> t('-'), s8, s1. /* E --> T. */ s2 --> r(3). /* T --> (.E) E --> .E + T E --> .E - T E --> .T T --> .(E) T --> .N N --> .0 N --> .1 */ s3 --> t('('), s3, s2, s4. s3 --> t('0'), s11, s10, s2, s4. s3 --> t('1'), s12, s10, s2, s4. /* T --> (E.) E --> E .+ T E --> E .- T */ s4 --> t(')'), s9. s4 --> t('+'), s7, s4. s4 --> t('-'), s8, s4. /* E --> E + T. */ s5 --> r(1). /* E --> E - T. */ s6 --> r(2). /* E --> E + .T T --> .(E) T --> .N N --> .0 N --> .1 */ s7 --> t('('), s3, s5. s7 --> t('0'), s11, s10, s5. s7 --> t('1'), s12, s10, s5. /* E --> E - .T T --> .(E) T --> .N N --> .0 N --> .1 */ s8 --> t('('), s3, s6. s8 --> t('0'), s11, s10, s6. s8 --> t('1'), s12, s10, s6. /* T --> (E). */ s9 --> r(4). /* T --> N. */ s10 --> r(5). /* N --> '0'. */ s11 --> r(6). /* N --> '1'. */ s12 --> r(7). parser(Cs, T) :- length(Cs, _), phrase(s0, (Cs, [">html" ;"title="(Cs, [">(Cs, [ [s([">quot;>(Cs,_[<_a>.html" ;"title="html" ;"title="(Cs, [">(Cs, [">html" ;"title="(Cs, [">(Cs, [ [s([ [e(T)])]). % state(S0, S), --> [S0, S]. %?- S0 = [s("(1+1)", []), _], phrase(s0, S0, [S]), maplist(portray_clause, S0).


See also

* Recursive descent parser


References

{{Parsers Parsing algorithms