The Parser Grammar Engine (PGE, originally the Parrot Grammar Engine) is a
compiler
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
and
runtime system
In computer programming, a runtime system or runtime environment is a sub-system that exists in the computer where a program is created, as well as in the computers where the program is intended to be run. The name comes from the compile time ...
for
Raku rules for the
Parrot virtual machine
Parrot is a discontinued register-based process virtual machine designed to run dynamic languages efficiently. It is possible to compile Parrot assembly language and Parrot intermediate representation (PIR, an intermediate language) to Parr ...
. PGE uses these ''rules'' to convert a
parsing expression grammar
In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 20 ...
into Parrot
bytecode
Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normal ...
. It is therefore compiling rules into a program, unlike most virtual machines and runtimes, which store regular expressions in a secondary internal format that is then interpreted at runtime by a regular expression engine. The rules format used by PGE can express any
regular expression
A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
and most
formal grammar
A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
s, and as such it forms the first link in the compiler chain for all of Parrot's front-end languages.
When executed, the bytecode generated by PGE will parse text as described in the input rules, generating a parse tree. The parse tree can be manipulated directly, or fed into the next stage of the ''Parrot compiler toolchain'' to generate an
abstract syntax tree
An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
(AST) from which code can be generated; if the grammar describes a programming language.
History
Originally named ''P6GE'' and written in C, PGE was translated to native Parrot and renamed not long after its initial release in November 2004. Its author is Patrick R. Michaud. PGE was written to reduce the amount of work needed to implement a compiler on Parrot. It was also written to allow Perl 6 to easily self-host, though current
Pugs development no longer uses PGE as its main rules back-end in favor of a native engine named PCR.
Internals
PGE combines three styles of parsing:
* Raku rules
* an
operator-precedence parser
* custom parse subroutines
The primary form is Raku rules, so a PGE rule might look like this for an addition-only grammar:
rule term
rule number
rule expr
The operator precedence parser allows an operator table to be built and used directly in a Perl 6 rule style parser like so:
rule expr is optable
rule term
rule number
proto term: is precedence('=')
is parsed(&term)
proto infix:+ is looser('term:')
This accomplishes the same goal of defining a simple, addition-only grammar, but does so using a combination of a Raku style regex/rules for
term
and
number
and a shift-reduce optable for everything else.
Code generation
Though PGE outputs code which will parse the grammar described by a rule, and can be used at
runtime to handle simple grammars and regular expressions found in code, its main purpose is to parse
high-level programming language
A high-level programming language is a programming language with strong Abstraction (computer science), abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be ea ...
s.
The Parrot compiler toolchain is broken into several parts, of which PGE is the first. PGE converts source code to
parse tree
A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...
s. The ''tree grammar engine'' (TGE) then converts these into a ''Parrot abstract syntax trees'' (PAST). A second TGE pass then converts a PAST into ''Parrot
opcode
In computing, an opcode (abbreviated from operation code) is an enumerated value that specifies the operation to be performed. Opcodes are employed in hardware devices such as arithmetic logic units (ALUs), central processing units (CPUs), and ...
syntax trees'' (POST) which can be directly transformed into executable bytecode.
References
External links
*{{cite web , url=http://www.pmichaud.com/2006/pres/yapc-parsers/start.html , title=Parsers, Perl 6 Rules, and the Parrot Grammar Engine , date=2006-06-28
*
Perl
Formal languages
Pattern matching
Beta software
Compilers
Interpreters (computing)