The Parser Grammar Engine (PGE, originally the Parrot Grammar Engine) is a
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
and
runtime for
Raku rules
Raku may refer to:
* Lake Raku, an artificial lake in Tallinn, Estonia
* Raku ware, a type of pottery used in the Japanese tea ceremony
* Raku, Nepal, a village in the Karnali Zone
* ''RAkU'' (ballet), a ballet by Yuri Possokhov
* Raku (progra ...
for the
Parrot virtual machine
Parrot was a register-based process virtual machine designed to run dynamic languages efficiently. It is possible to compile Parrot assembly language and Parrot intermediate representation (PIR, an intermediate language) to Parrot bytecode ...
. PGE uses these ''rules'' to convert a
parsing expression grammar
In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in ...
into Parrot
bytecode
Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
. It is therefore compiling rules into a program, unlike most virtual machines and runtimes, which store regular expressions in a secondary internal format that is then interpreted at runtime by a regular expression engine. The rules format used by PGE can express any
regular expression
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
and most
formal grammar
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
s, and as such it forms the first link in the compiler chain for all of Parrot's front-end languages.
When executed, the bytecode generated by PGE will parse text as described in the input rules, generating a parse tree. The parse tree can be manipulated directly, or fed into the next stage of the
Parrot compiler toolchain in order to generate an
AST from which code generation can occur (if the grammar describes a programming language).
History
Originally named ''P6GE'' and written in C, PGE was translated to native Parrot and renamed not long after its initial release in November 2004. Its author is Patrick R. Michaud. PGE was written in order to reduce the amount of work required to implement a compiler on top of Parrot. It was also written to allow Perl 6 to easily self-host, though current
Pugs
The Pug is a breed of dog originally from China, with physically distinctive features of a wrinkly, short-muzzled face and curled tail. The breed has a fine, glossy coat that comes in a variety of colors, most often light brown (fawn) or blac ...
development no longer uses PGE as its primary rules back-end in favor of a native engine called PCR.
Internals
PGE combines three styles of parsing:
* Raku rules
* an
operator precedence parser
* custom parse subroutines
The primary form is Raku rules, so a PGE rule might look like this for an addition-only grammar:
rule term
rule number
rule expr
The operator precedence parser allows an operator table to be built and used directly in a Perl 6 rule style parser like so:
rule expr is optable
rule term
rule number
proto term: is precedence('=')
is parsed(&term)
proto infix:+ is looser('term:')
This accomplishes the same goal of defining a simple, addition-only grammar, but does so using a combination of a Raku style regex/rules for
term
and
number
and a shift-reduce optable for everything else.
Code generation
Though PGE outputs code which will parse the grammar described by a rule, and can be used at
run time to handle simple grammars and regular expressions found in code, its primary purpose is for the parsing of
high level language
In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be easier to use, ...
s.
The Parrot compiler toolchain is broken into several parts, of which PGE is the first. PGE converts source code to
parse tree
A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in comp ...
s. The
Tree Grammar Engine (TGE) then converts these into Parrot Abstract Syntax Trees (PAST). A second TGE pass then converts a PAST into
Parrot Opcode Syntax Trees (POST) which can be directly transformed into executable bytecode.
References
External links
*{{cite web , url=http://www.pmichaud.com/2006/pres/yapc-parsers/start.html , title=Parsers, Perl 6 Rules, and the Parrot Grammar Engine , date=2006-06-28
*
Perl
Formal languages
Pattern matching
Beta software
Compilers
Interpreters (computing)