Raku rules are the regular expression, string matching and general-purpose parsing facility of the Raku programming language, and are a core part of the language. Since Perl's pattern-matching constructs have exceeded the capabilities of formal regular expressions for some time, Raku documentation refers to them exclusively as regexes, distancing the term from the formal definition. Raku provides a superset of Perl 5 features with respect to regexes, folding them into a larger framework called rules, which provide the capabilities of a parsing expression grammar, as well as acting as a closure with respect to their lexical scope. Rules are introduced with the rule keyword, which has a usage quite similar to subroutine definitions. Anonymous rules can be introduced with the regex (or rx) keyword, or simply be used inline as regexes were in Perl 5 via the m (matching) or s (substitution) operators.
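As a rough illustration of these entry points, the sketch below declares a named rule, stores an anonymous regex built with rx, and uses the m and s operators inline. The names greeting, $number, $text and $copy are invented for the example. The declarations use my on the assumption that current Rakudo expects a named rule outside a grammar to be lexically scoped.

```raku
# A named rule, declared much like a subroutine (hypothetical name).
my rule greeting { hello \w+ }

# An anonymous regex built with rx and stored in a variable.
my $number = rx/ \d+ /;

my $text = "hello world 42";

# Inline matching with m, calling the named rule.
say "greeting found" if $text ~~ m/ <greeting> /;

# Matching against the stored anonymous regex.
say "number found" if $text ~~ $number;

# Inline substitution with s, applied to a copy of the string.
my $copy = $text;
$copy ~~ s/ \d+ /N/;
say $copy;   # hello world N
```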


History

In Apocalypse 5, a document outlining the preliminary design decisions for Raku pattern matching, Larry Wall enumerated 20 problems with the "current regex culture". Among these were that Perl's regexes were "too compact and 'cute'", had "too much reliance on too few metacharacters", "little support for named captures", "little support for grammars", and "poor integration with 'real' language". Between late 2004 and mid-2005, a compiler for Raku-style rules was developed for the Parrot virtual machine. It was called the Parrot Grammar Engine (PGE) and was later renamed to the more generic Parser Grammar Engine. PGE is a combination of runtime and compiler for Raku-style grammars that allows any Parrot-based compiler to use these tools for parsing, and also to provide rules to its runtime. Among other Raku features, support for named captures was added to Perl 5.10 in 2007. In May 2012, the reference implementation of Raku, Rakudo, shipped its Rakudo Star monthly snapshot with a working JSON parser built entirely in Raku rules.


Changes from Perl 5

There are only six unchanged features from Perl 5's regexes:
* Literals: word characters (letters, numbers and underscore) matched literally
* Capturing: (...)
* Alternatives: |
* Backslash escape: \
* Repetition quantifiers: *, +, and ?, but not {m,n}
* Minimal matching suffix: *?, +?, ??

A few of the most powerful additions include:
* The ability to reference rules using <rulename> to build up entire grammars.
* A handful of commit operators that allow the programmer to control backtracking during matching.

The following changes greatly improve the readability of regexes:
* Simplified non-capturing groups: [...], which are the same as Perl 5's (?:...)
* Simplified code assertions: <?{...}>
* Whitespace can be included without being matched, allowing for multiline regexes. Use \ followed by a space, or ' ', to express whitespace.
* Extended regex formatting (Perl 5's /x) is now the default.
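The readability changes listed above can be seen together in one small pattern. This is only a sketch: the variable name $pair and the "first number smaller than second" condition are invented for illustration.

```raku
# Whitespace and comments inside the pattern are ignored by default,
# so the regex can be laid out over several lines.
my $pair = rx/
    ^
    ( \d+ )             # capturing group, as in Perl 5
    [ \s* ',' \s* ]     # non-capturing group, written (?:...) in Perl 5
    ( \d+ )
    $
    <?{ $0 < $1 }>      # code assertion: succeed only if first < second
/;

say so "3, 10" ~~ $pair;   # True
say so "10, 3" ~~ $pair;   # False
```

The comma is quoted because any character that is not a word character has to be quoted or escaped to be matched literally.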


Implicit changes

Some of the features of Perl 5 regular expressions are more powerful in Raku because of their ability to encapsulate the expanded features of Raku rules. For example, in Perl 5, there were positive and negative lookahead operators (?=...) and (?!...). In Raku these same features exist, but are called <before ...> and <!before ...>. However, because before can encapsulate arbitrary rules, it can be used to express lookahead as a syntactic predicate for a grammar. For example, the following parsing expression grammar describes the classic non-context-free language { a^n b^n c^n : n ≥ 1 }:

S ← &(A !b) a+ B
A ← a A? b
B ← b B? c

In Raku rules, each production becomes a rule of the same name (S, A and B), with the & and ! predicates expressed through before. Given the ability to mix rules and regular code, this can be simplified even further into a single rule S that captures the runs of a, b and c and uses a code assertion to check that they have the same length. However, this makes use of assertions, which is a subtly different concept in Raku rules, but more substantially different in parsing theory, making this a semantic rather than syntactic predicate. The most important difference in practice is performance. There is no way for the rule engine to know what conditions the assertion may match, so no optimization of this process can be made.
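One possible transcription of both approaches is sketched below. It is an assumption-laden sketch rather than a canonical listing: the grammar names AnBnCn and AnBnCnCounted are invented, the productions are declared with token instead of rule so that the whitespace used for layout is not matched against the input, and the lookahead is spelled <?before ...>.

```raku
# Syntactic-predicate version, mirroring the PEG production by production.
grammar AnBnCn {
    token TOP { <S> }
    token S   { <?before <A> <!before b> > a+ <B> }
    token A   { a <A>? b }
    token B   { b <B>? c }
}

# Code-assertion version: capture each run and compare the lengths.
grammar AnBnCnCounted {
    token TOP { (a+) (b+) (c+) <?{ $0.chars == $1.chars == $2.chars }> }
}

# Both grammars should accept the first two inputs and reject the last two.
for <abc aabbcc aabbc aabbbccc> -> $input {
    say $input, " ", so(AnBnCn.parse($input)), " ", so(AnBnCnCounted.parse($input));
}
```

The first grammar stays entirely within pattern syntax, while the second hands the length comparison to ordinary code, which is the semantic-predicate trade-off described above.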


Integration with Perl

In many languages, regular expressions are entered as strings, which are then passed to library routines that parse and compile them into an internal state. In Perl 5, regular expressions shared some of the lexical analysis with Perl's scanner. This simplified many aspects of regular expression usage, though it added a great deal of complexity to the scanner. In Raku, rules are part of the grammar of the language. No separate parser exists for rules, as it did in Perl 5. This means that code embedded in rules is parsed at the same time as the rule itself and its surrounding code. For example, it is possible to nest rules and code without re-invoking the parser: a rule named ab can contain a block of code, that block can contain a regex, and that regex can in turn contain a further block of code. Such a construct is a single block of Raku code that contains an outer rule definition, an inner block of assertion code, and inside of that a regex that contains one more level of assertion.


Implementation


Keywords

There are several keywords used in conjunction with Raku rules:
* regex: A named or anonymous regex that ignores whitespace within the regex by default.
* token: A named or anonymous regex that implies the :ratchet modifier.
* rule: A named or anonymous regex that implies the :ratchet and :sigspace modifiers.
* rx: An anonymous regex that takes arbitrary delimiters such as //, where regex only takes braces.
* m: An operator form of anonymous regex that performs matches with arbitrary delimiters.
* mm: Shorthand for m with the :sigspace modifier.
* s: An operator form of anonymous regex that performs substitution with arbitrary delimiters.
* ss: Shorthand for s with the :sigspace modifier.
* /.../: Simply placing a regex between slashes is shorthand for rx/.../.

In typical use, a token named word is defined, a rule named phrase is built from it, and a string is then tested against that rule from inside a regex, as in $string ~~ / <phrase> \n /.
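The typical use described above might look like the following sketch. The grammar name Sentence, the bodies of word and phrase, and the sample input are all assumptions made for illustration. The rules are placed in a grammar and entered through Grammar.parse with the :rule argument.

```raku
grammar Sentence {                                  # hypothetical name
    token word   { \w+ }                            # implies :ratchet
    rule  phrase { <word> [ ',' <word> ]* '.' }     # implies :ratchet and :sigspace
}

my $string = "apples, pears, plums.";

# Enter the grammar through the phrase rule instead of the default TOP.
if Sentence.parse($string, :rule<phrase>) {
    say "matched a phrase";
}

# Because phrase is a rule, whitespace in its body matches whitespace in
# the input; a run-together word list still parses, a missing period does not.
say so Sentence.parse("apples,pears.", :rule<phrase>);
say so Sentence.parse("apples pears",  :rule<phrase>);
```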


Modifiers

Modifiers may be placed after any of the regex keywords, and before the delimiter. If a regex is named, the modifier comes after the name. Modifiers control the way regexes are parsed and how they behave. They are always introduced with a leading : character. Some of the more important modifiers include:
* :i or :ignorecase – Perform matching without respect to case.
* :m or :ignoremark – Perform matching without respect to combining characters.
* :g or :global – Perform the match more than once on a given target string.
* :s or :sigspace – Replace whitespace in the regex with a whitespace-matching rule, rather than simply ignoring it.
* :Perl5 – Treat the regex as a Perl 5 regular expression.
* :ratchet – Never perform backtracking in the rule.

For example, a regex named addition that is given the :ratchet and :sigspace modifiers behaves the same as a token given :sigspace alone, or as a rule with no modifiers at all.
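A few of these modifiers are exercised in the sketch below. All names (plural-r, plural-t, Sum, term, add1 to add3) are hypothetical, and the three add declarations are written with the modifiers inside the regex body, on the assumption that this placement is accepted for :ratchet and :sigspace.

```raku
# :i / :ignorecase
say so "HELLO" ~~ m:i/ hello /;             # True

# :g / :global, here passed to the .match method
say "a1b22c333".match(/ \d+ /, :g).elems;   # 3

# :ratchet: a regex may backtrack, a token (implicitly :ratchet) may not.
my regex plural-r { \w+ s }
my token plural-t { \w+ s }
say so "cats" ~~ / ^ <plural-r> $ /;        # True
say so "cats" ~~ / ^ <plural-t> $ /;        # False

# Three declarations intended to behave the same way.
grammar Sum {
    token term { \d+ }
    regex add1 { :ratchet :sigspace <term> \+ <term> }
    token add2 { :sigspace <term> \+ <term> }
    rule  add3 { <term> \+ <term> }
}
say so Sum.parse("1 + 2", :rule($_)) for <add1 add2 add3>;
```

The token fails on "cats" because \w+ consumes the final s and, without backtracking, cannot give it back.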


Grammars

A grammar may be defined using the grammar keyword. A grammar is essentially just a namespace for rules. For example, a grammar named Str::SprintfFormat can be used to define Perl's sprintf string formatting notation. Outside of this namespace, its rules can be used inside a regex by qualifying them with the name of the grammar. A rule used in this way is actually identical to the invocation of a subroutine with the extra semantics and side-effects of pattern matching (e.g., rule invocations can be backtracked).
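To make the namespace idea concrete, here is a deliberately reduced stand-in for such a grammar. MiniFormat and all of its tokens are hypothetical and cover only a small part of sprintf notation; they are not the contents of Str::SprintfFormat. The rules are reached from outside through Grammar.parse, optionally selecting a single rule with :rule.

```raku
# A reduced, hypothetical grammar for directives such as %-10s or %05d.
grammar MiniFormat {
    token TOP          { <format-token> }
    token format-token { '%' <flags>? <width>? <directive> }
    token flags        { <[ \- \+ 0 \# ]>+ }
    token width        { \d+ }
    token directive    { <[csdf]> }
}

# From outside the namespace, enter through the grammar by name.
my $m = MiniFormat.parse('%-10s');
say ~$m<format-token><directive>;   # s
say ~$m<format-token><width>;       # 10

# A single rule can also be chosen as the entry point.
say so MiniFormat.parse('d', :rule<directive>);   # True
```

Each named rule that matched becomes a named capture in the resulting match object, which is how the directive and width are read back out.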


Examples

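Example rules in Raku are commonly written as anonymous regexes with rx. A regex that relies on a code assertion can often be rewritten as an identical regex that uses pattern syntax alone.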
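The sketch below gives a few such regexes. The patterns and variable names are chosen for illustration and are not a recovered listing. The second and third regexes are meant to accept the same strings, once with a code assertion and once without.

```raku
# Alternation inside a non-capturing group and inside a capturing group.
my $first = rx/ a [ b | c ] ( d | e ) f /;
say so "abdf" ~~ $first;    # True

# "a" followed by "b"s, accepted only when the captured text has even length.
my $checked = rx/ ( ab* ) <?{ $0.chars %% 2 }> /;

# The same set of strings written without a code assertion.
my $plain = rx/ ( ab[bb]* ) /;

# Print what each regex matches for a few inputs.
for <ab abb abbb a> -> $s {
    my $x = $s ~~ $checked;
    my $y = $s ~~ $plain;
    say $s, " ", ($x ?? ~$x !! "no match"), " ", ($y ?? ~$y !! "no match");
}
```

On "abb" the assertion rejects the three-character candidate, so the regex backtracks to "ab", which is also what the plain version matches.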


External links


* Raku Grammars - The reference manual page for grammars.
* Grammar tutorial - A tutorial for grammars in Raku.
* The standards document covering Perl 6 regexes and rules.
* Perl 6 Regex Introduction - Gentle introduction to Perl 6 regexes.