RE2 is a software library which implements a regular expression engine. It uses finite-state machines, in contrast to most other regular expression libraries. RE2 supports a C++ interface.
RE2 was implemented by Google, which uses it in its own products. RE2 uses an "on-the-fly" deterministic finite-state automaton algorithm based on Ken Thompson's Plan 9 grep.
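The idea of building a deterministic automaton "on the fly" can be illustrated with a small sketch. This is not RE2's code: the hand-built NFA for the pattern (a|b)*abb, the `lazyDFA` type and all of its method names are assumptions made for illustration. The sketch shows the general technique of lazy subset construction, in which each DFA state stands for a set of NFA states and is only built (and cached) the first time the input reaches it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// nfa is a hand-built, epsilon-free NFA for the pattern (a|b)*abb
// (an illustrative assumption, not taken from RE2).
// nfa[s][c] lists the states reachable from state s on input byte c.
var nfa = []map[byte][]int{
	0: {'a': {0, 1}, 'b': {0}},
	1: {'b': {2}},
	2: {'b': {3}},
	3: {},
}

const acceptState = 3

// lazyDFA builds DFA states only when the input first reaches them.
type lazyDFA struct {
	sets  map[string][]int           // DFA state key -> NFA states it stands for
	trans map[string]map[byte]string // cached transitions between DFA states
}

func newLazyDFA() *lazyDFA {
	return &lazyDFA{
		sets:  map[string][]int{},
		trans: map[string]map[byte]string{},
	}
}

// intern deduplicates and sorts a set of NFA states, registers it as a
// DFA state if it is new, and returns its key.
func (d *lazyDFA) intern(states []int) string {
	seen := map[int]bool{}
	uniq := []int{}
	for _, s := range states {
		if !seen[s] {
			seen[s] = true
			uniq = append(uniq, s)
		}
	}
	sort.Ints(uniq)
	parts := make([]string, len(uniq))
	for i, s := range uniq {
		parts[i] = fmt.Sprint(s)
	}
	key := strings.Join(parts, ",")
	if _, ok := d.sets[key]; !ok {
		d.sets[key] = uniq
		d.trans[key] = map[byte]string{}
	}
	return key
}

// step returns the DFA state reached from cur on byte c, computing and
// caching the transition the first time it is needed.
func (d *lazyDFA) step(cur string, c byte) string {
	if next, ok := d.trans[cur][c]; ok {
		return next
	}
	var reached []int
	for _, s := range d.sets[cur] {
		reached = append(reached, nfa[s][c]...)
	}
	next := d.intern(reached)
	d.trans[cur][c] = next
	return next
}

// match reports whether the whole input is accepted, reading each byte once.
func (d *lazyDFA) match(input string) bool {
	cur := d.intern([]int{0})
	for i := 0; i < len(input); i++ {
		cur = d.step(cur, input[i])
	}
	for _, s := range d.sets[cur] {
		if s == acceptState {
			return true
		}
	}
	return false
}

func main() {
	d := newLazyDFA()
	fmt.Println(d.match("ababb"), d.match("abab"))
	fmt.Println("DFA states built so far:", len(d.sets))
}
```

Because every input byte triggers at most one cached lookup or one set computation, the scan never revisits earlier input, which is the property that distinguishes this family of engines from backtracking ones.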
Comparison to PCRE
RE2 performs comparably to Perl Compatible Regular Expressions (PCRE). For certain regular expression operators like | (the operator for alternation or logical disjunction) it is superior to PCRE. Unlike PCRE, which supports features such as lookarounds, backreferences and recursion, RE2 is only able to recognize regular languages due to its construction using the Thompson DFA algorithm. It is also slightly slower than PCRE for parenthetic capturing operations.
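The restriction to regular languages shows up when a pattern is compiled. The sketch below uses Go's standard "regexp" package as a stand-in, on the assumption (stated in Go's documentation) that it accepts the same syntax as RE2; it does not call the RE2 C++ library itself. The helper name `compiles` and the sample patterns are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`cat|dog`,    // alternation: an ordinary regular operator
		`(\w+) \1`,   // backreference: not a regular-language feature
		`foo(?=bar)`, // lookahead: rejected by RE2-style syntax
	}
	for _, p := range patterns {
		fmt.Printf("%-14q accepted: %v\n", p, compiles(p))
	}
}
```

Rejecting such patterns at compile time, rather than emulating them, is what lets an automaton-based engine keep its matching guarantees.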
PCRE can use a large recursive stack, with correspondingly high memory usage, and can take exponential time on certain patterns. In contrast, RE2 uses a fixed stack size and guarantees that its runtime increases linearly (not exponentially) with the size of the input. The maximum memory allocated with RE2 is configurable. This can make it more suitable for use in server applications, which require bounds on memory usage and computational time.
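A rough way to observe the difference is to run a pattern with nested quantifiers against an input that almost matches. The sketch below again uses Go's "regexp" package as a stand-in for an RE2-style engine (an assumption; it is not the RE2 C++ library). The pattern, the helper name `matchAdversarial` and the input sizes are illustrative choices, and the timings printed will vary by machine.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// nestedPlus has nested quantifiers, a shape that can force a backtracking
// engine to try exponentially many ways of splitting a run of 'a's.
var nestedPlus = regexp.MustCompile(`^(a+)+$`)

// matchAdversarial runs the pattern on n 'a' characters followed by '!',
// so the match can only be ruled out at the very end of the input.
func matchAdversarial(n int) bool {
	return nestedPlus.MatchString(strings.Repeat("a", n) + "!")
}

func main() {
	for _, n := range []int{1000, 10000, 100000} {
		start := time.Now()
		matched := matchAdversarial(n)
		fmt.Printf("n=%d matched=%v elapsed=%v\n", n, matched, time.Since(start))
	}
}
```

With a linear-time engine the elapsed time should grow roughly in proportion to n, whereas a backtracking engine given the same pattern may become impractical at far smaller input sizes.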
Adoption
Use in Google products
RE2 is available to users of Google Docs and Google Sheets. Google Sheets supports RE2 except Unicode character class matching. RegexExtract does not use grouping.
Use in Go
Go's built-in "regexp" package uses the same patterns and implementation as RE2, though it is written in Go. This is unsurprising, given the staff that Go shares with the Plan 9 team.
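A minimal sketch of the package in use follows. The pattern, the variable `versionRE` and the helper `parseVersion` are hypothetical names chosen for illustration; `regexp.MustCompile` and `FindStringSubmatch` are part of Go's standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRE captures the major and minor parts of a string such as "go1.22".
var versionRE = regexp.MustCompile(`go(\d+)\.(\d+)`)

// parseVersion returns the captured major and minor numbers as strings,
// and ok=false when the input contains no match.
func parseVersion(s string) (major, minor string, ok bool) {
	m := versionRE.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	major, minor, ok := parseVersion("built with go1.22 on linux")
	fmt.Println(major, minor, ok)
}
```

Compiling the pattern once at package level and reusing it avoids repeating the compilation work on every call.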
Related libraries
The RE2 algorithm has been rewritten in Rust as the package "regex". Cloudflare's web application firewall uses this package because the RE2 algorithm is immune to ReDoS.
Russ Cox, the principal author of RE2, also wrote RE1, an earlier regular expression engine based on a bytecode interpreter. OpenResty uses an RE1 fork called "sregex".
See also
* Comparison of regular expression engines