HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some
pattern A pattern is a regularity in the world, in human-made design, or in abstract ideas. As such, the elements of a pattern repeat in a predictable manner. A geometric pattern is a kind of pattern formed of geometric shapes and typically repeated l ...
. In contrast to
pattern recognition Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess PR capabilities but their p ...
, the match usually must be exact: "either it will or will not be a match." The patterns generally have the form of either
sequences In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is call ...
or
tree structure A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is gen ...
s. Uses of pattern matching include outputting the locations (if any) of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence (i.e., search and replace). Sequence patterns (e.g., a text string) are often described using
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s and matched using techniques such as
backtracking Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it de ...
. Tree patterns are used in some
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s as a general tool to process data based on its structure, e.g. C#, F#,
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
,
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
, ML, Python,
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
,
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
, Scala,
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIF ...
and the symbolic mathematics language
Mathematica Wolfram (previously known as Mathematica and Wolfram Mathematica) is a software system with built-in libraries for several areas of technical computing that allows machine learning, statistics, symbolic computation, data manipulation, network ...
have special
syntax In linguistics, syntax ( ) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituenc ...
for expressing tree patterns and a
language construct In computer programming, a language construct is "a syntactically allowable part of a program that may be formed from one or more lexical tokens in accordance with the rules of the programming language", as defined by in the ISO/IEC 2382 stan ...
for conditional execution and value retrieval based on it. Often it is possible to give alternative patterns that are tried one by one, which yields a powerful conditional programming construct. Pattern matching sometimes includes support for guards.


History

Early programming languages with pattern matching constructs include
COMIT COMIT was the first string processing language (compare SNOBOL, TRAC, and Perl), developed on the IBM 700/7000 series computers by Victor Yngve, University of Chicago, and collaborators at MIT from 1957 to 1965. Yngve created the language f ...
(1957),
SNOBOL SNOBOL ("StriNg Oriented and symBOlic Language") is a series of programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph Griswold and Ivan P. Polonsky, culminating in SNOBOL4. It was one of a ...
(1962),
Refal Refal ("Recursive functions algorithmic language"; ) "is a functional programming language oriented toward symbolic computations", including " string processing, language translation, ndartificial intelligence". It is one of the oldest members ...
(1968) with tree-based pattern matching,
Prolog Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics. Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...
(1972), St Andrews Static Language ( SASL) (1976), NPL (1977), and
Kent Recursive Calculator KRC (Kent Recursive Calculator) is a lazy functional language developed by David Turner from November 1979 to October 1981 based on SASL, with pattern matching, guards and ZF expressions (now more usually called list comprehensions). Two i ...
(KRC) (1981). The pattern matching feature of function arguments in the language ML (1973) and its dialect
Standard ML Standard ML (SML) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Modular programming, modular, Functional programming, functional programming language with compile-time type checking and t ...
(1983) has been carried over to some other
functional programming In computer science, functional programming is a programming paradigm where programs are constructed by Function application, applying and Function composition (computer science), composing Function (computer science), functions. It is a declarat ...
languages that were influenced by them, such as
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
(1990), Scala (2004), and F# (2005). The pattern matching construct with the match keyword that was introduced in the ML dialect
Caml Caml (originally an acronym for Categorical Abstract Machine Language) is a multi-paradigm, general-purpose, high-level, functional programming language which is a dialect of the ML programming language family. Caml was developed in France ...
(1985) was followed by languages such as
OCaml OCaml ( , formerly Objective Caml) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Comparison of multi-paradigm programming languages, multi-paradigm programming language which extends the ...
(1996), F# (2005), F* (2011), and
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
(2015). Many
text editor A text editor is a type of computer program that edits plain text. An example of such program is "notepad" software (e.g. Windows Notepad). Text editors are provided with operating systems and software development packages, and can be used to c ...
s support pattern matching of various kinds: the QED editor supports
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
search, and some versions of TECO support the OR operator in searches.
Computer algebra system A computer algebra system (CAS) or symbolic algebra system (SAS) is any mathematical software with the ability to manipulate mathematical expressions in a way similar to the traditional manual computations of mathematicians and scientists. The de ...
s generally support pattern matching on algebraic expressions.


Types


Primitive patterns

The simplest pattern in pattern matching is an explicit value or a variable. For an example, consider a simple function definition in Haskell syntax (function parameters are not in parentheses but are separated by spaces, = is not assignment but definition): f 0 = 1 Here, 0 is a single value pattern. Now, whenever f is given 0 as argument the pattern matches and the function returns 1. With any other argument, the matching and thus the function fail. As the syntax supports alternative patterns in function definitions, we can continue the definition extending it to take more generic arguments: f n = n * f (n-1) Here, the first n is a single variable pattern, which will match absolutely any argument and bind it to name n to be used in the rest of the definition. In Haskell (unlike at least
Hope Hope is an optimistic state of mind that is based on an expectation of positive outcomes with respect to events and circumstances in one's own life, or the world at large. As a verb, Merriam-Webster defines ''hope'' as "to expect with confid ...
), patterns are tried in order so the first definition still applies in the very specific case of the input being 0, while for any other argument the function returns n * f (n-1) with n being the argument. The wildcard pattern (often written as _) is also simple: like a variable name, it matches any value, but does not bind the value to any name. Algorithms for
matching wildcards In computer science, an algorithm for matching wildcards (also known as globbing) is useful in comparing text strings that may contain wildcard character, wildcard syntax. Common uses of these algorithms include command-line interfaces, e.g. the Bo ...
in simple string-matching situations have been developed in a number of
recursive Recursion occurs when the definition of a concept or process depends on a simpler or previous version of itself. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in m ...
and non-recursive varieties.


Tree patterns

More complex patterns can be built from the primitive ones of the previous section, usually in the same way as values are built by combining other values. The difference then is that with variable and wildcard parts, a pattern does not build into a single value, but matches a group of values that are the combination of the concrete elements and the elements that are allowed to vary within the structure of the pattern. A tree pattern describes a part of a tree by starting with a node and specifying some branches and nodes and leaving some unspecified with a variable or wildcard pattern. It may help to think of the
abstract syntax tree An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
of a programming language and
algebraic data type In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite data type, i.e., a data type formed by combining other types. Two common classes of algebraic types are product ty ...
s.


Haskell

In Haskell, the following line defines an algebraic data type Color that has a single data constructor ColorConstructor that wraps an integer and a string. data Color = ColorConstructor Integer String The constructor is a node in a tree and the integer and string are leaves in branches. When we want to write functions to make Color an
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a '' user'' of the data, specifically in terms of possible values, possible operations on data ...
, we wish to write functions to interface with the data type, and thus we want to extract some data from the data type, for example, just the string or just the integer part of Color. If we pass a variable that is of type Color, how can we get the data out of this variable? For example, for a function to get the integer part of Color, we can use a simple tree pattern and write: integerPart (ColorConstructor theInteger _) = theInteger As well: stringPart (ColorConstructor _ theString) = theString The creations of these functions can be automated by Haskell's data record syntax.


OCaml

This
OCaml OCaml ( , formerly Objective Caml) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Comparison of multi-paradigm programming languages, multi-paradigm programming language which extends the ...
example which defines a red–black tree and a function to re-balance it after element insertion shows how to match on a more complex structure generated by a recursive data type. The compiler verifies at compile-time that the list of cases is exhaustive and none are redundant. type color = Red , Black type 'a tree = Empty , Tree of color * 'a tree * 'a * 'a tree let rebalance t = match t with , Tree (Black, Tree (Red, Tree (Red, a, x, b), y, c), z, d) , Tree (Black, Tree (Red, a, x, Tree (Red, b, y, c)), z, d) , Tree (Black, a, x, Tree (Red, Tree (Red, b, y, c), z, d)) , Tree (Black, a, x, Tree (Red, b, y, Tree (Red, c, z, d))) -> Tree (Red, Tree (Black, a, x, b), y, Tree (Black, c, z, d)) , _ -> t (* the 'catch-all' case if no previous pattern matches *)


Usage


Filtering data with patterns

Pattern matching can be used to filter data of a certain structure. For instance, in Haskell a
list comprehension A list comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists. It follows the form of the mathematical '' set-builder notation'' (''set comprehension'') as distinct from the use o ...
could be used for this kind of filtering: A x <- 1, B 1, A 2, B 2 evaluates to [A 1, A 2


Pattern matching in Mathematica

In Mathematica, the only structure that exists is the Tree (data structure), tree, which is populated by symbols. In the
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
syntax used thus far, this could be defined as data SymbolTree = Symbol String ymbolTree An example tree could then look like Symbol "a" ymbol "b" [ Symbol "c" [">html" ;"title="ymbol "b" [">ymbol "b" [ Symbol "c" [ In the traditional, more suitable syntax, the symbols are written as they are and the levels of the tree are represented using [], so that for instance a[b,c] is a tree with a as the parent, and b and c as the children. A pattern in Mathematica involves putting "_" at positions in that tree. For instance, the pattern A will match elements such as A A or more generally A 'x''where ''x'' is any entity. In this case, A is the concrete element, while _ denotes the piece of tree that can be varied. A symbol prepended to _ binds the match to that variable name while a symbol appended to _ restricts the matches to nodes of that symbol. Note that even blanks themselves are internally represented as Blank[] for _ and Blank[x] for _x. The Mathematica function Cases filters elements of the first argument that match the pattern in the second argument: Cases a[_">.html" ;"title=" a[_"> a[_ evaluates to Pattern matching applies to the ''structure'' of expressions. In the example below, Cases[ , a[b[_], _] ] returns because only these elements will match the pattern a[b[_],_] above. In Mathematica, it is also possible to extract structures as they are created in the course of computation, regardless of how or where they appear. The function Trace can be used to monitor a computation, and return the elements that arise which match a pattern. For example, we can define the
Fibonacci sequence In mathematics, the Fibonacci sequence is a Integer sequence, sequence in which each element is the sum of the two elements that precede it. Numbers that are part of the Fibonacci sequence are known as Fibonacci numbers, commonly denoted . Many w ...
as fib 1=1 fib _= fib -1+ fib -2 Then, we can ask the question: Given fib what is the sequence of recursive Fibonacci calls? Trace fib ib fib[_ returns a structure that represents the occurrences of the pattern fib[_">">ib fib[_ returns a structure that represents the occurrences of the pattern fib[_/code> in the computational structure:


Declarative programming

In symbolic programming languages, it is easy to have patterns as arguments to functions or as elements of data structures. A consequence of this is the ability to use patterns to declaratively make statements about pieces of data and to flexibly instruct functions how to operate. For instance, the Mathematica function Compile can be used to make more efficient versions of the code. In the following example the details do not particularly matter; what matters is that the subexpression instructs Compile that expressions of the form com /code> can be assumed to be
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
s for the purposes of compilation: com _:= Binomial i, iCompile x^com[i ">.html" ;"title=" x^com x^com[i Mailboxes in Erlang (programming language)">Erlang also work this way. The Curry–Howard correspondence">"> x^com[i ">.html" ;"title=" x^com[i"> x^com[i Mailboxes in Erlang (programming language)">Erlang also work this way. The Curry–Howard correspondence between proofs and programs relates ML-style pattern matching to case analysis and proof by exhaustion">Proof by cases">case analysis and proof by exhaustion.


Pattern matching and strings

By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular syntax of strings is used to represent regular expressions, which are patterns describing string characters. However, it is possible to perform some string pattern matching within the same framework that has been discussed throughout this article.


Tree patterns for strings

In Mathematica, strings are represented as trees of root StringExpression and all the characters in order as children of the root. Thus, to match "any amount of trailing characters", a new wildcard ___ is needed in contrast to _ that would match only a single character. In Haskell and
functional programming In computer science, functional programming is a programming paradigm where programs are constructed by Function application, applying and Function composition (computer science), composing Function (computer science), functions. It is a declarat ...
languages in general, strings are represented as functional lists of characters. A functional list is defined as an empty list, or an element constructed on an existing list. In Haskell syntax: [">List (computing)">lists of characters. A functional list is defined as an empty list, or an element constructed on an existing list. In Haskell syntax: [-- an empty list x:xs -- an element x constructed on a list xs The structure for a list with some elements is thus element:list. When pattern matching, we assert that a certain piece of data is equal to a certain pattern. For example, in the function: head (element:list) = element We assert that the first element of head's argument is called element, and the function returns this. We know that this is the first element because of the way lists are defined, a single element constructed onto a list. This single element must be the first. The empty list would not match the pattern at all, as an empty list does not have a head (the first element that is constructed). In the example, we have no use for list, so we can disregard it, and thus write the function: head (element:_) = element The equivalent Mathematica transformation is expressed as head[element, ]:=element


Example string patterns

In Mathematica, for instance, StringExpression["a",_] will match a string that has two characters and begins with "a". The same pattern in Haskell: a', _ Symbolic entities can be introduced to represent many different classes of relevant features of a string. For instance, StringExpression etterCharacter, DigitCharacter will match a string that consists of a letter first, and then a number. In Haskell, guards could be used to achieve the same matches: etter, digit, isAlpha letter && isDigit digit The main advantage of symbolic string manipulation is that it can be completely integrated with the rest of the programming language, rather than being a separate, special purpose subunit. The entire power of the language can be leveraged to build up the patterns themselves or analyze and transform the programs that contain them.


SNOBOL

SNOBOL (''StriNg Oriented and symBOlic Language'') is a computer programming language developed between 1962 and 1967 at
AT&T AT&T Inc., an abbreviation for its predecessor's former name, the American Telephone and Telegraph Company, is an American multinational telecommunications holding company headquartered at Whitacre Tower in Downtown Dallas, Texas. It is the w ...
Bell Laboratories Nokia Bell Labs, commonly referred to as ''Bell Labs'', is an American industrial research and development company owned by Finnish technology company Nokia. With headquarters located in Murray Hill, New Jersey, the company operates several lab ...
by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky. SNOBOL4 stands apart from most programming languages by having patterns as a first-class data type (''i.e.'' a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern
concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalizations of concatenati ...
and alternation. Strings generated during execution can be treated as programs and executed. SNOBOL was quite widely taught in larger US universities in the late 1960s and early 1970s and was widely used in the 1970s and 1980s as a text manipulation language in the
humanities Humanities are academic disciplines that study aspects of human society and culture, including Philosophy, certain fundamental questions asked by humans. During the Renaissance, the term "humanities" referred to the study of classical literature a ...
. Since SNOBOL's creation, newer languages such as AWK and
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
have made string manipulation by means of
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s fashionable. SNOBOL4 patterns, however, subsume
Backus–Naur form In computer science, Backus–Naur form (BNF, pronounced ), also known as Backus normal form, is a notation system for defining the Syntax (programming languages), syntax of Programming language, programming languages and other Formal language, for ...
(BNF) grammars, which are equivalent to
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
s and more powerful than
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s.Gimpel, J. F. 1973. A theory of discrete patterns and their implementation in SNOBOL4. Commun. ACM 16, 2 (Feb. 1973), 91–100. DOI=http://doi.acm.org/10.1145/361952.361960.


See also

* Artificial Intelligence Markup Language (AIML) for an AI language based on matching patterns in speech * AWK language * Coccinelle pattern matches C source code *
Matching wildcards In computer science, an algorithm for matching wildcards (also known as globbing) is useful in comparing text strings that may contain wildcard character, wildcard syntax. Common uses of these algorithms include command-line interfaces, e.g. the Bo ...
*
glob (programming) glob() () is a libc function for ''globbing'', which is the archetypal use of pattern matching against the names in a filesystem directory such that a name pattern is expanded into a list of names matching that pattern. Although ''globbing'' m ...
*
Pattern calculus Pattern calculus bases all computation on pattern matching of a very general kind. Like lambda calculus, it supports a uniform treatment of function evaluation. Also, it allows functions to be passed as arguments and returned as results. In addit ...
*
Pattern recognition Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess PR capabilities but their p ...
for fuzzy patterns *
PCRE Perl Compatible Regular Expressions (PCRE) is a library written in C, which implements a regular expression engine, inspired by the capabilities of the Perl programming language. Philip Hazel started writing PCRE in summer 1997. PCRE's synta ...
Perl Compatible Regular Expressions, a common modern implementation of string pattern matching ported to many languages * REBOL parse dialect for pattern matching used to implement language dialects *
Symbolic integration In calculus, symbolic integration is the problem of finding a formula for the antiderivative, or ''indefinite integral'', of a given function ''f''(''x''), i.e. to find a formula for a differentiable function ''F''(''x'') such that :\frac = f(x ...
*
Tagged union In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. ...
* Tom (pattern matching language) *
SNOBOL SNOBOL ("StriNg Oriented and symBOlic Language") is a series of programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph Griswold and Ivan P. Polonsky, culminating in SNOBOL4. It was one of a ...
for a programming language based on one kind of pattern matching *
Pattern language A pattern language is an organized and coherent set of ''patterns'', each of which describes a problem and the core of a solution that can be used in many ways within a specific field of expertise. The term was coined by architect Christopher Ale ...
— metaphoric, drawn from architecture * Graph matching


References

* The Mathematica Book, chapte
Section 2.3: Patterns
* The Haskell 98 Report, chapte

* Python Reference Manual, chapte

* The Pure (programming language), Pure Programming Language, chapte
4.3: Patterns


External links



* Nikolaas N. Oosterhof, Philip K. F. Hölzenspies, and Jan Kuper
Application patterns
A presentation at Trends in Functional Programming, 2005
JMatch
the
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
language extended with pattern matching
ShowTrend
Online pattern matching for stock prices

by Dennis Ritchie - provides the history of regular expressions in computer programs
The Implementation of Functional Programming Languages, pages 53–103
Simon Peyton Jones, published by Prentice Hall, 1987.
Nemerle, pattern matching



PatMat: a C++ pattern matching library based on
SNOBOL SNOBOL ("StriNg Oriented and symBOlic Language") is a series of programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph Griswold and Ivan P. Polonsky, culminating in SNOBOL4. It was one of a ...
/SPITBOL * Temur Kutsia
Flat Matching
Journal of Symbolic Computation 43(12): 858–873. Describes in details flat matching in Mathematica.

pattern matching language for non-programmers {{DEFAULTSORT:Pattern Matching Pattern matching, Conditional constructs Articles with example Haskell code Functional programming Programming language comparisons Articles with example code Articles with example OCaml code