HOME

TheInfoList



OR:

In
theoretical computer science Theoretical computer science is a subfield of computer science and mathematics that focuses on the Abstraction, abstract and mathematical foundations of computation. It is difficult to circumscribe the theoretical areas precisely. The Associati ...
, a pattern language is a
formal language In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet". The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
that can be defined as the set of all particular instances of a
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
of constants and variables. Pattern Languages were introduced by Dana Angluin in the context of
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
.


Definition

Given a finite set Σ of constant symbols and a countable set ''X'' of variable symbols disjoint from Σ, a pattern is a finite
non-empty In mathematics, the empty set or void set is the unique set having no elements; its size or cardinality (count of elements in a set) is zero. Some axiomatic set theories ensure that the empty set exists by including an axiom of empty set, whil ...
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
of symbols from Σ∪''X''. The length of a pattern ''p'', denoted by , ''p'', , is just the number of its symbols. The set of all patterns containing exactly ''n'' distinct variables (each of which may occur several times) is denoted by ''P''''n'', the set of all patterns at all by ''P''*. A substitution is a mapping ''f'': ''P''* → ''P''* such thatAngluin's notion of substitution differs from the usual notion of
string substitution In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical ...
.
* ''f'' is a
homomorphism In algebra, a homomorphism is a morphism, structure-preserving map (mathematics), map between two algebraic structures of the same type (such as two group (mathematics), groups, two ring (mathematics), rings, or two vector spaces). The word ''homo ...
with respect to
string concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalizations of concatenati ...
(⋅), formally: ∀''p'',''q''∈''P''*. ''f''(''p''⋅''q'') = ''f''(''p'')⋅''f''(''q''); * ''f'' is non-erasing, formally: ∀''p''∈''P''*. ''f''(''p'') ≠ ε, where ε denotes the
empty string In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero. Formal theory Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
; and * ''f'' respects constants, formally: ∀''s''∈Σ. ''f''(''s'') = ''s''. If ''p'' = ''f''(''q'') for some patterns ''p'', ''q'' ∈ ''P''* and some substitution ''f'', then ''p'' is said to be less general than ''q'', written ''p''≤''q''; in that case, necessarily , ''p'', ≥ , ''q'', holds. For a pattern ''p'', its language is defined as the set of all less general patterns that are built from constants only, formally: ''L''(''p'') = , where Σ+ denotes the set of all finite non-empty strings of symbols from Σ. For example, using the constants Σ = and the variables ''X'' = , the pattern 0''x''10''xx''1 ∈''P''1 and ''xxy'' ∈''P''2 has length 7 and 3, respectively. An instance of the former pattern is 00''z''100''z''0''z''1 and 01''z''101''z''1''z''1, it is obtained by the substitution that maps ''x'' to 0''z'' and to 1''z'', respectively, and each other symbol to itself. Both 00''z''100''z''0''z''1 and 01''z''101''z''1''z''1 are also instances of ''xxy''. In fact, ''L''(0''x''10''xx''1) is a subset of ''L''(''xxy''). The language of the pattern ''x''0 and ''x''1 is the set of all bit strings which denote an even and odd
binary number A binary number is a number expressed in the Radix, base-2 numeral system or binary numeral system, a method for representing numbers that uses only two symbols for the natural numbers: typically "0" (zero) and "1" (one). A ''binary number'' may ...
, respectively. The language of ''xx'' is the set of all strings obtainable by concatenating a bit string with itself, e.g. 00, 11, 0101, 1010, 11101110 ∈ ''L''(''xx'').


Properties

The problem of deciding whether ''s'' ∈ ''L''(''p'') for an arbitrary string ''s'' ∈ Σ+ and pattern ''p'' is
NP-complete In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''. Somewhat more precisely, a problem is NP-complete when: # It is a decision problem, meaning that for any ...
(see picture), and so is hence the problem of deciding ''p'' ≤ ''q'' for arbitrary patterns ''p'', ''q''. The class of pattern languages is not closed under ... * union: e.g. for Σ = as
above Above may refer to: *Above (artist) Tavar Zawacki (b. 1981, California) is a Polish, Portuguese - American abstract artist and internationally recognized visual artist based in Berlin, Germany. From 1996 to 2016, he created work under the ...
, ''L''(01)∪''L''(10) is not a pattern language; * complement: Σ+ \ ''L''(0) is not a pattern language; * intersection: ''L''(''x''0''y'')∩''L''(''x''1''y'') is not a pattern language; * Kleene plus: ''L''(0)+ is not a pattern language; * homomorphism: ''f''(''L''(''x'')) = ''L''(0)+ is not a pattern language, assuming ''f''(0) = 0 = ''f''(1); * inverse homomorphism: ''f''−1(111) = is not a pattern language, assuming ''f''(0) = 1 and ''f''(1) = 11. The class of pattern languages is closed under ... * concatenation: ''L''(''p'')⋅''L''(''q'') = ''L''(''p''⋅''q''); * reversal: ''L''(''p'')rev = ''L''(''p''rev). If ''p'', ''q'' ∈ ''P''1 are patterns containing exactly one variable, then ''p'' ≤ ''q'' if and only if ''L''(''p'') ⊆ ''L''(''q''); the same equivalence holds for patterns of equal length. For patterns of different length, the
above Above may refer to: *Above (artist) Tavar Zawacki (b. 1981, California) is a Polish, Portuguese - American abstract artist and internationally recognized visual artist based in Berlin, Germany. From 1996 to 2016, he created work under the ...
example ''p'' = 0''x''10''xx''1 and ''q'' = ''xxy'' shows that ''L''(''p'') ⊆ ''L''(''q'') may hold without implying ''p'' ≤ ''q''. However, any two patterns ''p'' and ''q'', of arbitrary lengths, generate the same language if and only if they are equal up to consistent variable renaming. Each pattern ''p'' is a common generalization of all strings in its generated language ''L''(''p''), modulo associativity of (⋅).


Location in the Chomsky hierarchy

In a refined
Chomsky hierarchy The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a formal language's alphabet that are v ...
, the class of pattern languages is a proper superclass and subclass of the singletoni.e. languages consisting of a single string; they correspond to straight-line grammars and the indexed languages, respectively, but incomparable to the language classes in between; due to the latter, the pattern language class is not explicitly shown in the table
below Below may refer to: *Earth *Ground (disambiguation) *Soil *Floor * Bottom (disambiguation) *Less than *Temperatures below freezing *Hell or underworld People with the surname * Ernst von Below (1863–1955), German World War I general * Fred Belo ...
. The class of pattern languages is incomparable with the class of finite languages, with the class of
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
s, and with the class of
context-free language In formal language theory, a context-free language (CFL), also called a Chomsky type-2 language, is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, mos ...
s: * the pattern language ''L''(''xx'') is not context-free (hence neither regular nor
finite Finite may refer to: * Finite set, a set whose cardinality (number of elements) is some natural number * Finite verb, a verb form that has a subject, usually being inflected or marked for person and/or tense or aspect * "Finite", a song by Sara Gr ...
) due to the pumping lemma; * the finite (hence also regular and context-free) language is not a pattern language. Each singleton language is trivially a pattern language, generated by a pattern without variables. Each pattern language can be produced by an
indexed grammar Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of ''flags'', or ''index symbols''. The language produced by an indexed grammar is called an indexed language. Definition Modern definiti ...
: For example, using Σ = and ''X'' = , the pattern ''a'' ''x'' ''b'' ''y'' ''c'' ''x'' ''a'' ''y'' ''b'' is generated by a grammar with nonterminal symbols ''N'' = ∪ ''X'', terminal symbols ''T'' = Σ, index symbols ''F'' = , start symbol ''S''''x'', and the following production rules: An example derivation is:   ⇒     ⇒     ⇒     ⇒     ⇒     ⇒     ⇒     ⇒     ⇒     ⇒     ⇒ ... ⇒     ⇒     ⇒ ... ⇒     ⇒     ⇒ ... ⇒     ⇒   In a similar way, an index grammar can be constructed from any pattern.


Learning patterns

Given a sample set ''S'' of strings, a pattern ''p'' is called descriptive of ''S'' if ''S'' ⊆ ''L''(''p''), but not ''S'' ⊆ ''L''(''q'') ⊂ ''L''(''p'') for any other pattern ''q''. Given any sample set ''S'', a descriptive pattern for ''S'' can be computed by * enumerating all patterns (up to variable renaming) not longer than the shortest string in ''S'', * selecting from them the patterns that generate a superset of ''S'', * selecting from them the patterns of maximal length, and * selecting from them a pattern that is minimal with respect to ≤. Based on this algorithm, the class of pattern languages can be identified in the limit from positive examples.; here: Example 1, p.125


Notes


References

{{formal languages and grammars Formal languages Theoretical computer science Machine learning