Anti-unification is the process of constructing a generalization common to two given symbolic expressions. As in unification, several frameworks are distinguished depending on which expressions (also called terms) are allowed, and which expressions are considered equal. If variables representing functions are allowed in an expression, the process is called "higher-order anti-unification", otherwise "first-order anti-unification". If the generalization is required to have an instance literally equal to each input expression, the process is called "syntactical anti-unification", otherwise "E-anti-unification", or "anti-unification modulo theory". An anti-unification algorithm should compute for given expressions a complete, and minimal generalization set, that is, a set covering all generalizations, and containing no redundant members, respectively. Depending on the framework, a complete and minimal generalization set may have one, finitely many, or possibly infinitely many members, or may not exist at all;Complete generalization sets always exist, but it may be the case that every complete generalization set is non-minimal. it cannot be empty, since a trivial generalization exists in any case. For first-order syntactical anti-unification,

Gordon Plotkin Gordon David Plotkin, (born 9 September 1946) is a theoretical computer scientist in the School of Informatics at the University of Edinburgh. Plotkin is probably best known for his introduction of structural operational semantics (SOS) and hi ...

gave an algorithm that computes a complete and minimal singleton generalization set containing the so-called "least general generalization" (lgg). Anti-unification should not be confused with dis-unification. The latter means the process of solving systems of

inequation In mathematics, an inequation is a statement that an inequality holds between two values. It is usually written in the form of a pair of expressions denoting the values in question, with a relational sign between them indicating the specific i ...

s, that is of finding values for the variables such that all given inequations are satisfied.Comon referred in 1986 to inequation-solving as "anti-unification", which nowadays has become quite unusual. This task is quite different from finding generalizations.

Prerequisites

Formally, an anti-unification approach presupposes * An infinite set ''V'' of ''variables''. For higher-order anti-unification, it is convenient to choose ''V'' disjoint from the set of

lambda-term bound variables Lambda calculus (also written as ''λ''-calculus) is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation t ...

. * A set ''T'' of ''terms'' such that ''V'' ⊆ ''T''. For first-order and higher-order anti-unification, ''T'' is usually the set of

first-order terms In mathematical logic, a term denotes a mathematical object while a formula denotes a mathematical fact. In particular, terms appear as components of a formula. This is analogous to natural language, where a noun phrase refers to an object and a w ...

(terms built from variable and function symbols) and

lambda terms Lambda calculus (also written as ''λ''-calculus) is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation ...

(terms containing some higher-order variables), respectively. * An ''

equivalence relation In mathematics, an equivalence relation is a binary relation that is reflexive, symmetric and transitive. The equipollence relation between line segments in geometry is a common example of an equivalence relation. Each equivalence relatio ...

\equiv

T

, indicating which terms are considered equal. For higher-order anti-unification, usually

t \equiv u

t

and

u

are alpha equivalent. For first-order E-anti-unification,

\equiv

reflects the background knowledge about certain function symbols; for example, if

\oplus

is considered commutative,

t \equiv u

u

results from

t

by swapping the arguments of

\oplus

at some (possibly all) occurrences.E.g.

a \oplus (b \oplus f(x)) \equiv a \oplus (f(x) \oplus b) \equiv (b \oplus f(x)) \oplus a \equiv (f(x) \oplus b) \oplus a

If there is no background knowledge at all, then only literally, or syntactically, identical terms are considered equal.

First-order term

Given a set

V

of variable symbols, a set

C

of constant symbols and sets

F_n

n

-ary function symbols, also called operator symbols, for each natural number

n \geq 1

, the set of (unsorted first-order) terms

T

is recursively defined to be the smallest set with the following properties:; here: Sect.1.3 * every variable symbol is a term: ''V'' ⊆ ''T'', * every constant symbol is a term: ''C'' ⊆ ''T'', * from every ''n'' terms ''t''₁,...,''t_n'', and every ''n''-ary function symbol ''f'' ∈ ''F_n'', a larger term

f(t_1,\ldots,t_n)

can be built. For example, if ''x'' ∈ ''V'' is a variable symbol, 1 ∈ ''C'' is a constant symbol, and add ∈ ''F''₂ is a binary function symbol, then ''x'' ∈ ''T'', 1 ∈ ''T'', and (hence) add(''x'',1) ∈ ''T'' by the first, second, and third term building rule, respectively. The latter term is usually written as ''x''+1, using

Infix notation Infix notation is the notation commonly used in arithmetical and logical formulae and statements. It is characterized by the placement of operators between operands—"infixed operators"—such as the plus sign in . Usage Binary relations are ...

and the more common operator symbol + for convenience.

Higher-order term

Substitution

A ''substitution'' is a mapping

\sigma: V \longrightarrow T

from variables to terms; the notation

\

refers to a substitution mapping each variable

x_i

to the term

t_i

, for

i=1,\ldots,k

, and every other variable to itself. Applying that substitution to a term is written in postfix notation as

t \

; it means to (simultaneously) replace every occurrence of each variable

x_i

in the term by

t_i

. The result of applying a substitution to a term is called an ''instance'' of that term . As a first-order example, applying the substitution

\

to the term :

Generalization, specialization

If a term

t

has an instance equivalent to a term

u

, that is, if

t \sigma \equiv u

for some substitution

\sigma

, then

t

is called ''more general'' than

u

, and

u

is called ''more special'' than, or ''subsumed'' by,

t

. For example,

x \oplus a

is more general than

a \oplus b

\oplus

commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Most familiar as the name o ...

, since then

(x \oplus a)\ = b \oplus a \equiv a \oplus b

. If

\equiv

is literal (syntactic) identity of terms, a term may be both more general and more special than another one only if both terms differ just in their variable names, not in their syntactic structure; such terms are called ''variants'', or ''renamings'' of each other. For example,

f(x_1,a,g(z_1),y_1)

is a variant of

f(x_2,a,g(z_2),y_2)

, since

f(x_1,a,g(z_1),y_1) \ = f(x_2,a,g(z_2),y_2)

and

f(x_2,a,g(z_2),y_2) \ = f(x_1,a,g(z_1),y_1)

. However,

f(x_1,a,g(z_1),y_1)

is ''not'' a variant of

f(x_2,a,g(x_2),x_2)

, since no substitution can transform the latter term into the former one, although

\

achieves the reverse direction. The latter term is hence properly more special than the former one. A substitution

\sigma

is ''more special'' than, or ''subsumed'' by, a substitution

\tau

x \sigma

is more special than

x \tau

for each variable

x

. For example,

\

is more special than

\

, since

f(u)

and

f(f(u))

is more special than

z

and

f(z)

, respectively.

Anti-unification problem, generalization set

An ''anti-unification problem'' is a pair

\langle t_1, t_2 \rangle

of terms. A term

t

is a common ''generalization'', or ''anti-unifier'', of

t_1

and

t_2

t \sigma_1 \equiv t_1

and

t \sigma_2 \equiv t_2

for some substitutions

\sigma_1, \sigma_2

. For a given anti-unification problem, a set

S

of anti-unifiers is called ''complete'' if each generalization subsumes some term

t \in S

; the set

S

is called ''minimal'' if none of its members subsumes another one.

First-order syntactical anti-unification

The framework of first-order syntactical anti-unification is based on

T

being the set of ''first-order terms'' (over some given set

V

of variables,

C

of constants and

F_n

n

-ary function symbols) and on

\equiv

being ''syntactic equality''. In this framework, each anti-unification problem

\langle t_1, t_2 \rangle

has a complete, and obviously minimal,

singleton Singleton may refer to: Sciences, technology Mathematics * Singleton (mathematics), a set with exactly one element * Singleton field, used in conformal field theory Computing * Singleton pattern, a design pattern that allows only one instance of ...

solution set

\

. Its member

t

is called the ''least general generalization (lgg)'' of the problem, it has an instance syntactically equal to

t_1

and another one syntactically equal to

t_2

. Any common generalization of

t_1

and

t_2

subsumes

t

. The lgg is unique up to variants: if

S_1

and

S_2

are both complete and minimal solution sets of the same syntactical anti-unification problem, then

S_1 = \

and

S_2 = \

for some terms

s_1

and

s_2

, that are renamings of each other. Plotkin has given an algorithm to compute the lgg of two given terms. It presupposes an injective mapping

\phi: T \times T \longrightarrow V

, that is, a mapping assigning each pair

s,t

of terms an own variable

\phi(s,t)

, such that no two pairs share the same variable. From a theoretical viewpoint, such a mapping exists, since both

V

and

T \times T

are

countably infinite In mathematics, a set is countable if either it is finite or it can be made in one to one correspondence with the set of natural numbers. Equivalently, a set is ''countable'' if there exists an injective function from it into the natural numbers ...

sets; for practical purposes,

\phi

can be built up as needed, remembering assigned mappings

\langle s,t,\phi(s,t) \rangle

in a

hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...

. The algorithm consists of two rules: : For example,

(0*0) \sqcup (4*4) \rightsquigarrow (0 \sqcup 4)*(0 \sqcup 4) \rightsquigarrow \phi(0,4) * \phi(0,4) \rightsquigarrow x*x

; this least general generalization reflects the common property of both inputs of being square numbers. Plotkin used his algorithm to compute the " relative least general generalization (rlgg)" of two clause sets in first-order logic, which was the basis of the

Golem A golem ( ; he, , gōlem) is an animated, anthropomorphic being in Jewish folklore, which is entirely created from inanimate matter (usually clay or mud). The most famous golem narrative involves Judah Loew ben Bezalel, the late 16th-centur ...

approach to inductive logic programming.

First-order anti-unification modulo theory

* * *
Software.

Equational theories

*One associative and commutative operation: ; *Commutative theories: *Free monoids: *Regular congruence classes: ; *A-, C-, AC-, ACU-theories with ordered sorts: *Purely idempotent theories:

First-order sorted anti-unification

*Taxonomic sorts: ; ; *Feature terms: * * *A-, C-, AC-, ACU-theories with ordered sorts: see above

Nominal anti-unification

* Baumgartner, Alexander; Kutsia, Temur; Levy, Jordi; Villaret, Mateu (Jun 2013)
Nominal Anti-Unification
Proc. RTA 2015. Vol. 36 of LIPIcs. Schloss Dagstuhl, 57-73
Software.

Applications

* Program analysis: ; * Code factoring: * Induction proving: * Information Extraction: * Case-based reasoning: * Program synthesis: The idea of generalizing terms with respect to an equational theory can be traced back to Manna and Waldinger (1978, 1980) who desired to apply it in program synthesis. In section "Generalization", they suggest (on p. 119 of the 1980 article) to generalize ''reverse''(''l'') and ''reverse''(''tail''(''l''))<> 'head''(''l'')to obtain ''reverse(l')<>m' ''. This generalization is only possible if the background equation ''u''<>[]=''u'' is considered. : — preprint of the 1980 article : * Natural language processing:

Higher-order anti-unification

*Calculus of constructions: * Simply-typed lambda calculus (Input: Terms in the eta-long beta-normal form. Output: higher-order patterns): Baumgartner, Alexander; Kutsia, Temur; Levy, Jordi; Villaret, Mateu (Jun 2013)
A Variant of Higher-Order Anti-Unification
Proc. RTA 2013. Vol. 21 of LIPIcs. Schloss Dagstuhl, 113-127
Software.
*Simply-typed lambda calculus (Input: Terms in the eta-long beta-normal form. Output: Various fragments of the simply-typed lambda calculus including patterns): * Restricted Higher-Order Substitutions: ;

Notes

References

{{Reflist Inductive logic programming Automated theorem proving Logic in computer science Unification (computer science)