Inverse Entailment
   HOME

TheInfoList



OR:

Inductive logic programming (ILP) is a subfield of
symbolic artificial intelligence Symbolic may refer to: * Symbol, something that represents an idea, a process, or a physical entity Mathematics, logic, and computing * Symbolic computation, a scientific area concerned with computing with mathematical formulas * Symbolic dynamic ...
which uses
logic programming Logic programming is a programming, database and knowledge representation paradigm based on formal logic. A logic program is a set of sentences in logical form, representing knowledge about some problem domain. Computation is performed by applyin ...
as a uniform representation for examples, background knowledge and hypotheses. The term "''inductive''" here refers to
philosophical Philosophy ('love of wisdom' in Ancient Greek) is a systematic study of general and fundamental questions concerning topics like existence, reason, knowledge, Value (ethics and social sciences), value, mind, and language. It is a rational an ...
(i.e. suggesting a theory to explain observed facts) rather than
mathematical Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
(i.e. proving a property for all members of a well-ordered set) induction. Given an encoding of the known background knowledge and a set of examples represented as a logical
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples. * Schema: ''positive examples'' + ''negative examples'' + ''background knowledge'' ⇒ ''hypothesis''. Inductive logic programming is particularly useful in
bioinformatics Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...
and
natural language processing Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...
.


History

Building on earlier work on
Inductive inference Inductive reasoning refers to a variety of methods of reasoning in which the conclusion of an argument is supported not with deductive certainty, but with some degree of probability. Unlike ''deductive'' reasoning (such as mathematical inducti ...
,
Gordon Plotkin Gordon David Plotkin (born 9 September 1946) is a theoretical computer scientist in the School of Informatics at the University of Edinburgh. Plotkin is probably best known for his introduction of structural operational semantics (SOS) and his ...
was the first to formalise induction in a clausal setting around 1970, adopting an approach of generalising from examples. In 1981,
Ehud Shapiro Ehud Shapiro (; born 1955) is an Israeli scientist, entrepreneur, artist, and political activist who is Professor of Computer Science and Biology at the Weizmann Institute of Science. With international reputation, he made contributions to many s ...
introduced several ideas that would shape the field in his new approach of model inference, an algorithm employing refinement and backtracing to search for a complete axiomatisation of given examples. His first implementation was the
Model Inference System A model is an informative representation of an object, person, or system. The term originally denoted the plans of a building in late 16th-century English, and derived via French and Italian ultimately from Latin , . Models can be divided into ...
in 1981: a
Prolog Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics. Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...
program that inductively inferred
Horn clause In mathematical logic and logic programming, a Horn clause is a logical formula of a particular rule-like form that gives it useful properties for use in logic programming, formal specification, universal algebra and model theory. Horn clauses are ...
logic programs from positive and negative examples. The term ''Inductive Logic Programming'' was first introduced in a paper by Stephen Muggleton in 1990, defined as the intersection of machine learning and logic programming. Muggleton and Wray Buntine introduced predicate invention and inverse resolution in 1988. Several inductive logic programming systems that proved influential appeared in the early 1990s.
FOIL Foil may refer to: Materials * Foil (metal), a quite thin sheet of metal, usually manufactured with a rolling mill machine * Metal leaf, a very thin sheet of decorative metal * Aluminium foil, a type of wrapping for food * Tin foil, metal foil ma ...
, introduced by Ross Quinlan in 1990 was based on upgrading
propositional A proposition is a statement that can be either true or false. It is a central concept in the philosophy of language, semantics, logic, and related fields. Propositions are the object s denoted by declarative sentences; for example, "The sky ...
learning algorithms AQ and
ID3 ID3 is a metadata container most often used in conjunction with the MP3 audio file format. It allows information such as the title, artist, album, track number, and other information about the file to be stored in the file itself. ID3 is a '' ...
.
Golem A golem ( ; ) is an animated Anthropomorphism, anthropomorphic being in Jewish folklore, which is created entirely from inanimate matter, usually clay or mud. The most famous golem narrative involves Judah Loew ben Bezalel, the late 16th-century ...
, introduced by Muggleton and Feng in 1990, went back to a restricted form of Plotkin's least generalisation algorithm. The
Progol Progol is an implementation of inductive logic programming that combines inverse entailment with general-to-specific search through a refinement graph. Features Inverse entailment is used with mode declarations to derive the bottom clause, ...
system, introduced by Muggleton in 1995, first implemented inverse entailment, and inspired many later systems.
Aleph Aleph (or alef or alif, transliterated ʾ) is the first Letter (alphabet), letter of the Semitic abjads, including Phoenician alphabet, Phoenician ''ʾālep'' 𐤀, Hebrew alphabet, Hebrew ''ʾālef'' , Aramaic alphabet, Aramaic ''ʾālap'' � ...
, a descendant of Progol introduced by Ashwin Srinivasan in 2001, is still one of the most widely used systems . At around the same time, the first practical applications emerged, particularly in
bioinformatics Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...
, where by 2000 inductive logic programming had been successfully applied to drug design, carcinogenicity and mutagenicity prediction, and elucidation of the structure and function of proteins. Unlike the focus on
automatic programming In computer science, automatic programming is a type of computer programming in which some mechanism generates a computer program, to allow human programmers to write the code at a higher abstraction level. There has been little agreement on the ...
inherent in the early work, these fields used inductive logic programming techniques from a viewpoint of
relational data mining Relational data mining is the data mining technique for relational databases.Dzeroski, Saso, Nada Lavrač, Lavrač, Nada (Eds.), Relational Data Mining, Springer 200/ref> Unlike traditional data mining algorithms, which look for patterns in a single ...
. The success of those initial applications and the lack of progress in recovering larger traditional logic programs shaped the focus of the field. Recently, classical tasks from automated programming have moved back into focus, as the introduction of meta-interpretative learning makes predicate invention and learning recursive programs more feasible. This technique was pioneered with the Metagol system introduced by Muggleton, Dianhuan Lin, Niels Pahlavi and Alireza Tamaddoni-Nezhad in 2014. This allows ILP systems to work with fewer examples, and brought successes in learning string transformation programs, answer set grammars and general algorithms.


Setting

Inductive logic programming has adopted several different learning settings, the most common of which are learning from
entailment Logical consequence (also entailment or logical implication) is a fundamental concept in logic which describes the relationship between statements that hold true when one statement logically ''follows from'' one or more statements. A valid l ...
and learning from interpretations. In both cases, the input is provided in the form of ''background knowledge '', a logical theory (commonly in the form of
clauses In language, a clause is a Constituent (linguistics), constituent or Phrase (grammar), phrase that comprises a semantic predicand (expressed or not) and a semantic Predicate (grammar), predicate. A typical clause consists of a subject (grammar), ...
used in
logic programming Logic programming is a programming, database and knowledge representation paradigm based on formal logic. A logic program is a set of sentences in logical form, representing knowledge about some problem domain. Computation is performed by applyin ...
), as well as positive and negative examples, denoted E^+ and E^ respectively. The output is given as a ''hypothesis'' ', itself a logical theory that typically consists of one or more clauses. The two settings differ in the format of examples presented.


Learning from entailment

, learning from entailment is by far the most popular setting for inductive logic programming. In this setting, the ''positive'' and ''negative'' examples are given as finite sets E^+ and E^ of positive and negated ground literals, respectively. A ''correct hypothesis'' ' is a set of clauses satisfying the following requirements, where the turnstile symbol \models stands for logical entailment: \begin \text & B \cup H & \models & E^+ \\ \text & B \cup H \cup E^- & \not\models & \textit \end Completeness requires any generated hypothesis ' to explain all positive examples E^+, and consistency forbids generation of any hypothesis ' that is inconsistent with the negative examples E^, both given the background knowledge '. In Muggleton's setting of concept learning,; here: Sect.2.1 "completeness" is referred to as "sufficiency", and "consistency" as "strong consistency". Two further conditions are added: "''Necessity''", which postulates that ' does not entail E^+, does not impose a restriction on ', but forbids any generation of a hypothesis as long as the positive facts are explainable without it. "Weak consistency", which states that no contradiction can be derived from B\land H, forbids generation of any hypothesis ' that contradicts the background knowledge '. Weak consistency is implied by strong consistency; if no negative examples are given, both requirements coincide. Weak consistency is particularly important in the case of noisy data, where completeness and strong consistency cannot be guaranteed.


Learning from interpretations

In learning from interpretations, the ''positive'' and ''negative'' examples are given as a set of complete or partial
Herbrand structure In first-order logic, a Herbrand structure S is a structure over a vocabulary \sigma (also sometimes called a ''signature'') that is defined solely by the syntactical properties of \sigma. The idea is to take the symbol strings of terms as their ...
s, each of which are themselves a finite set of ground literals. Such a structure ' is said to be a model of the set of clauses B \cup H if for any substitution \theta and any clause \mathrm \leftarrow \mathrm in B \cup H such that \mathrm\theta \subseteq e, \mathrm\theta \subseteq e also holds. The goal is then to output a hypothesis that is ''complete,'' meaning every positive example is a model of B \cup H, and ''consistent,'' meaning that no negative example is a model of B \cup H.


Approaches to ILP

An ''inductive logic programming system'' is a program that takes as an input logic theories B, E^+, E^- and outputs a correct hypothesis with respect to theories B, E^+, E^-. A system is ''complete'' if and only if for any input logic theories B, E^+, E^- any correct hypothesis with respect to these input theories can be found with its hypothesis search procedure. Inductive logic programming systems can be roughly divided into two classes, search-based and meta-interpretative systems. Search-based systems exploit that the space of possible clauses forms a
complete lattice In mathematics, a complete lattice is a partially ordered set in which all subsets have both a supremum ( join) and an infimum ( meet). A conditionally complete lattice satisfies at least one of these properties for bounded subsets. For compariso ...
under the
subsumption Subsumption may refer to: * A minor premise in symbolic logic (see syllogism) * The Liskov substitution principle in object-oriented programming * Subtyping in programming language theory * Subsumption architecture in robotics * A subsumption ...
relation, where one clause C_1 subsumes another clause C_2 if there is a substitution \theta such that C_1\theta, the result of applying \theta to C_1, is a subset of C_2. This lattice can be traversed either bottom-up or top-down.


Bottom-up search

Bottom-up methods to search the subsumption lattice have been investigated since Plotkin's first work on formalising induction in clausal logic in 1970. Techniques used include least general generalisation, based on
anti-unification Anti-unification is the process of constructing a generalization common to two given symbolic expressions. As in unification, several frameworks are distinguished depending on which expressions (also called terms) are allowed, and which expressio ...
, and inverse resolution, based on inverting the resolution inference rule.


Least general generalisation

A ''least general generalisation algorithm'' takes as input two clauses C_1 and C_2 and outputs the least general generalisation of C_1 and C_2, that is, a clause C that subsumes C_1 and C_2, and that is subsumed by every other clause that subsumes C_1 and C_2. The least general generalisation can be computed by first computing all ''selections'' from C and D, which are pairs of literals (L,M) \in (C_1, C_2) sharing the same predicate symbol and negated/unnegated status. Then, the least general generalisation is obtained as the disjunction of the least general generalisations of the individual selections, which can be obtained by
first-order syntactical anti-unification Anti-unification is the process of constructing a generalization common to two given symbolic expressions. As in unification, several frameworks are distinguished depending on which expressions (also called terms) are allowed, and which expressio ...
. To account for background knowledge, inductive logic programming systems employ ''relative least general generalisations'', which are defined in terms of subsumption relative to a background theory. In general, such relative least general generalisations are not guaranteed to exist; however, if the background theory ' is a finite set of ground literals, then the negation of ' is itself a clause. In this case, a relative least general generalisation can be computed by disjoining the negation of ' with both C_1 and C_2 and then computing their least general generalisation as before. Relative least general generalisations are the foundation of the bottom-up system
Golem A golem ( ; ) is an animated Anthropomorphism, anthropomorphic being in Jewish folklore, which is created entirely from inanimate matter, usually clay or mud. The most famous golem narrative involves Judah Loew ben Bezalel, the late 16th-century ...
.


Inverse resolution

Inverse resolution is an
inductive reasoning Inductive reasoning refers to a variety of method of reasoning, methods of reasoning in which the conclusion of an argument is supported not with deductive certainty, but with some degree of probability. Unlike Deductive reasoning, ''deductive'' ...
technique that involves inverting the resolution operator. Inverse resolution takes information about the resolvent of a resolution step to compute possible resolving clauses. Two types of inverse resolution operator are in use in inductive logic programming: V-operators and W-operators. A V-operator takes clauses R and C_1as input and returns a clause C_2 such that R is the resolvent of C_1 and C_2. A W-operator takes two clauses R_1 and R_2 and returns three clauses C_1, C_2 and C_3 such that R_1 is the resolvent of C_1 and C_2 and R_2 is the resolvent of C_2 and C_3. Inverse resolution was first introduced by Stephen Muggleton and Wray Buntine in 1988 for use in the inductive logic programming system Cigol. By 1993, this spawned a surge of research into inverse resolution operators and their properties.


Top-down search

The ILP systems Progol, Hail and Imparo find a hypothesis using the principle of the inverse entailment for theories , , : B \land H \models E \iff B \land \neg E \models \neg H. First they construct an intermediate theory called a bridge theory satisfying the conditions B \land \neg E \models F and F \models \neg H. Then as H \models \neg F, they generalize the negation of the bridge theory with anti-entailment. However, the operation of anti-entailment is computationally more expensive since it is highly nondeterministic. Therefore, an alternative hypothesis search can be conducted using the inverse subsumption (anti-subsumption) operation instead, which is less non-deterministic than anti-entailment. Questions of completeness of a hypothesis search procedure of specific inductive logic programming system arise. For example, the Progol hypothesis search procedure based on the inverse entailment inference rule is not complete by ''Yamamoto's example''. On the other hand, Imparo is complete by both anti-entailment procedure and its extended inverse subsumption procedure.


Metainterpretive learning

Rather than explicitly searching the hypothesis graph, metainterpretive or ''meta-level'' systems encode the inductive logic programming program as a meta-level logic program which is then solved to obtain an optimal hypothesis. Formalisms used to express the problem specification include
Prolog Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics. Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...
and
answer set programming Answer set programming (ASP) is a form of declarative programming oriented towards difficult (primarily NP-hard) search problems. It is based on the stable model (answer set) semantics of logic programming. In ASP, search problems are reduced ...
, with existing Prolog systems and answer set solvers used for solving the constraints. And example of a Prolog-based system is Metagol, which is based on a meta-interpreter in Prolog, while ASPAL and ILASP are based on an encoding of the inductive logic programming problem in answer set programming.


Evolutionary learning

Evolutionary algorithm Evolutionary algorithms (EA) reproduce essential elements of the biological evolution in a computer algorithm in order to solve "difficult" problems, at least Approximation, approximately, for which no exact or satisfactory solution methods are k ...
s in ILP use a population-based approach to evolve hypotheses, refining them through selection, crossover, and mutation. Methods like EvoLearner have been shown to outperform traditional approaches on structured machine learning benchmarks.


List of implementations


1BC and 1BC2: first-order naive Bayesian classifiers:

ACE (A Combined Engine)

Aleph

Atom

Claudien

DL-Learner

DMax

FastLAS (Fast Learning from Answer Sets)
* FOIL (First Order Inductive Learner) *
Golem A golem ( ; ) is an animated Anthropomorphism, anthropomorphic being in Jewish folklore, which is created entirely from inanimate matter, usually clay or mud. The most famous golem narrative involves Judah Loew ben Bezalel, the late 16th-century ...

ILASP (Inductive Learning of Answer Set Programs)
* Imparo
Inthelex (INcremental THEory Learner from EXamples)



Metagol

Mio
* MIS (Model Inference System) by Ehud Shapiro
Ontolearn

Popper
*
PROGOL Progol is an implementation of inductive logic programming that combines inverse entailment with general-to-specific search through a refinement graph. Features Inverse entailment is used with mode declarations to derive the bottom clause, ...

RSD
* Warmr (now included in ACE)
ProGolem


Probabilistic inductive logic programming

Probabilistic inductive logic programming adapts the setting of inductive logic programming to learning probabilistic logic programs. It can be considered as a form of
statistical relational learning Statistical relational learning (SRL) is a subdiscipline of artificial intelligence and machine learning that is concerned with domain models that exhibit both uncertainty (which can be dealt with using statistical methods) and complex, relational ...
within the formalism of probabilistic logic programming. Given # background knowledge as a probabilistic logic program , and # a set of positive and negative examples E^ and E^ the goal of probabilistic inductive logic programming is to find a probabilistic logic program H such that the probability of positive examples according to is maximized and the probability of negative examples is minimized. This problem has two variants: parameter learning and structure learning. In the former, one is given the structure (the clauses) of and the goal is to infer the probabilities annotations of the given clauses, while in the latter the goal is to infer both the structure and the probability parameters of . Just as in classical inductive logic programming, the examples can be given as examples or as (partial) interpretations.


Parameter Learning

Parameter learning for languages following the distribution semantics has been performed by using an expectation-maximisation algorithm or by
gradient descent Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function. The idea is to take repeated steps in the opposite direction of the gradi ...
. An expectation-maximisation algorithm consists of a cycle in which the steps of expectation and maximization are repeatedly performed. In the expectation step, the distribution of the hidden variables is computed according to the current values of the probability parameters, while in the maximisation step, the new values of the parameters are computed. Gradient descent methods compute the gradient of the target function and iteratively modify the parameters moving in the direction of the gradient.


Structure Learning

Structure learning was pioneered by
Daphne Koller Daphne Koller (; born August 27, 1968) is an Israeli-American computer scientist. She was a professor in the department of computer science at Stanford University and a MacArthur Foundation fellowship recipient. She is one of the founders of Cour ...
and Avi Pfeffer in 1997, where the authors learn the structure of first-order rules with associated probabilistic uncertainty parameters. Their approach involves generating the underlying
graphical model A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. Graphical models are commonly used in ...
in a preliminary step and then applying expectation-maximisation. In 2008, De Raedt et al. presented an algorithm for performing
theory compression A theory is a systematic and rational form of abstract thinking about a phenomenon, or the conclusions derived from such thinking. It involves contemplative and logical reasoning, often supported by processes such as observation, experimentation, ...
on ProbLog programs, where theory compression refers to a process of removing as many clauses as possible from the theory in order to maximize the probability of a given set of positive and negative examples. No new clause can be added to the theory. In the same year, Meert, W. et al. introduced a method for learning parameters and structure of ground probabilistic logic programs by considering the
Bayesian network A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Whi ...
s equivalent to them and applying techniques for learning Bayesian networks. ProbFOIL, introduced by De Raedt and Ingo Thon in 2010, combined the inductive logic programming system
FOIL Foil may refer to: Materials * Foil (metal), a quite thin sheet of metal, usually manufactured with a rolling mill machine * Metal leaf, a very thin sheet of decorative metal * Aluminium foil, a type of wrapping for food * Tin foil, metal foil ma ...
with ProbLog. Logical rules are learned from probabilistic data in the sense that both the examples themselves and their classifications can be probabilistic. The set of rules has to allow one to predict the probability of the examples from their description. In this setting, the parameters (the probability values) are fixed and the structure has to be learned. In 2011, Elena Bellodi and Fabrizio Riguzzi introduced SLIPCASE, which performs a beam search among probabilistic logic programs by iteratively refining probabilistic theories and optimizing the parameters of each theory using expectation-maximisation. Its extension SLIPCOVER, proposed in 2014, uses bottom clauses generated as in
Progol Progol is an implementation of inductive logic programming that combines inverse entailment with general-to-specific search through a refinement graph. Features Inverse entailment is used with mode declarations to derive the bottom clause, ...
to guide the refinement process, thus reducing the number of revisions and exploring the search space more effectively. Moreover, SLIPCOVER separates the search for promising clauses from that of the theory: the space of clauses is explored with a
beam search In computer science, beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set. Beam search is a modification of best-first search that reduces its memory requirements. Best-first searc ...
, while the space of theories is searched greedily.


See also

*
Commonsense reasoning In artificial intelligence (AI), commonsense reasoning is a human-like ability to make presumptions about the type and essence of ordinary situations humans encounter every day. These assumptions include judgments about the nature of physical objec ...
*
Formal concept analysis In information science, formal concept analysis (FCA) is a principled way of deriving a ''concept hierarchy'' or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing som ...
*
Inductive reasoning Inductive reasoning refers to a variety of method of reasoning, methods of reasoning in which the conclusion of an argument is supported not with deductive certainty, but with some degree of probability. Unlike Deductive reasoning, ''deductive'' ...
*
Inductive programming Inductive programming (IP) is a special area of automatic programming, covering research from artificial intelligence and programming, which addresses learning of typically declarative (logic or functional) and often recursive programs from inc ...
*
Inductive probability Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning Inductive reasoning refers to a variety of method of reasoning, methods of reasoning in which the conclusion o ...
*
Statistical relational learning Statistical relational learning (SRL) is a subdiscipline of artificial intelligence and machine learning that is concerned with domain models that exhibit both uncertainty (which can be dealt with using statistical methods) and complex, relational ...
*
Version space learning Version space learning is a logical approach to machine learning, specifically binary classification. Version space learning algorithms search a predefined space of hypotheses, viewed as a set of logical sentences. Formally, the hypothesis space i ...


References


Further reading

* * * Visual example of inducing the grandparenthood relation by the Atom system. http://john-ahlgren.blogspot.com/2014/03/inductive-reasoning-visualized.html {{Programming paradigms navbox