In machine learning, first-order inductive learner (FOIL) is a rule-based learning algorithm.

Background

Developed in 1990 by Ross Quinlan (J.R. Quinlan. Learning Logical Definitions from Relations. Machine Learning, Volume 5, Number 3, 1990), FOIL learns function-free Horn clauses, a subset of first-order predicate calculus. Given positive and negative examples of some concept and a set of background-knowledge predicates, FOIL inductively generates a logical concept definition or rule for the concept. The induced rule must not involve any constants (''color(X,red)'' becomes ''color(X,Y), red(Y)'') or function symbols, but may allow negated predicates; recursive concepts are also learnable. Like the ID3 algorithm, FOIL hill climbs using a metric based on information theory to construct a rule that covers the data. Unlike ID3, however, FOIL uses a separate-and-conquer method rather than divide-and-conquer, focusing on creating one rule at a time and collecting uncovered examples for the next iteration of the algorithm.

Algorithm

The FOIL algorithm is as follows:

:Input ''List of examples''
:Output ''Rule in first-order predicate logic''
:FOIL(Examples)
::Let Pos be the positive examples
::Let Pred be the predicate to be learned
::Until Pos is empty do:
:::Let Neg be the negative examples
:::Set Body to empty
:::Call LearnClauseBody
:::Add Pred ← Body to the rule
:::Remove from Pos all examples which satisfy Body
:Procedure LearnClauseBody
::Until Neg is empty do:
:::Choose a literal L
:::Conjoin L to Body
:::Remove from Neg examples that do not satisfy L
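The loop above can be sketched in Python. This is a minimal illustration under several assumptions, not Quinlan's implementation: the family facts and the helper names (`extend`, `foil_gain`, `candidates`, `learn_clause_body`, `covers`, `foil`) are hypothetical, only unnegated and non-recursive literals are tried, each literal may introduce at most one fresh variable, and clause length is capped by `max_literals`. The literal choice uses the gain measure from Quinlan's 1990 paper, t × (log2(p1/(p1+n1)) − log2(p0/(p0+n0))), which the text above refers to only as a metric based on information theory.

```python
import math
from itertools import product

def extend(bindings, literal, facts):
    """Keep the bindings that satisfy `literal`, extended with any new variable."""
    name, args = literal
    out = []
    for b in bindings:
        for row in facts[name]:
            nb = dict(b)
            # setdefault binds an unbound variable or returns its existing value.
            if all(nb.setdefault(a, v) == v for a, v in zip(args, row)):
                out.append(nb)
    return out

def foil_gain(pos, neg, literal, facts):
    """Information-based gain of adding `literal` to the current clause body."""
    pos1, neg1 = extend(pos, literal, facts), extend(neg, literal, facts)
    if not pos1:
        return float("-inf")
    t = sum(1 for b in pos if extend([b], literal, facts))  # positives kept
    before = math.log2(len(pos) / (len(pos) + len(neg)))
    after = math.log2(len(pos1) / (len(pos1) + len(neg1)))
    return t * (after - before)

def candidates(facts, old_vars, fresh):
    """All literals that reuse at least one variable already in the clause."""
    for name, rows in facts.items():
        arity = len(next(iter(rows)))
        for args in product(old_vars + [fresh], repeat=arity):
            if any(a in old_vars for a in args):
                yield (name, args)

def learn_clause_body(pos_ex, neg_ex, head_vars, facts, max_literals=4):
    pos = [dict(zip(head_vars, e)) for e in pos_ex]
    neg = [dict(zip(head_vars, e)) for e in neg_ex]
    body, old_vars = [], list(head_vars)
    while neg and pos and len(body) < max_literals:
        fresh = f"V{len(body)}"
        best = max(candidates(facts, old_vars, fresh),
                   key=lambda lit: foil_gain(pos, neg, lit, facts))
        body.append(best)
        # Drop bindings that do not satisfy the chosen literal.
        pos, neg = extend(pos, best, facts), extend(neg, best, facts)
        for a in best[1]:
            if a not in old_vars:
                old_vars.append(a)
    return body

def covers(example, body, head_vars, facts):
    bindings = [dict(zip(head_vars, example))]
    for lit in body:
        bindings = extend(bindings, lit, facts)
    return bool(bindings)

def foil(pos_ex, neg_ex, head_vars, facts):
    rule, remaining = [], set(pos_ex)
    while remaining:
        body = learn_clause_body(sorted(remaining), neg_ex, head_vars, facts)
        covered = {e for e in remaining if covers(e, body, head_vars, facts)}
        if not covered:
            break  # no progress; stop rather than loop forever
        rule.append(body)
        remaining -= covered
    return rule

# Hypothetical toy data; negatives come from a closed-world assumption.
father = {("abe", "homer"), ("homer", "bart"), ("homer", "lisa"), ("clancy", "marge")}
parent = father | {("marge", "bart"), ("marge", "lisa"), ("mona", "homer")}
facts = {"father": father, "parent": parent}
people = sorted({p for pair in parent for p in pair})
pos_ex = [("abe", "bart"), ("abe", "lisa"), ("clancy", "bart"), ("clancy", "lisa")]
neg_ex = [pair for pair in product(people, repeat=2) if pair not in pos_ex]
rule = foil(pos_ex, neg_ex, ["X", "Y"], facts)
print(rule)
```

On this toy data the sketch should return a single clause, ''grandfather(X,Y) ← parent(V0,Y), father(X,V0)'', where ''V0'' is the fresh variable introduced by the first literal.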

Example

Suppose FOIL's task is to learn the concept ''grandfather(X,Y)'' given the relations ''father(X,Y)'' and ''parent(X,Y)''. Furthermore, suppose the clause under construction is ''grandfather(X,Y) ← parent(X,Z)''. Its body can be extended by conjoining any of the literals ''father(X,X)'', ''father(Y,Z)'', ''parent(U,Y)'', or many others. To create such a literal, the algorithm must choose both a predicate name and a set of variables for the predicate (at least one of which is required to be present already in an unnegated literal of the clause). If FOIL extends a clause ''grandfather(X,Y) ← true'' by conjoining the literal ''parent(X,Z)'', it is introducing the new variable ''Z''. Positive examples now consist of those values <''X,Y,Z''> such that ''grandfather(X,Y)'' is true and ''parent(X,Z)'' is true; negative examples are those where ''parent(X,Z)'' is true but ''grandfather(X,Y)'' is false. On the next iteration of FOIL after ''parent(X,Z)'' has been added, the algorithm will consider all combinations of predicate names and variables such that at least one variable in the new literal is present in the existing clause. This results in a very large search space.

Let ''Var'' be the largest number of distinct variables for any clause in rule ''R'', excluding the last conjunct. Let ''MaxP'' be the number of predicates and ''MaxA'' the largest arity of any predicate. Then an approximation of the number of nodes generated to learn ''R'' is: ''NodesSearched ≤ 2 * MaxP * (Var + MaxA - 1)^MaxA'', as shown in Pazzani and Kibler (1992). Several extensions of FOIL have shown that additions to the basic algorithm may reduce this search space, sometimes drastically.
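To give a feel for the numbers, the following sketch enumerates the candidate literals for one extension step of the ''grandfather'' clause and evaluates the bound quoted above. The function names `candidate_literals` and `node_bound`, the new-variable names, and the choice ''Var'' = 3 are assumptions made for illustration. The enumeration covers a single step only, so the two figures are shown side by side for scale, not as a proof of the bound.

```python
from itertools import product

def candidate_literals(predicates, old_vars, new_vars=("U", "V", "W")):
    """List (name, args, negated) literals sharing >= 1 variable with the clause."""
    literals = []
    for name, arity in predicates.items():
        # One argument must be an old variable, so at most arity - 1 are new.
        pool = list(old_vars) + list(new_vars[:arity - 1])
        for args in product(pool, repeat=arity):
            if any(a in old_vars for a in args):
                literals.append((name, args, False))  # unnegated
                literals.append((name, args, True))   # negated
    return literals

def node_bound(max_p, var, max_a):
    """2 * MaxP * (Var + MaxA - 1)^MaxA, as quoted in the text."""
    return 2 * max_p * (var + max_a - 1) ** max_a

predicates = {"father": 2, "parent": 2}          # predicate name -> arity
literals = candidate_literals(predicates, ["X", "Y", "Z"])
print(len(literals))         # 60 candidates for this single step
print(node_bound(2, 3, 2))   # 64
```

With arity above two, this enumeration counts literals that differ only in the naming of new variables more than once; a real implementation would normalize them.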

Extensions

The FOCL algorithm (''First Order Combined Learner''; Michael Pazzani and Dennis Kibler. The Utility of Knowledge in Inductive Learning. Machine Learning, Volume 9, Number 1, 1992) extends FOIL in a variety of ways, which affect how FOCL selects literals to test while extending a clause under construction. Constraints on the search space are allowed, as are predicates that are defined on a rule rather than on a set of examples (called ''intensional'' predicates); most importantly, a potentially incorrect hypothesis is allowed as an initial approximation to the predicate to be learned. The main goal of FOCL is to incorporate the methods of explanation-based learning (EBL) into the empirical methods of FOIL.

Even when no additional knowledge is provided to FOCL over FOIL, however, it utilizes an iterative widening search strategy similar to depth-first search: first FOCL attempts to learn a clause by introducing no free variables. If this fails (no positive gain), one additional free variable per failure is allowed until the number of free variables exceeds the maximum used for any predicate.
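The iterative widening strategy can be sketched as a small control loop. Everything named here is hypothetical: `best_literal_with(k)` stands in for a search over literals with at most `k` free variables, and the gain values are made up for the example.

```python
def iterative_widening(best_literal_with, max_free_vars):
    """Try 0, 1, 2, ... free variables; stop at the first positive gain."""
    for k in range(max_free_vars + 1):
        literal, gain = best_literal_with(k)
        if gain > 0:
            return literal, k
    return None, None  # no width up to the limit produced positive gain

# Made-up search results: best literal and its gain for each width k.
toy_best = {
    0: ("father(X,Y)", 0.0),
    1: ("parent(Z,Y)", 4.0),
    2: ("parent(Z,W)", 5.0),
}
literal, width = iterative_widening(lambda k: toy_best[k], max_free_vars=2)
print(literal, width)
```

The loop stops at the narrowest width that yields positive gain, so in this toy run the literal found with one free variable is returned even though a wider search would have scored higher.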

Constraints

Unlike FOIL, which does not put typing constraints on its variables, FOCL uses typing as an inexpensive way of incorporating a simple form of background knowledge. For example, a predicate ''livesAt(X,Y)'' may have types ''livesAt(person, location)''. Additional predicates may need to be introduced, though: without types, ''nextDoor(X,Y)'' could determine whether person ''X'' and person ''Y'' live next door to each other, or whether two locations are next door to each other. With types, two different predicates ''nextDoor(person, person)'' and ''nextDoor(location, location)'' would need to exist for this functionality to be maintained. However, this typing mechanism eliminates the need for predicates such as ''isPerson(X)'' or ''isLocation(Y)'', and need not consider ''livesAt(A,B)'' when ''A'' and ''B'' are defined to be person variables, reducing the search space. Additionally, typing can improve the accuracy of the resulting rule by eliminating from consideration impossible literals such as ''livesAt(A,B)'' which may nevertheless appear to have a high information gain.

Rather than implementing trivial predicates such as ''equals(X,X)'' or ''between(X,X,Y)'', FOCL introduces implicit constraints on variables, further reducing the search space. Some predicates must have all variables unique, others must be commutative (''adjacent(X,Y)'' is equivalent to ''adjacent(Y,X)''), and still others may require that a particular variable be present in the current clause; many other constraints are possible.
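As an illustration of how such constraints prune candidates, the sketch below filters literals by argument type, by distinctness of variables, and by commutativity. The predicate name `nextDoorPerson`, the variable names, and the decision to require distinct variables for every predicate are assumptions of this example, not details of FOCL.

```python
from itertools import product

def typed_candidates(signatures, var_types, commutative=frozenset()):
    """Generate only literals whose arguments respect the declared types."""
    out = []
    for name, sig in signatures.items():
        seen = set()
        for args in product(var_types, repeat=len(sig)):
            if any(var_types[a] != t for a, t in zip(args, sig)):
                continue  # typing constraint violated
            if len(set(args)) < len(args):
                continue  # implicit constraint: all variables distinct
            # For commutative predicates, argument order does not matter.
            key = frozenset(args) if name in commutative else args
            if key in seen:
                continue
            seen.add(key)
            out.append((name, args))
    return out

signatures = {
    "livesAt": ("person", "location"),
    "nextDoorPerson": ("person", "person"),
}
var_types = {"A": "person", "B": "person", "L": "location"}

typed = typed_candidates(signatures, var_types, commutative={"nextDoorPerson"})
untyped = [(n, a) for n, s in signatures.items()
           for a in product(var_types, repeat=len(s))]
print(len(untyped), len(typed))   # 18 3
print(typed)
```

Here ''livesAt(A,B)'' is never generated because ''B'' is a person variable, and ''nextDoorPerson(B,A)'' is dropped as a duplicate of ''nextDoorPerson(A,B)''.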

Operational rules

Operational rules are those rules which are defined ''extensionally'', or as a list of tuples for which a predicate is true. FOIL allows only operational rules; FOCL extends its knowledge base to allow combinations of rules called non-operational rules as well as partially defined or incorrect rules for robustness. Allowing for partial definitions reduces the amount of work needed as the algorithm need not generate these partial definitions for itself, and the incorrect rules do not add significantly to the work needed since they are discarded if they are not judged to provide positive information gain. Non-operational rules are advantageous as the individual rules which they combine may not provide information gain on their own, but are useful when taken in conjunction. If a literal with the most information gain in an iteration of FOCL is non-operational, it is operationalized and its definition is added to the clause under construction.

:Inputs ''Literal to be operationalized, List of positive examples, List of negative examples''
:Output ''Literal in operational form''
:Operationalize(Literal, Positive examples, Negative examples)
::If Literal is operational
:::Return Literal
::Initialize OperationalLiterals to the empty set
::For each clause in the definition of Literal
:::Compute information gain of the clause over Positive examples and Negative examples
::For the clause with the maximum gain
:::For each literal L in the clause
::::Add Operationalize(L, Positive examples, Negative examples) to OperationalLiterals

An operational rule might be the literal ''lessThan(X,Y)''; a non-operational rule might be ''between(X,Y,Z) ← lessThan(X,Y), lessThan(Y,Z)''.
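The recursion in Operationalize can be sketched as follows. This is a simplified illustration: literals are plain strings with no variable unification, the information gain is supplied by a hypothetical `clause_gain` callback instead of being computed from examples, and the second clause for ''between'' and all gain values are invented for the example.

```python
def operationalize(literal, definitions, clause_gain):
    """Expand a literal into operational literals via its best-scoring clause."""
    if literal not in definitions:
        return [literal]  # already operational: no rule-based definition
    # Pick the defining clause with the highest gain.
    best_clause = max(definitions[literal], key=clause_gain)
    operational = []
    for lit in best_clause:
        # Body literals may themselves be non-operational, so recurse.
        operational.extend(operationalize(lit, definitions, clause_gain))
    return operational

# Non-operational predicates map to a list of alternative clause bodies.
definitions = {
    "between(X,Y,Z)": [
        ["lessThan(X,Y)", "lessThan(Y,Z)"],
        ["lessThan(Z,Y)", "lessThan(Y,X)"],  # hypothetical alternative clause
    ],
}
# Made-up gains standing in for a computation over the examples.
toy_gain = {
    ("lessThan(X,Y)", "lessThan(Y,Z)"): 3.0,
    ("lessThan(Z,Y)", "lessThan(Y,X)"): 1.0,
}
result = operationalize("between(X,Y,Z)", definitions,
                        lambda clause: toy_gain[tuple(clause)])
print(result)
```

Passing the gain as a callback keeps the sketch independent of any particular example representation; in FOCL the gain would be computed over the positive and negative examples at each level.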

Initial rules

The addition of non-operational rules to the knowledge base increases the size of the space which FOCL must search. Rather than simply providing the algorithm with a target concept (e.g. ''grandfather(X,Y)''), the algorithm takes as input a set of non-operational rules which it tests for correctness and operationalizes for its learned concept. A correct target concept improves both computation time and accuracy, but even an incorrect concept gives the algorithm a basis from which to work and can improve accuracy and time.

References

*http://www.csc.liv.ac.uk/~frans/KDD/Software/FOIL_PRM_CPAR/foil.html