The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs that it has not encountered.
In machine learning, one aims to construct algorithms that are able to ''learn'' to predict a certain target output. To achieve this, the learning algorithm is presented with training examples that demonstrate the intended relation between input and output values. The learner is then expected to approximate the correct output, even for examples that were not shown during training. Without additional assumptions, this problem cannot be solved, since unseen situations might have an arbitrary output value. The necessary assumptions about the nature of the target function are subsumed under the phrase ''inductive bias''.
A classical example of an inductive bias is Occam's razor, assuming that the simplest consistent hypothesis about the target function is actually the best. Here ''consistent'' means that the hypothesis of the learner yields correct outputs for all of the examples that have been given to the algorithm.
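To make this concrete, the following sketch (illustrative only; the helper name and tolerance are our own choices, and NumPy is assumed) searches a nested class of polynomial hypotheses in order of increasing degree and returns the first one that is consistent with the training examples, i.e., the simplest consistent hypothesis:
<syntaxhighlight lang="python">
import numpy as np

def simplest_consistent_polynomial(x, y, max_degree=10, tol=1e-6):
    """Occam's-razor-style selection: try hypotheses from simplest to
    most complex and keep the first one consistent with the data."""
    for degree in range(max_degree + 1):       # simplest hypotheses first
        coeffs = np.polyfit(x, y, degree)      # least-squares fit of this degree
        if np.allclose(np.polyval(coeffs, x), y, atol=tol):
            return degree, coeffs              # first hit = simplest consistent fit
    return None                                # no consistent hypothesis in the class

# Data generated by a quadratic: the bias selects degree 2, even though
# higher-degree polynomials would fit the five points equally well.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x**2 + 1.0
print(simplest_consistent_polynomial(x, y)[0])  # 2
</syntaxhighlight>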
Approaches to a more formal definition of inductive bias are based on mathematical logic. Here, the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis generated by the learner. However, this strict formalism fails in many practical cases, where the inductive bias can only be given as a rough description (e.g. in the case of artificial neural networks), or not at all.
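For example, in Tom Mitchell's textbook formulation of this idea (the notation below follows his treatment rather than anything stated above), the inductive bias of a learner ''L'' is a minimal set of assertions ''B'' such that, for any target concept and corresponding training data ''D<sub>c</sub>'', the classification of every instance ''x'' follows deductively from the bias together with the data:
:<math>\forall x \in X \colon \; (B \wedge D_c \wedge x) \vdash L(x, D_c)</math>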
Types
The following is a list of common inductive biases in machine learning algorithms.
* Maximum conditional independence: if the hypothesis can be cast in a Bayesian framework, try to maximize conditional independence. This is the bias used in the Naive Bayes classifier.
* Minimum cross-validation error: when trying to choose among hypotheses, select the hypothesis with the lowest cross-validation error. Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased.
* Maximum margin: when drawing a boundary between two classes, attempt to maximize the width of the boundary. This is the bias used in support vector machines. The assumption is that distinct classes tend to be separated by wide boundaries.
* Minimum description length: when forming a hypothesis, attempt to minimize the length of the description of the hypothesis.
* Minimum features: unless there is good evidence that a feature is useful, it should be deleted. This is the assumption behind feature selection algorithms.
* Nearest neighbors: assume that most of the cases in a small neighborhood in feature space belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood (see the sketch after this list). This is the bias used in the k-nearest neighbors algorithm. The assumption is that cases that are near each other tend to belong to the same class.
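To make the last of these biases concrete, here is a minimal sketch of the k-nearest-neighbors rule (the function and variable names are ours; NumPy is assumed): the prediction for an unlabeled case is simply the majority class among its ''k'' closest training cases in feature space.
<syntaxhighlight lang="python">
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Nearest-neighbor bias: assume the query shares the majority
    class of its k closest training points in feature space."""
    distances = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                    # indices of the k closest cases
    votes = Counter(y_train[i] for i in nearest)           # majority vote among neighbors
    return votes.most_common(1)[0][0]

# Two well-separated clusters; a query near the first cluster inherits its label.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array(["a", "a", "a", "b", "b", "b"])
print(knn_predict(X, y, np.array([0.3, 0.3])))  # prints "a"
</syntaxhighlight>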
Shift of bias
Although most learning algorithms have a static bias, some algorithms are designed to shift their bias as they acquire more data. This does not avoid bias, since the bias-shifting process itself must have a bias.
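As an illustration, the following sketch (our own construction, assuming NumPy; the candidate degrees and hold-out split are arbitrary choices) re-selects the complexity of its hypothesis class each time it is given data, so its preferred hypothesis class can shift as the sample grows. The shifting procedure is itself biased: it only ever considers the listed degrees and trusts the hold-out error criterion.
<syntaxhighlight lang="python">
import numpy as np

def select_degree(x, y, candidate_degrees=(0, 1, 2, 3)):
    """Shift-of-bias sketch: refit each candidate hypothesis class on
    the first half of the sample and keep the polynomial degree with
    the lowest squared error on the held-out second half."""
    split = len(x) // 2
    best_degree, best_err = None, np.inf
    for d in candidate_degrees:
        coeffs = np.polyfit(x[:split], y[:split], d)
        err = np.mean((np.polyval(coeffs, x[split:]) - y[split:]) ** 2)
        if err < best_err:
            best_degree, best_err = d, err
    return best_degree

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 40)
y = x**3 - x + rng.normal(scale=0.1, size=x.size)
print(select_degree(x[:8], y[:8]))  # with few examples a simple degree may win
print(select_degree(x, y))          # with more data the preferred degree can shift
</syntaxhighlight>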
See also
* Algorithmic bias
* Cognitive bias
* No free lunch in search and optimization