machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

, boosting is an ensemble meta-algorithm for primarily reducing

bias Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individual, a group ...

, and also variance in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the question posed by Kearns and

Valiant Valiant may refer to: People * James Valiant (1884–1917), English cricketer * The Valiant Brothers, a professional wrestling tag team of storyline brothers ** Jerry Valiant, a ring name of professional wrestler John Hill (1941-2010) ** Jimmy ...

(1988, 1989):Michael Kearns(1988)
''Thoughts on Hypothesis Boosting''
Unpublished manuscript (Machine Learning class project, December 1988) "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification. Robert Schapire's affirmative answer in a 1990 paper to the question of Kearns and Valiant has had significant ramifications in

and statistics, most notably leading to the development of boosting. When first introduced, the ''hypothesis boosting problem'' simply referred to the process of turning a weak learner into a strong learner. "Informally,

he hypothesis boosting He or HE may refer to: Language * He (pronoun), an English pronoun * He (kana), the romanization of the Japanese kana へ * He (letter), the fifth letter of many Semitic alphabets * He (Cyrillic), a letter of the Cyrillic script called ''He'' in ...

problem asks whether an efficient learning algorithm ��that outputs a hypothesis whose performance is only slightly better than random guessing .e. a weak learnerimplies the existence of an efficient algorithm that outputs a hypothesis of arbitrary accuracy .e. a strong learner" Algorithms that achieve hypothesis boosting quickly became simply known as "boosting". Freund and Schapire's arcing (Adapt tve Resampling and Combining), as a general technique, is more or less synonymous with boosting.

Boosting algorithms

While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are weighted in a way that is related to the weak learners' accuracy. After a weak learner is added, the data weights are readjusted, known as "re- weighting". Misclassified input data gain a higher weight and examples that are classified correctly lose weight. Thus, future weak learners focus more on the examples that previous weak learners misclassified. Ensemble Boosting

There are many boosting algorithms. The original ones, proposed by Robert Schapire (a recursive majority gate formulation) and Yoav Freund (boost by majority),Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean (2000); ''Boosting Algorithms as Gradient Descent'', in S. A. Solla, T. K. Leen, and K.-R. Muller, editors, ''Advances in Neural Information Processing Systems'' 12, pp. 512-518, MIT Press were not adaptive and could not take full advantage of the weak learners. Schapire and Freund then developed AdaBoost, an adaptive boosting algorithm that won the prestigious Gödel Prize. Only algorithms that are provable boosting algorithms in the probably approximately correct learning formulation can accurately be called ''boosting algorithms''. Other algorithms that are similar in spirit to boosting algorithms are sometimes called "leveraging algorithms", although they are also sometimes incorrectly called boosting algorithms. The main variation between many boosting algorithms is their method of weighting training data points and hypotheses. AdaBoost is very popular and the most significant historically as it was the first algorithm that could adapt to the weak learners. It is often the basis of introductory coverage of boosting in university machine learning courses. There are many more recent algorithms such as LPBoost, TotalBoost, BrownBoost, xgboost, MadaBoost, LogitBoost, and others. Many boosting algorithms fit into the AnyBoost framework, which shows that boosting performs gradient descent in a function space using a convex cost function.

Object categorization in computer vision

Given images containing various known objects in the world, a classifier can be learned from them to automatically

classify Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...

the objects in future images. Simple classifiers built based on some

image feature In computer vision and image processing, a feature is a piece of information about the content of an image; typically about whether a certain region of the image has certain properties. Features may be specific structures in the image such as poi ...

of the object tend to be weak in categorization performance. Using boosting methods for object categorization is a way to unify the weak classifiers in a special way to boost the overall ability of categorization.

Problem of object categorization

Object categorization is a typical task of

computer vision Computer vision is an Interdisciplinarity, interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate t ...

that involves determining whether or not an image contains some specific category of object. The idea is closely related with recognition, identification, and detection. Appearance based object categorization typically contains feature extraction, learning a classifier, and applying the classifier to new examples. There are many ways to represent a category of objects, e.g. from shape analysis, bag of words models, or local descriptors such as SIFT, etc. Examples of supervised classifiers are Naive Bayes classifiers, support vector machines, mixtures of Gaussians, and neural networks. However, research has shown that object categories and their locations in images can be discovered in an unsupervised manner as well.

Status quo for object categorization

The recognition of object categories in images is a challenging problem in

, especially when the number of categories is large. This is due to high intra class variability and the need for generalization across variations of objects within the same category. Objects within one category may look quite different. Even the same object may appear unalike under different viewpoint,

scale Scale or scales may refer to: Mathematics * Scale (descriptive set theory), an object defined on a set of points * Scale (ratio), the ratio of a linear dimension of a model to the corresponding dimension of the original * Scale factor, a number ...

, and illumination. Background clutter and partial occlusion add difficulties to recognition as well. Humans are able to recognize thousands of object types, whereas most of the existing object recognition systems are trained to recognize only a few, e.g. human faces, cars, simple objects, etc. Research has been very active on dealing with more categories and enabling incremental additions of new categories, and although the general problem remains unsolved, several multi-category objects detectors (for up to hundreds or thousands of categories) have been developed. One means is by feature sharing and boosting.

Boosting for binary categorization

AdaBoost can be used for face detection as an example of binary categorization. The two categories are faces versus background. The general algorithm is as follows: #Form a large set of simple features #Initialize weights for training images #For T rounds ##Normalize the weights ##For available features from the set, train a classifier using a single feature and evaluate the training error ##Choose the classifier with the lowest error ##Update the weights of the training images: increase if classified wrongly by this classifier, decrease if correctly #Form the final strong classifier as the linear combination of the T classifiers (coefficient larger if training error is small) After boosting, a classifier constructed from 200 features could yield a 95% detection rate under a

10^

false positive rate. Another application of boosting for binary categorization is a system that detects pedestrians using

patterns A pattern is a regularity in the world, in human-made design, or in abstract ideas. As such, the elements of a pattern repeat in a predictable manner. A geometric pattern is a kind of pattern formed of geometric shapes and typically repeated li ...

of motion and appearance. This work is the first to combine both motion information and appearance information as features to detect a walking person. It takes a similar approach to the Viola-Jones object detection framework.

Boosting for multi-class categorization

Compared with binary categorization, multi-class categorization looks for common features that can be shared across the categories at the same time. They turn to be more generic edge like features. During learning, the detectors for each category can be trained jointly. Compared with training separately, it generalizes better, needs less training data, and requires fewer features to achieve the same performance. The main flow of the algorithm is similar to the binary case. What is different is that a measure of the joint training error shall be defined in advance. During each iteration the algorithm chooses a classifier of a single feature (features that can be shared by more categories shall be encouraged). This can be done via converting multi-class classification into a binary one (a set of categories versus the rest), or by introducing a penalty error from the categories that do not have the feature of the classifier. In the paper "Sharing visual features for multiclass and multiview object detection", A. Torralba et al. used GentleBoost for boosting and showed that when training data is limited, learning via sharing features does a much better job than no sharing, given same boosting rounds. Also, for a given performance level, the total number of features required (and therefore the run time cost of the classifier) for the feature sharing detectors, is observed to scale approximately

logarithm In mathematics, the logarithm is the inverse function to exponentiation. That means the logarithm of a number to the base is the exponent to which must be raised, to produce . For example, since , the ''logarithm base'' 10 of ...

ically with the number of class, i.e., slower than

linear Linearity is the property of a mathematical relationship ('' function'') that can be graphically represented as a straight line. Linearity is closely related to '' proportionality''. Examples in physics include rectilinear motion, the linear ...

growth in the non-sharing case. Similar results are shown in the paper "Incremental learning of object detectors using a visual shape alphabet", yet the authors used AdaBoost for boosting.

Convex vs. non-convex boosting algorithms

Boosting algorithms can be based on convex or non-convex optimization algorithms. Convex algorithms, such as AdaBoost and LogitBoost, can be "defeated" by random noise such that they can't learn basic and learnable combinations of weak hypotheses.P. Long and R. Servedio. 25th International Conference on Machine Learning (ICML), 2008, pp. 608--615. This limitation was pointed out by Long & Servedio in 2008. However, by 2009, multiple authors demonstrated that boosting algorithms based on non-convex optimization, such as BrownBoost, can learn from noisy datasets and can specifically learn the underlying classifier of the Long–Servedio dataset.

Implementations

* scikit-learn, an open source machine learning library for Python * Orange, a free data mining software suite, modul
Orange.ensemble
* Weka is a machine learning set of tools that offers variate implementations of boosting algorithms like AdaBoost and LogitBoost * R packag
GBM
(Generalized Boosted Regression Models) implements extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine.
jboost
AdaBoost, LogitBoost, RobustBoost, Boostexter and alternating decision trees * R packag

Applies Multiclass AdaBoost.M1, AdaBoost-SAMME and Bagging * R packag

An implementation of gradient boosting for linear and tree-based models.

Notes

References

External links

* Robert E. Schapire (2003)
''The Boosting Approach to Machine Learning: An Overview''
MSRI (Mathematical Sciences Research Institute) Workshop on Nonlinear Estimation and Classification * Zhou Zhi-Hua (2014
''Boosting 25 years''
CCL 2014 Keynote. * * {{Authority control Classification algorithms Ensemble learning Learning in computer vision Object recognition and categorization