Active learning is a special case of machine learning in which a learning algorithm can interactively query a user (or some other information source) to label new data points with the desired outputs. In statistics literature, it is sometimes also called optimal experimental design. The information source is also called "teacher" or "oracle".

There are situations in which unlabeled data is abundant but manual labeling is expensive. In such a scenario, learning algorithms can actively query the user/teacher for labels. This type of iterative supervised learning is called active learning. Since the learner chooses the examples, the number of examples needed to learn a concept can often be much lower than the number required in normal supervised learning. With this approach, there is a risk that the algorithm is overwhelmed by uninformative examples. Recent developments are dedicated to multi-label active learning, hybrid active learning and active learning in a single-pass (on-line) context, combining concepts from the field of machine learning (e.g. conflict and ignorance) with adaptive, incremental learning policies in the field of online machine learning. Large-scale active learning projects may benefit from crowdsourcing frameworks such as Amazon Mechanical Turk that include many humans in the active learning loop.

Definitions

Let $\mathbf{T}$ be the total set of all data under consideration. For example, in a protein engineering problem, $\mathbf{T}$ would include all proteins that are known to have a certain interesting activity and all additional proteins that one might want to test for that activity. During each iteration $i$, $\mathbf{T}$ is broken up into three subsets:

# $\mathbf{T}_{K,i}$: Data points where the label is known.
# $\mathbf{T}_{U,i}$: Data points where the label is unknown.
# $\mathbf{T}_{C,i}$: A subset of $\mathbf{T}_{U,i}$ that is chosen to be labeled.

Most of the current research in active learning concerns how best to choose the data points for $\mathbf{T}_{C,i}$.
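The bookkeeping behind the three subsets can be sketched as a small pool-based loop. This is a minimal illustration, not a reference implementation: it assumes integer data points, a known-label `oracle` callable standing in for the teacher, and a hypothetical `random_select` placeholder where a real query strategy would go. Each round moves the chosen points out of the unknown set and into the known set.

```python
import random
from typing import Callable, Dict, Set

def random_select(known: Dict[int, int], unknown: Set[int], k: int) -> Set[int]:
    """Hypothetical placeholder strategy: pick k unlabeled points at random."""
    rng = random.Random(len(known))  # deterministic seed for reproducibility
    return set(rng.sample(sorted(unknown), min(k, len(unknown))))

def active_learning_loop(
    pool: Set[int],
    seed_labels: Dict[int, int],
    oracle: Callable[[int], int],
    select: Callable[[Dict[int, int], Set[int], int], Set[int]],
    batch_size: int,
    n_iterations: int,
) -> Dict[int, int]:
    known = dict(seed_labels)           # T_{K,i}: points whose label is known
    unknown = set(pool) - set(known)    # T_{U,i}: points whose label is unknown
    for _ in range(n_iterations):
        if not unknown:
            break
        chosen = select(known, unknown, batch_size)  # T_{C,i}
        if not chosen <= unknown:
            raise ValueError("the chosen set must be a subset of the unlabeled set")
        for x in chosen:
            known[x] = oracle(x)        # ask the teacher/oracle for the label
        unknown -= chosen               # labeled points leave the unlabeled set
    return known

labels = active_learning_loop(
    pool=set(range(20)),
    seed_labels={0: 0, 19: 1},
    oracle=lambda x: int(x >= 10),
    select=random_select,
    batch_size=3,
    n_iterations=4,
)
print(len(labels))  # 14: 2 seed labels + 4 rounds x 3 queries
```

Swapping `random_select` for an informativeness-based function is exactly where the query strategies of this article would plug in.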

Scenarios

*Membership Query Synthesis: The learner generates its own instance from an underlying natural distribution. For example, if the dataset consists of pictures of humans and animals, the learner could send a clipped image of a leg to the teacher and query whether this appendage belongs to an animal or a human. This is particularly useful if the dataset is small.
*Pool-Based Sampling: Instances are drawn from the entire data pool and assigned a confidence score, a measurement of how well the learner “understands” the data. The system then selects the instances for which it is least confident and queries the teacher for their labels.
*Stream-Based Selective Sampling: Each unlabeled data point is examined one at a time, with the machine evaluating the informativeness of each item against its query parameters. The learner decides for itself whether to assign a label or query the teacher for each data point.
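The difference between the pool-based and stream-based scenarios can be sketched with a toy one-dimensional model. The logistic `predict_proba` below is a hypothetical stand-in for a trained classifier, and the confidence score is assumed to be the probability of the predicted class; the `threshold` of 0.8 is likewise an arbitrary illustrative choice. Pool-based sampling ranks every candidate at once, while stream-based sampling makes a query-or-label decision per point as it arrives.

```python
import math
from typing import Callable, Iterable, List, Tuple

def predict_proba(x: float, boundary: float = 0.5, scale: float = 10.0) -> float:
    """Hypothetical model: P(label = 1 | x) from a logistic curve centred on `boundary`."""
    return 1.0 / (1.0 + math.exp(-scale * (x - boundary)))

def confidence(x: float) -> float:
    """Confidence score: probability of the model's predicted class."""
    p = predict_proba(x)
    return max(p, 1.0 - p)

def pool_based(pool: List[float], k: int) -> List[float]:
    """Pool-based sampling: score the whole pool, query the k least-confident points."""
    return sorted(pool, key=confidence)[:k]

def stream_based(
    stream: Iterable[float],
    oracle: Callable[[float], int],
    threshold: float = 0.8,
) -> List[Tuple[float, int, bool]]:
    """Stream-based selective sampling: decide point by point whether to query."""
    results = []
    for x in stream:
        if confidence(x) < threshold:
            results.append((x, oracle(x), True))    # uncertain: ask the teacher
        else:
            own_label = int(predict_proba(x) >= 0.5)
            results.append((x, own_label, False))   # confident: label it itself
    return results

print(pool_based([0.0, 0.45, 0.5, 0.9, 0.58, 1.0], k=2))  # [0.5, 0.45]
print(stream_based([0.0, 0.5, 1.0], oracle=lambda x: int(x > 0.4)))
# [(0.0, 0, False), (0.5, 1, True), (1.0, 1, False)]
```

Only the point sitting on the decision boundary (0.5) is sent to the teacher in the stream; the other two are confidently labeled by the learner itself.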

Query strategies

Algorithms for determining which data points should be labeled can be organized into a number of different categories, based upon their purpose:
*Balance exploration and exploitation: the choice of examples to label is seen as a dilemma between exploration and exploitation over the data space representation. This strategy manages this compromise by modelling the active learning problem as a contextual bandit problem. For example, Bouneffouf et al. propose a sequential algorithm named Active Thompson Sampling (ATS), which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for this sample point's label.
*Expected model change: label those points that would most change the current model.
*Expected error reduction: label those points that would most reduce the model's generalization error.
*Exponentiated Gradient Exploration for Active Learning: Bouneffouf proposes a sequential algorithm named exponentiated gradient (EG)-active that can improve any active learning algorithm by an optimal random exploration.
*Uncertainty sampling: label those points for which the current model is least certain as to what the correct output should be.
*Query by committee: a variety of models are trained on the current labeled data and vote on the output for unlabeled data; label those points for which the "committee" disagrees the most.
*Querying from diverse subspaces or partitions: when the underlying model is a forest of trees, the leaf nodes might represent (overlapping) partitions of the original feature space. This offers the possibility of selecting instances from non-overlapping or minimally overlapping partitions for labeling.
*Variance reduction: label those points that would minimize output variance, which is one of the components of error.
*Conformal predictors: this method predicts that a new data point will have a label similar to old data points in some specified way, and the degree of similarity within the old examples is used to estimate the confidence in the prediction.
*Mismatch-first farthest-traversal: the primary selection criterion is the prediction mismatch between the current model and nearest-neighbour prediction; it targets wrongly predicted data points. The secondary selection criterion is the distance to previously selected data, farthest first; it aims at optimizing the diversity of selected data.
*User Centered Labeling Strategies: learning is accomplished by applying dimensionality reduction to graphs and figures like scatter plots. The user is then asked to label the compiled data (categorical, numerical, relevance scores, or the relation between two instances).

A wide variety of algorithms have been studied that fall into these categories.
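Two of the strategies above, uncertainty sampling and query by committee, reduce to scoring each unlabeled point and querying the top scorers. The sketch below assumes the model already outputs class-probability vectors and that committee members output hard class votes; the three uncertainty scores (least confident, margin, entropy) and vote entropy are common textbook choices rather than measures prescribed by this article. Every score is oriented so that higher means more informative.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

def least_confident(probs: Sequence[float]) -> float:
    """1 - max class probability; higher means the model is less certain."""
    return 1.0 - max(probs)

def margin_uncertainty(probs: Sequence[float]) -> float:
    """1 - (gap between the two most probable classes); higher means less certain."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def vote_entropy(votes: Sequence[int]) -> float:
    """Query-by-committee disagreement: entropy of the committee's vote shares."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def uncertainty_query(
    pool_probs: List[Sequence[float]],
    k: int,
    score: Callable[[Sequence[float]], float] = entropy,
) -> List[int]:
    """Indices of the k pool points with the highest uncertainty score."""
    order = sorted(range(len(pool_probs)), key=lambda i: score(pool_probs[i]), reverse=True)
    return order[:k]

def committee_query(pool_votes: List[Sequence[int]], k: int) -> List[int]:
    """Indices of the k pool points on which the committee disagrees most."""
    order = sorted(range(len(pool_votes)), key=lambda i: vote_entropy(pool_votes[i]), reverse=True)
    return order[:k]

print(uncertainty_query([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]], k=2))  # [1, 2]
print(committee_query([[0, 0, 0], [0, 1, 1], [0, 1, 2]], k=2))       # [2, 1]
```

In the committee example, the point where three members vote for three different classes is queried first, and a unanimous point is never preferred over a split one.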

Minimum marginal hyperplane

Some active learning algorithms are built upon support-vector machines (SVMs) and exploit the structure of the SVM to determine which data points to label. Such methods usually calculate the margin, $W$, of each unlabeled datum in $\mathbf{T}_{U,i}$ and treat $W$ as an $n$-dimensional distance from that datum to the separating hyperplane. Minimum Marginal Hyperplane methods assume that the data with the smallest $W$ are those that the SVM is most uncertain about and therefore should be placed in $\mathbf{T}_{C,i}$ to be labeled. Other similar methods, such as Maximum Marginal Hyperplane, choose data with the largest $W$. Tradeoff methods choose a mix of the smallest and largest $W$s.
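A minimal sketch of the three selection rules, assuming a linear SVM has already been trained and its weight vector `w` and bias `b` are available (the values below are hypothetical). The margin of each unlabeled point is taken here to be its geometric distance |w·x + b| / ||w|| from the separating hyperplane; with a kernel SVM one would instead use the absolute value of the model's decision function, such as scikit-learn's `SVC.decision_function`. The even near/far split in the tradeoff rule is an illustrative assumption, not a standard ratio.

```python
import math
from typing import List, Sequence

def distance_to_hyperplane(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Geometric distance |w.x + b| / ||w|| of x from the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def select_by_margin(
    unlabeled: List[Sequence[float]],
    w: Sequence[float],
    b: float,
    k: int,
    strategy: str = "min",
) -> List[int]:
    """Return indices of the unlabeled points to send for labeling."""
    k = min(k, len(unlabeled))
    dist = [distance_to_hyperplane(x, w, b) for x in unlabeled]
    order = sorted(range(len(unlabeled)), key=lambda i: dist[i])  # nearest first
    if strategy == "min":        # Minimum Marginal Hyperplane: closest to the boundary
        return order[:k]
    if strategy == "max":        # Maximum Marginal Hyperplane: farthest from the boundary
        return order[::-1][:k]
    if strategy == "tradeoff":   # hypothetical even split between nearest and farthest
        n_near = (k + 1) // 2
        n_far = k - n_near
        return order[:n_near] + order[len(order) - n_far:][::-1]
    raise ValueError(f"unknown strategy: {strategy!r}")

points = [(0.0, 0.0), (0.45, 1.0), (0.9, 0.2), (0.6, 0.3), (2.0, 0.0)]
w, b = (1.0, 0.0), -0.5   # hypothetical trained linear SVM: boundary at x0 = 0.5
print(select_by_margin(points, w, b, k=2, strategy="min"))       # [1, 3]
print(select_by_margin(points, w, b, k=2, strategy="max"))       # [4, 0]
print(select_by_margin(points, w, b, k=3, strategy="tradeoff"))  # [1, 3, 4]
```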

Notes

*Lughofer, Edwin (February 2012). "Hybrid active learning for reducing the annotation effort of operators in classification systems". Pattern Recognition. 45 (2): 884–896. doi:10.1016/j.patcog.2011.08.009. Bibcode:2012PatRe..45..884L.
*Bouneffouf, Djallel; Laroche, Romain; Urvoy, Tanguy; Féraud, Raphael; Allesiardo, Robin (2014). "Contextual Bandit for Active Learning: Active Thompson". In Loo, C. K.; Yap, K. S.; Wong, K. W.; Teoh, A.; Huang, K. (eds.). Neural Information Processing. Lecture Notes in Computer Science. Vol. 8834. pp. 405–412. doi:10.1007/978-3-319-12637-1_51. ISBN 978-3-319-12636-4. HAL Id: hal-01069802. S2CID 1701357. https://hal.archives-ouvertes.fr/hal-01069802 (PDF: https://hal.archives-ouvertes.fr/hal-01069802/file/Contextual_Bandit_for_Active_Learning.pdf)
*Yang, Bishan; Sun, Jian-Tao; Wang, Tengjiao; Chen, Zheng (2009). "Effective multi-label active learning for text classification". Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '09. p. 917. doi:10.1145/1557019.1557119. ISBN 978-1-60558-495-9. CiteSeerX 10.1.1.546.9358. S2CID 1979173. https://www.microsoft.com/en-us/research/wp-content/uploads/2009/01/sigkdd09-yang.pdf
*Lughofer, Edwin (2012). "Single-pass active learning with conflict and ignorance". Evolving Systems. 3 (4): 251–271. doi:10.1007/s12530-012-9060-7. S2CID 43844282.
*Bouneffouf, Djallel (8 January 2016). "Exponentiated Gradient Exploration for Active Learning". Computers. 5 (1): 1. doi:10.3390/computers5010001. arXiv:1408.2196. S2CID 14313852.
*"shubhomoydas/ad_examples". GitHub. Retrieved 2018-12-04. https://github.com/shubhomoydas/ad_examples#query-diversity-with-compact-descriptions
*Zhao, Shuyang; Heittola, Toni; Virtanen, Tuomas (2020). "Active learning for sound event detection". IEEE/ACM Transactions on Audio, Speech, and Language Processing. arXiv:2002.05033.
