Computational learning theory

In computer science, computational learning theory (or just learning theory) is a subfield of artificial intelligence devoted to studying the design and analysis of machine learning algorithms.


Overview

Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that have not been seen previously by the algorithm. The goal of the supervised learning algorithm is to optimize some measure of performance, such as minimizing the number of mistakes made on new samples.

In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time.
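
As a concrete illustration of this setting, here is a minimal sketch in Python of a classic polynomial-time learner: the elimination algorithm for monotone conjunctions of boolean attributes. The mushroom attribute names and the toy data are invented for the example; only the algorithm itself is standard.

    def learn_conjunction(samples):
        # samples: list of (attributes: dict mapping name -> bool, label: bool).
        # Start from the conjunction of every attribute seen, then delete
        # any attribute that is False in some positively labeled sample.
        hypothesis = set().union(*(attrs for attrs, _ in samples))
        for attrs, label in samples:
            if label:
                hypothesis -= {a for a, v in attrs.items() if not v}
        return hypothesis

    def classify(hypothesis, attrs):
        # The induced classifier: predict "edible" exactly when every
        # attribute retained in the hypothesis is True for this sample.
        return all(attrs.get(a, False) for a in hypothesis)

    # Invented toy data: (mushroom attributes, edible?).
    data = [
        ({"white_cap": True,  "ring": True,  "smooth": True},  True),
        ({"white_cap": True,  "ring": True,  "smooth": False}, True),
        ({"white_cap": False, "ring": True,  "smooth": True},  False),
    ]
    h = learn_conjunction(data)          # -> {"white_cap", "ring"}
    print(classify(h, {"white_cap": True, "ring": True, "smooth": False}))  # True

A positive result in the sense described below would assert that such a learner succeeds using only polynomially many samples and polynomial running time.
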
There are two kinds of time complexity results:
* Positive results, showing that a certain class of functions is learnable in polynomial time.
* Negative results, showing that certain classes cannot be learned in polynomial time.

Negative results often rely on commonly believed, but as yet unproven, assumptions, such as:
* Computational complexity – P ≠ NP (the P versus NP problem);
* Cryptography – one-way functions exist (a standard formulation follows this list).
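
For reference, the cryptographic assumption has a standard textbook formulation; nothing here is specific to this article. A one-way function f is computable in polynomial time, yet for every probabilistic polynomial-time algorithm A, every polynomial p, and all sufficiently large n,

    \[
      \Pr_{x \leftarrow \{0,1\}^n}\Big[\, A\big(f(x), 1^n\big) \in f^{-1}\big(f(x)\big) \,\Big] \;<\; \frac{1}{p(n)} .
    \]

Informally, f is easy to compute but hard to invert on a random input. If such functions exist, some concept classes provably cannot be learned in polynomial time, which is the route taken by the negative results cited in the references below.
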
There are several different approaches to computational learning theory based on making different assumptions about the inference principles used to generalise from limited data. This includes different definitions of probability (see frequentist probability, Bayesian probability) and different assumptions on the generation of samples. The different approaches include:
* Exact learning, proposed by Dana Angluin;
* Probably approximately correct learning (PAC learning), proposed by Leslie Valiant;
* VC theory, proposed by Vladimir Vapnik and Alexey Chervonenkis;
* Bayesian inference;
* Algorithmic learning theory, from the work of E. Mark Gold;
* Online machine learning, from the work of Nick Littlestone.

While its primary goal is to understand learning abstractly, computational learning theory has led to the development of practical algorithms. For example, PAC theory inspired boosting, VC theory led to support vector machines, and Bayesian inference led to belief networks.
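
To give a flavour of the guarantees these frameworks provide, here is the standard sample-complexity bound for PAC learning a finite hypothesis class H in the realizable case (i.e., some hypothesis in H is consistent with the target concept). Any algorithm that outputs a hypothesis consistent with m independent labeled samples, where

    \[
      m \;\ge\; \frac{1}{\varepsilon} \left( \ln |H| + \ln \frac{1}{\delta} \right),
    \]

has, with probability at least 1 − δ, error at most ε on new samples drawn from the same distribution. When m and the learner's running time are polynomial in the relevant problem parameters, this is exactly the notion of feasible learning described above.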


See also

* Grammar induction
* Information theory
* Stability (learning theory)
* Error tolerance (PAC learning)


References


Surveys

* Angluin, D. (1992). Computational learning theory: Survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing, pages 351–369. http://portal.acm.org/citation.cfm?id=129712.129746
* Haussler, D. (1990). Probably approximately correct learning. In AAAI-90, Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA, pages 1101–1108. American Association for Artificial Intelligence. http://citeseer.ist.psu.edu/haussler90probably.html


VC dimension

* V. Vapnik and A. Chervonenkis (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2):264–280.


Feature selection

* A. Dhagat and L. Hellerstein (1994). PAC learning with irrelevant attributes. In Proceedings of the IEEE Symposium on Foundations of Computer Science. http://citeseer.ist.psu.edu/dhagat94pac.html


Inductive inference

* Gold, E. Mark (1967). Language identification in the limit. Information and Control, 10(5):447–474.


Optimal O notation learning

* Oded Goldreich, Dana Ron. On universal learning algorithms. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.2224


Negative results

* M. Kearns and L. Valiant (1989). Cryptographic limitations on learning boolean formulae and finite automata. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 433–444, New York. ACM. http://citeseer.ist.psu.edu/kearns89cryptographic.html


Boosting (machine learning)

* Robert E. Schapire (1990). The strength of weak learnability. Machine Learning, 5(2):197–227. http://citeseer.ist.psu.edu/schapire90strength.html


Occam learning

* Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K. (1987). Occam's razor. Information Processing Letters, 24:377–380.
* Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K. (1989). Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36(4):929–965.


Probably approximately correct learning

* L. Valiant (1984). A Theory of the Learnable. Communications of the ACM, 27(11):1134–1142.


Error tolerance

* Michael Kearns and Ming Li (1993). Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837. http://citeseer.ist.psu.edu/kearns93learning.html
* Kearns, M. (1993). Efficient noise-tolerant learning from statistical queries. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 392–401. http://citeseer.ist.psu.edu/kearns93efficient.html


Equivalence

* D. Haussler, M. Kearns, N. Littlestone and M. Warmuth (1988). Equivalence of models for polynomial learnability. In Proc. 1st ACM Workshop on Computational Learning Theory, pages 42–55.
* A description of some of these publications is given at important publications in machine learning.


Distribution learning theory


External links


* Basics of Bayesian inference