Error tolerance (PAC learning)

In PAC learning, error tolerance refers to the ability of an

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...

to learn when the examples received have been corrupted in some way. In fact, this is a very common and important issue since in many applications it is not possible to access noise-free data. Noise can interfere with the learning process at different levels: the algorithm may receive data that have been occasionally mislabeled, or the inputs may have some false information, or the classification of the examples may have been maliciously adulterated.

Notation and the Valiant learning model

In the following, let

X

be our

n

-dimensional input space. Let

\mathcal

be a class of functions that we wish to use in order to learn a

\

-valued target function

f

defined over

X

. Let

\mathcal

be the distribution of the inputs over

X

. The goal of a learning algorithm

\mathcal

is to choose the best function

h \in \mathcal

such that it minimizes

error(h) = P_( h(x) \neq f(x))

. Let us suppose we have a function

size(f)

that can measure the complexity of

f

. Let

\text(x)

be an oracle that, whenever called, returns an example

x

and its correct label

f(x)

. When no noise corrupts the data, we can define learning in the Valiant setting: Definition: We say that

f

is efficiently learnable using

\mathcal

in the Valiant setting if there exists a learning algorithm

\mathcal

that has access to

\text(x)

and a polynomial

p(\cdot,\cdot,\cdot,\cdot)

such that for any

0 < \varepsilon \leq 1

and

0 < \delta \leq 1

it outputs, in a number of calls to the oracle bounded by

p\left(\frac,\frac,n,\text(f)\right)

, a function

h \in \mathcal

that satisfies with probability at least

1-\delta

the condition

\text(h) \leq \varepsilon

. In the following we will define learnability of

f

when data have suffered some modification.

Classification noise

In the classification noise modelKearns, M. J., & Vazirani, U. V. (1994)
An introduction to computational learning theory
chapter 5. MIT press. a noise rate

0 \leq \eta < \frac

is introduced. Then, instead of

\text(x)

that returns always the correct label of example

x

, algorithm

\mathcal

can only call a faulty oracle

\text(x,\eta)

that will flip the label of

x

with probability

\eta

. As in the Valiant case, the goal of a learning algorithm

\mathcal

is to choose the best function

h \in \mathcal

such that it minimizes

error(h) = P_( h(x) \neq f(x))

. In applications it is difficult to have access to the real value of

\eta

, but we assume we have access to its upperbound

\eta_B

. Note that if we allow the noise rate to be

1/2

, then learning becomes impossible in any amount of computation time, because every label conveys no information about the target function. Definition: We say that

f

is efficiently learnable using

\mathcal

in the classification noise model if there exists a learning algorithm

\mathcal

that has access to

\text(x,\eta)

and a polynomial

p(\cdot,\cdot,\cdot,\cdot)

such that for any

0 \leq \eta \leq \frac

0\leq \varepsilon \leq 1

and

0\leq \delta \leq 1

it outputs, in a number of calls to the oracle bounded by

p\left(\frac, \frac,\frac,n,size(f)\right)

, a function

h \in \mathcal

that satisfies with probability at least

1-\delta

the condition

error(h) \leq \varepsilon

Statistical query learning

Statistical Query LearningKearns, M. (1998). '' ww.cis.upenn.edu/~mkearns/papers/sq-journal.pdf Efficient noise-tolerant learning from statistical queries'. Journal of the ACM, 45(6), 983–1006. is a kind of

active learning Active learning is "a method of learning in which students are actively or experientially involved in the learning process and where there are different levels of active learning, depending on student involvement." states that "students partici ...

problem in which the learning algorithm

\mathcal

can decide if to request information about the likelihood

P_

that a function

f

correctly labels example

x

, and receives an answer accurate within a tolerance

\alpha

. Formally, whenever the learning algorithm

\mathcal

calls the oracle

\text(x,\alpha)

, it receives as feedback probability

Q_

, such that

Q_ - \alpha \leq P_ \leq Q_ + \alpha

. Definition: We say that

f

is efficiently learnable using

\mathcal

in the statistical query learning model if there exists a learning algorithm

\mathcal

that has access to

\text(x,\alpha)

and polynomials

p(\cdot,\cdot,\cdot)

q(\cdot,\cdot,\cdot)

, and

r(\cdot,\cdot,\cdot)

such that for any

0 < \varepsilon \leq 1

the following hold: #

\text(x,\alpha)

can evaluate

P_

in time

q\left(\frac,n,size(f)\right)

; #

\frac

is bounded by

r\left(\frac,n,size(f)\right)

\mathcal

outputs a model

h

such that

err(h)<\varepsilon

, in a number of calls to the oracle bounded by

p\left(\frac,n,size(f)\right)

. Note that the confidence parameter

\delta

does not appear in the definition of learning. This is because the main purpose of

\delta

is to allow the learning algorithm a small probability of failure due to an unrepresentative sample. Since now

\text(x,\alpha)

always guarantees to meet the approximation criterion

Q_ - \alpha \leq P_ \leq Q_ + \alpha

, the failure probability is no longer needed. The statistical query model is strictly weaker than the PAC model: any efficiently SQ-learnable class is efficiently PAC learnable in the presence of classification noise, but there exist efficient PAC-learnable problems such as parity that are not efficiently SQ-learnable.

Malicious classification

In the malicious classification model an adversary generates errors to foil the learning algorithm. This setting describes situations of

error burst In telecommunication, a burst error or error burst is a contiguous sequence of symbols, received over a communication channel, such that the first and last symbols are in error and there exists no contiguous subsequence of ''m'' correctly rec ...

, which may occur when for a limited time transmission equipment malfunctions repeatedly. Formally, algorithm

\mathcal

calls an oracle

\text(x,\beta)

that returns a correctly labeled example

x

drawn, as usual, from distribution

\mathcal

over the input space with probability

1- \beta

, but it returns with probability

\beta

an example drawn from a distribution that is not related to

\mathcal

. Moreover, this maliciously chosen example may strategically selected by an adversary who has knowledge of

f

\beta

\mathcal

, or the current progress of the learning algorithm. Definition: Given a bound

\beta_B<  \frac

for

0 \leq \beta < \frac

, we say that

f

is efficiently learnable using

\mathcal

in the malicious classification model, if there exist a learning algorithm

\mathcal

that has access to

\text(x,\beta)

and a polynomial

p(\cdot,\cdot,\cdot,\cdot,\cdot)

such that for any

0 < \varepsilon \leq 1

0 < \delta \leq 1

it outputs, in a number of calls to the oracle bounded by

p\left(\frac,\frac,\frac,n,size(f)\right)

, a function

h \in \mathcal

that satisfies with probability at least

1-\delta

the condition

error(h) \leq \varepsilon

Errors in the inputs: nonuniform random attribute noise

In the nonuniform random attribute noiseSloan, R. H. (1989).
Computational learning theory: New models and algorithms
' (Doctoral dissertation, Massachusetts Institute of Technology). model the algorithm is learning a

Boolean function In mathematics, a Boolean function is a function whose arguments and result assume values from a two-element set (usually , or ). Alternative names are switching function, used especially in older computer science literature, and truth function ...

, a malicious oracle

\text(x,\nu)

may flip each

i

-th bit of example

x=(x_1,x_2,\ldots,x_n)

independently with probability

\nu_i \leq \nu

. This type of error can irreparably foil the algorithm, in fact the following theorem holds: In the nonuniform random attribute noise setting, an algorithm

\mathcal

can output a function

h \in \mathcal

such that

error(h)<\varepsilon

only if

\nu < 2\varepsilon

References

{{Reflist Theoretical computer science Computational learning theory

Error tolerance (PAC learning)

Notation and the Valiant learning model

Classification noise

Statistical query learning

Malicious classification

Errors in the inputs: nonuniform random attribute noise

See also

References