For

mathematical analysis Analysis is the branch of mathematics dealing with continuous functions, limit (mathematics), limits, and related theories, such as Derivative, differentiation, Integral, integration, measure (mathematics), measure, infinite sequences, series (m ...

and statistics, Leave-one-out error can refer to the following: * Leave-one-out cross-validation Stability (CVloo, for ''stability of Cross Validation with leave one out''): An

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...

f has CVloo stability β with respect to the

loss function In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "co ...

V if the following holds:

\forall i\in\, \mathbb_S\\geq1-\delta_

* Expected-to-leave-one-out error Stability (

Eloo_

, for ''Expected error from leaving one out''): An algorithm f has

Eloo_

stability if for each n there exists a

\beta_^m

and a

\delta_^m

such that:

\forall i\in\, \mathbb_S\\geq1-\delta_^m

, with

\beta_^m

and

\delta_^m

going to zero for

n\rightarrow\inf

Preliminary notations

With X and Y being a

subset In mathematics, set ''A'' is a subset of a set ''B'' if all elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are unequal, then ''A'' is a proper subset o ...

of the

real number In mathematics, a real number is a number that can be used to measurement, measure a ''continuous'' one-dimensional quantity such as a distance, time, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small var ...

s R, or X and Y ⊂ R, being respectively an input space X and an output space Y, we consider a

training set In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from ...

S = \

of size m in

Z = X \times Y

drawn

independently and identically distributed In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usua ...

(i.i.d.) from an unknown distribution, here called "D". Then a learning algorithm is a function

f

from

Z_m

into

F \subset YX

which

maps A map is a symbolic depiction emphasizing relationships between elements of some space, such as objects, regions, or themes. Many maps are static, fixed to paper or some other durable medium, while others are dynamic or interactive. Althoug ...

a learning set S onto a function

f_S

from the input space X to the output space Y. To avoid complex notation, we consider only

deterministic algorithm In computer science, a deterministic algorithm is an algorithm that, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states. Deterministic algorithms are by far ...

s. It is also assumed that the algorithm

f

is symmetric with respect to S, i.e. it does not depend on the order of the elements in the

. Furthermore, we assume that all functions are measurable and all sets are countable which does not limit the interest of the results presented here. The loss of an

hypothesis A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. For a hypothesis to be a scientific hypothesis, the scientific method requires that one can testable, test it. Scientists generally base scientific hypotheses on prev ...

f with respect to an example

z = (x,y)

is then defined as

V(f,z) = V(f(x),y)

. The empirical error of f can then be written as

= \frac\sum V(f,z_i)

. The true error of f is

= \mathbb_z V(f,z)

Given a training set S of size m, we will build, for all i = 1....,m, modified training sets as follows: * By removing the i-th element

S^ = \

* and/or by replacing the i-th element

S^i = \{z_1 ,...,\ z_{i-1},\ z_i',\ z_{i+1},...,\ z_m\}

References

*S. Mukherjee, P. Niyogi, T. Poggio, and R. M. Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math., 25(1-3):161–193, 2006 Machine learning

Preliminary notations

See also

References