Empirical process

In probability theory, an empirical process is a stochastic process that describes the proportion of objects in a system in a given state. For a process in a discrete state space, a population continuous-time Markov chain (or Markov population model) is a process that counts the number of objects in a given state (without rescaling). In mean field theory, limit theorems (as the number of objects becomes large) are considered; they generalise the central limit theorem for empirical measures. Applications of the theory of empirical processes arise in non-parametric statistics.


Definition

For X_1, X_2, \ldots, X_n independent and identically distributed random variables in R with common cumulative distribution function ''F''(''x''), the empirical distribution function is defined by
:F_n(x)=\frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),
where I_C is the indicator function of the set ''C''. For every (fixed) ''x'', F_n(x) is a sequence of random variables that converges to ''F''(''x'') almost surely by the strong law of large numbers. That is, F_n converges to ''F'' pointwise. Glivenko and Cantelli strengthened this result by proving uniform convergence of F_n to ''F'' (the Glivenko–Cantelli theorem).

Write P_n for the empirical measure of the sample, P_n(A)=\frac{1}{n}\sum_{i=1}^n I_A(X_i), and ''P'' for the common distribution of the X_i. A centered and scaled version of the empirical measure is the signed measure
:G_n(A)=\sqrt{n}(P_n(A)-P(A))
It induces a map on measurable functions ''f'' given by
:f\mapsto G_n f=\sqrt{n}(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right)
By the central limit theorem, G_n(A) converges in distribution to a normal random variable ''N''(0, ''P''(''A'')(1 − ''P''(''A''))) for a fixed measurable set ''A''. Similarly, for a fixed function ''f'', G_nf converges in distribution to a normal random variable N(0,\mathbb{E}(f-\mathbb{E}f)^2), provided that \mathbb{E}f and \mathbb{E}f^2 exist.

Definition
:\bigl(G_n(c)\bigr)_{c\in\mathcal{C}} is called an ''empirical process'' indexed by \mathcal{C}, a collection of measurable subsets of ''S'' (the space in which the X_i take values).
:\bigl(G_nf\bigr)_{f\in\mathcal{F}} is called an ''empirical process'' indexed by \mathcal{F}, a collection of measurable functions from ''S'' to \mathbb{R}.

A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process. While it can be shown that Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.
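
The definitions above can be explored numerically. The following is a minimal Python sketch, not a reference implementation. It assumes Uniform(0, 1) data, so that ''F''(''x'') = ''x'' on [0, 1], and the helper names (ecdf, sup_distance_uniform, empirical_process_at), the seed, and the sample sizes are all choices made for this illustration. It prints the uniform distance between F_n and ''F'' for two sample sizes, and then a Monte Carlo estimate of the variance of G_n(A) for ''A'' = (−∞, 0.3], which the central limit theorem statement suggests should be near ''P''(''A'')(1 − ''P''(''A'')) = 0.21.

```python
import bisect
import math
import random


def ecdf(sample):
    """Return F_n as a function: F_n(x) = (1/n) * #{i : X_i <= x}."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| assuming F is the Uniform(0, 1) CDF.

    F_n only jumps at the order statistics, so it is enough to compare
    F with the value of F_n just before and just after each of them.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def empirical_process_at(sample, x):
    """G_n(A) for A = (-inf, x], i.e. sqrt(n) * (F_n(x) - F(x)) with F(x) = x."""
    n = len(sample)
    return math.sqrt(n) * (ecdf(sample)(x) - x)


rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only

# Glivenko-Cantelli: the uniform distance should shrink as n grows.
sup_by_n = {}
for n in (100, 10_000):
    sup_by_n[n] = sup_distance_uniform([rng.random() for _ in range(n)])
    print(n, sup_by_n[n])

# Pointwise CLT: sample variance of G_n(A) over many independent samples.
draws = [empirical_process_at([rng.random() for _ in range(500)], 0.3)
         for _ in range(2000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(var)  # should be near 0.3 * 0.7 = 0.21, up to Monte Carlo error
```

The uniform case is used only because ''F'' has a closed form; for another continuous ''F'' the same code applies after transforming each observation by ''F''.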


Example

As an example, consider empirical distribution functions. For real-valued iid random variables X_1, X_2, \ldots, X_n they are given by
:F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.
In this case, empirical processes are indexed by a class \mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}. It has been shown that \mathcal{C} is a Donsker class; in particular,
:\sqrt{n}(F_n(x)-F(x))
converges weakly in \ell^\infty(\mathbb{R}) to a Brownian bridge ''B''(''F''(''x'')).
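
One way to see this convergence at work is through the supremum of the process. The sketch below is illustrative only and rests on two facts that are not stated in this article: the supremum is a continuous functional on \ell^\infty(\mathbb{R}), so the scaled uniform deviation should converge in distribution to the supremum of |''B''| over [0, 1]; and that supremum follows the Kolmogorov distribution, whose series form is used in kolmogorov_cdf. The data are assumed to be Uniform(0, 1), and the function names, seed, and simulation sizes are arbitrary choices for this example.

```python
import math
import random


def scaled_sup_deviation(sample):
    """sqrt(n) * sup_x |F_n(x) - F(x)| assuming Uniform(0, 1) data, F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    return math.sqrt(n) * d


def kolmogorov_cdf(x, terms=100):
    """P(sup_t |B(t)| <= x) for a standard Brownian bridge B on [0, 1].

    Uses the alternating series 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2),
    truncated after `terms` terms (adequate when x is not very close to 0).
    """
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


rng = random.Random(1)  # arbitrary fixed seed, for reproducibility only
n, reps, x = 500, 2000, 1.0

# Simulate the scaled uniform deviation many times and compare the fraction
# of values at or below x with the limiting Kolmogorov probability.
stats = [scaled_sup_deviation([rng.random() for _ in range(n)])
         for _ in range(reps)]
simulated = sum(s <= x for s in stats) / reps
limit = kolmogorov_cdf(x)
print(simulated, limit)  # the limiting value at x = 1.0 is about 0.73
```

Agreement is only approximate for finite ''n'', and the simulated fraction also carries Monte Carlo error.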


See also

* Khmaladze transformation
* Weak convergence of measures
* Glivenko–Cantelli theorem




External links


* Empirical Processes: Theory and Applications, by David Pollard, a textbook available online.
* Introduction to Empirical Processes and Semiparametric Inference, by Michael Kosorok, another textbook available online.