In statistics, an empirical distribution function (commonly also called an empirical cumulative distribution function, eCDF) is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by $1/n$ at each of the $n$ data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.
The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution, according to the
Glivenko–Cantelli theorem. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.
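The convergence can be illustrated numerically. The following is a minimal sketch, not a proof: it assumes samples drawn from the uniform distribution on [0, 1] (so the underlying distribution function is F(t) = t) and measures the largest gap between the empirical and the true distribution function. The helper name `sup_distance_uniform` and the fixed seed are illustrative choices.

```python
import random


def sup_distance_uniform(sample):
    """Largest gap sup_t |F_n(t) - t| between the empirical distribution
    function of `sample` and the Uniform(0, 1) distribution function."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical distribution function is a step function, so the
    # supremum is reached at a data point, either just before the jump
    # (height i/n) or right after it (height (i+1)/n).
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance_uniform(sample), 4))
```

With growing sample size the printed distance typically shrinks, which is the behaviour the theorem describes.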
Definition
Let $(X_1, \ldots, X_n)$ be independent, identically distributed real random variables with the common cumulative distribution function $F(t)$. Then the empirical distribution function is defined as
$$\widehat{F}_n(t) = \frac{\text{number of elements in the sample} \le t}{n} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t},$$
where $\mathbf{1}_A$ is the indicator of event $A$. For a fixed $t$, the indicator $\mathbf{1}_{X_i \le t}$ is a Bernoulli random variable with parameter $p = F(t)$; hence $n\widehat{F}_n(t)$ is a binomial random variable with mean $nF(t)$ and variance $nF(t)(1 - F(t))$. This implies that $\widehat{F}_n(t)$ is an unbiased estimator for $F(t)$.
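The definition translates directly into code. The sketch below uses only the Python standard library and is one possible way to evaluate the empirical distribution function; the function name `ecdf` is an illustrative choice, not an established API.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function t -> fraction of sample values that are <= t."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_hat(t):
        # On a sorted list, bisect_right returns the number of
        # observations that are less than or equal to t.
        return bisect_right(xs, t) / n

    return F_hat


F_hat = ecdf([3.0, 1.0, 2.0, 2.0])
print(F_hat(0.5), F_hat(2.0), F_hat(10.0))  # prints: 0.0 0.75 1.0
```

Sorting once and using binary search keeps each evaluation at logarithmic cost; repeated observations (here the value 2.0) simply produce a larger jump.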
However, in some textbooks, the definition is given as
$$\widehat{F}_n(t) = \frac{1}{n+1} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t}$$
[Madsen, H.O., Krenk, S., Lind, S.C. (2006). Methods of Structural Safety. Dover Publications. pp. 148-149.]
Mean
The mean of the empirical distribution is an unbiased estimator of the mean of the population distribution:
$$E_n(X) = \frac{1}{n} \sum_{i=1}^{n} x_i,$$
which is more commonly denoted $\bar{x}$.
Variance
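The variance of the empirical distribution times $\tfrac{n}{n-1}$ is an unbiased estimator of the variance of the population distribution, for any distribution of $X$ that has a finite variance. The variance of the empirical distribution itself is
$$\operatorname{Var}_n(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$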
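As a small worked example, the sketch below computes the mean and variance of the empirical distribution and applies the n/(n-1) correction. The function names and the data are made up for illustration; the standard-library `statistics` module is used only as a cross-check.

```python
import statistics


def empirical_mean(sample):
    """Mean of the empirical distribution: the ordinary sample mean."""
    return sum(sample) / len(sample)


def empirical_variance(sample):
    """Variance of the empirical distribution (divides by n)."""
    m = empirical_mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def unbiased_variance(sample):
    """Empirical variance rescaled by n / (n - 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return empirical_variance(sample) * n / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(empirical_mean(data))      # 5.0
print(empirical_variance(data))  # 4.0, same as statistics.pvariance(data)
print(unbiased_variance(data))   # 32/7, same as statistics.variance(data)
```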
Mean squared error
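The mean squared error for the empirical distribution is as follows:
$$\operatorname{MSE}(\hat{\theta}) = \operatorname{E}\big[(\hat{\theta} - \theta)^2\big] = \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta}, \theta)^2,$$
where $\hat{\theta}$ is an estimator and $\theta$ an unknown parameter.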
Quantiles
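For any real number $a$, the notation $\lceil a \rceil$ (read "ceiling of a") denotes the least integer greater than or equal to $a$. For any real number $a$, the notation $\lfloor a \rfloor$ (read "floor of a") denotes the greatest integer less than or equal to $a$. Write $x_{(k)}$ for the $k$-th smallest observation in the sample.
If $nq$ is not an integer, then the $q$-th quantile is unique and is equal to $x_{(\lceil nq \rceil)}$.
If $nq$ is an integer, then the $q$-th quantile is not unique and is any real number $x$ such that $x_{(nq)} < x < x_{(nq+1)}$.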
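The two cases can be sketched in code as follows. This is an illustrative helper, not a standard library routine: it returns the pair of order statistics that bound the set of q-th quantiles, and the two entries coincide when the quantile is unique. To decide exactly whether n times q is an integer, q is converted to a `fractions.Fraction`; passing q as a `Fraction` or a string such as "1/3" avoids floating-point rounding.

```python
from fractions import Fraction
from math import ceil


def empirical_quantile(sample, q):
    """Return (lo, hi), the order statistics bounding the q-th quantile
    of the empirical distribution, for 0 < q < 1."""
    xs = sorted(sample)
    n = len(xs)
    q = Fraction(q)
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    nq = n * q
    if nq.denominator != 1:
        # n*q is not an integer: the quantile is the ceil(n*q)-th
        # smallest observation (1-based), hence index ceil(nq) - 1.
        k = ceil(nq)
        return xs[k - 1], xs[k - 1]
    # n*q is an integer k: the quantile is not unique and lies between
    # the k-th and (k+1)-th smallest observations.
    k = int(nq)
    return xs[k - 1], xs[k]


data = [4, 1, 3, 2]
print(empirical_quantile(data, Fraction(1, 3)))  # (2, 2)
print(empirical_quantile(data, Fraction(1, 2)))  # (2, 3)
```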