Dvoretzky–Kiefer–Wolfowitz inequality

In the theory of probability and statistics, the Dvoretzky–Kiefer–Wolfowitz–Massart inequality (DKW inequality) bounds how close an empirically determined distribution function will be to the distribution function from which the empirical samples are drawn. It is named after Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz, who in 1956 proved the inequality

: \Pr\Bigl(\sup_{x\in\mathbb{R}} |F_n(x) - F(x)| > \varepsilon \Bigr) \le Ce^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon>0,

with an unspecified multiplicative constant ''C'' in front of the exponent on the right-hand side. In 1990, Pascal Massart proved the inequality with the sharp constant ''C'' = 2, confirming a conjecture due to Birnbaum and McCarty. In 2021, Michael Naaman proved the multivariate version of the DKW inequality and generalized Massart's tightness result to the multivariate case, which results in a sharp constant of twice the number of variables, ''C'' = 2''k''.


The DKW inequality

Given a natural number ''n'', let ''X''1, ''X''2, …, ''Xn'' be real-valued independent and identically distributed random variables with cumulative distribution function ''F''(·). Let ''Fn'' denote the associated empirical distribution function defined by

: F_n(x) = \frac1n \sum_{i=1}^n \mathbf{1}_{\{X_i\le x\}},\qquad x\in\mathbb{R},

so ''F''(''x'') is the ''probability'' that a ''single'' random variable ''X'' is smaller than ''x'', and F_n(x) is the ''fraction'' of random variables that are smaller than ''x''.

The Dvoretzky–Kiefer–Wolfowitz inequality bounds the probability that the random function ''Fn'' differs from ''F'' by more than a given constant ''ε'' > 0 anywhere on the real line. More precisely, there is the one-sided estimate

: \Pr\Bigl(\sup_{x\in\mathbb{R}} \bigl(F_n(x) - F(x)\bigr) > \varepsilon \Bigr) \le e^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon\ge\sqrt{\tfrac{1}{2n}\ln 2},

which also implies the two-sided estimate

: \Pr\Bigl(\sup_{x\in\mathbb{R}} |F_n(x) - F(x)| > \varepsilon \Bigr) \le 2e^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon>0.

This strengthens the Glivenko–Cantelli theorem by quantifying the rate of convergence as ''n'' tends to infinity. It also estimates the tail probability of the Kolmogorov–Smirnov statistic. The inequalities above follow from the case where ''F'' corresponds to the uniform distribution on [0,1], in view of the fact that ''Fn'' has the same distribution as ''Gn''(''F''), where ''Gn'' is the empirical distribution function of ''U''1, ''U''2, …, ''Un'', which are independent and Uniform(0,1), and noting that

: \sup_{x\in\mathbb{R}} |F_n(x) - F(x)| \;\stackrel{d}{=}\; \sup_{x\in\mathbb{R}} |G_n(F(x)) - F(x)| \le \sup_{0\le t\le 1} |G_n(t) - t|,

with equality if and only if ''F'' is continuous.
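The bound can be illustrated numerically. The following is a minimal sketch (not taken from a particular reference): it draws Uniform(0,1) samples, evaluates sup_x |F_n(x) − F(x)| exactly at the jump points of F_n, and compares the Monte Carlo exceedance frequency with the two-sided bound 2e^{−2nε²}. The helper name ecdf, the random seed, and the values of n, ε, and the number of replications are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed for reproducibility


def ecdf(sample):
    """Return a function computing the empirical CDF F_n of the sample."""
    xs = np.sort(sample)
    n = len(xs)
    return lambda t: np.searchsorted(xs, t, side="right") / n


# Monte Carlo check of P(sup|F_n - F| > eps) <= 2 exp(-2 n eps^2) for F = Uniform(0,1) CDF
n, eps, reps = 100, 0.1, 10_000
exceed = 0
for _ in range(reps):
    u = rng.uniform(size=n)
    Fn = ecdf(u)
    grid = np.sort(u)                 # the supremum is attained at the jump points of F_n
    dev = np.maximum(np.abs(Fn(grid) - grid),           # deviation just after each jump
                     np.abs(np.arange(n) / n - grid))   # deviation just before each jump
    exceed += dev.max() > eps

print("empirical frequency:", exceed / reps)
print("DKW bound          :", min(1.0, 2 * np.exp(-2 * n * eps**2)))
```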


Multivariate case

In the multivariate case, ''X''1, ''X''2, …, ''Xn'' is an i.i.d. sequence of ''k''-dimensional vectors. If ''Fn'' is the multivariate empirical cdf, then

: \Pr\Bigl(\sup_{t\in\mathbb{R}^k} |F_n(t) - F(t)| > \varepsilon \Bigr) \le (n+1)k\,e^{-2n\varepsilon^2}

for every ''ε'', ''n'', ''k'' > 0. The (''n'' + 1) term can be replaced with a 2 for any sufficiently large ''n''.
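As an illustration (not from the original statement), the sketch below computes a multivariate empirical CDF for ''k'' = 2 independent Uniform(0,1) coordinates, approximates the supremum deviation on a finite grid, and evaluates the bound (n+1)k e^{−2nε²}. The function name, seed, grid resolution, and the values of n and ε are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical seed


def multivariate_ecdf(sample):
    """F_n(t) = fraction of sample points that are <= t in every coordinate."""
    def Fn(t):
        return np.mean(np.all(sample <= t, axis=1))
    return Fn


# k = 2 independent Uniform(0,1) coordinates, so F(t) = t1 * t2 on [0,1]^2
n, k, eps = 200, 2, 0.15
X = rng.uniform(size=(n, k))
Fn = multivariate_ecdf(X)

# Approximate the supremum over R^k by the deviation on a finite grid (a crude proxy).
grid = np.linspace(0, 1, 101)
dev = max(abs(Fn(np.array([t1, t2])) - t1 * t2) for t1 in grid for t2 in grid)

print("approximate sup deviation:", dev)
print("multivariate DKW bound   :", (n + 1) * k * np.exp(-2 * n * eps**2))
```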


Kaplan-Meier estimator

The Dvoretzky–Kiefer–Wolfowitz inequality is obtained for the Kaplan–Meier estimator, which is a right-censored data analog of the empirical distribution function:

: \Pr\Bigl(\sqrt{n}\,\sup_{t} \bigl|(1-G(t))(F_n(t) - F(t))\bigr| > \varepsilon \Bigr) \le 2.5\, e^{-2\varepsilon^2 + C\varepsilon}

for every \varepsilon > 0 and for some constant C < \infty, where F_n is the Kaplan–Meier estimator and G is the censoring distribution function.
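For concreteness, a minimal sketch of the product-limit (Kaplan–Meier) estimator F_n appearing in the bound is given below. This is a textbook construction rather than code from the cited work; the function name and the toy data are illustrative.

```python
import numpy as np


def kaplan_meier_cdf(times, observed):
    """Kaplan-Meier estimate of F(t) = 1 - S(t) from right-censored data.

    times    : observed times (event or censoring time for each subject)
    observed : 1 if the event was observed, 0 if the observation was censored
    Returns the distinct event times and the estimated CDF at those times.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(times[observed])
    surv = 1.0
    cdf = []
    for t in event_times:
        at_risk = np.sum(times >= t)            # subjects still at risk just before t
        deaths = np.sum((times == t) & observed)
        surv *= 1.0 - deaths / at_risk          # product-limit update of S(t)
        cdf.append(1.0 - surv)
    return event_times, np.array(cdf)


# Toy right-censored data: 0 marks a censored observation
t = [1, 2, 2, 3, 4, 5, 5]
d = [1, 1, 0, 1, 0, 1, 1]
print(kaplan_meier_cdf(t, d))
```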


Building CDF bands

The Dvoretzky–Kiefer–Wolfowitz inequality is one method for generating CDF-based confidence bounds and producing a confidence band, which is sometimes called the Kolmogorov–Smirnov confidence band. The purpose of this confidence band is to contain the entire CDF at the specified confidence level, while alternative approaches attempt only to achieve the confidence level at each individual point, which can allow for a tighter bound. The DKW band runs parallel to, and is equally above and below, the empirical CDF. The equally spaced confidence interval around the empirical CDF allows for different rates of violations across the support of the distribution. In particular, it is more common for a CDF to be outside of the CDF bound estimated using the DKW inequality near the median of the distribution than near the endpoints of the distribution.

The interval that contains the true CDF, F(x), with probability 1-\alpha is often specified as

: F_n(x) - \varepsilon \le F(x) \le F_n(x) + \varepsilon \qquad \text{where } \varepsilon = \sqrt{\frac{\ln\frac{2}{\alpha}}{2n}},

which is also a special case of the asymptotic procedure for the multivariate case, whereby one uses the following critical value

: \varepsilon = \sqrt{\frac{\ln\frac{2k}{\alpha}}{2n}}

for the multivariate test; one may replace 2''k'' with ''k''(''n'' + 1) for a test that holds for all ''n''. Moreover, the multivariate test described by Naaman can be generalized to account for heterogeneity and dependence.


See also

* Concentration inequality – a summary of bounds on sets of random variables.

