V-statistic

V-statistics are a class of statistics named for Richard von Mises, who developed their asymptotic distribution theory in a fundamental paper in 1947. V-statistics are closely related to U-statistics (U for "unbiased") introduced by Wassily Hoeffding in 1948. A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.


Statistical functions

Statistics that can be represented as functionals T(F_n) of the empirical distribution function F_n are called ''statistical functionals''. Differentiability of the functional ''T'' plays a key role in the von Mises approach; thus von Mises considers ''differentiable statistical functionals''.
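For example, the mean \mu = E[X] corresponds to the functional T(F) = \int x \, dF(x); evaluating it at the empirical distribution function gives the plug-in statistic : T(F_n) = \int x \, dF_n(x) = \frac{1}{n} \sum_{i=1}^n x_i = \bar x, the sample mean.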


Examples of statistical functions

  1. The ''k''-th central moment is the ''functional'' T(F)=\int(x-\mu)^k \, dF(x), where \mu = E[X] is the expected value of ''X''. The associated ''statistical function'' is the sample ''k''-th central moment, : T_n=m_k=T(F_n) = \frac{1}{n} \sum_{i=1}^n (x_i - \overline x)^k.
  2. The chi-squared goodness-of-fit statistic is a statistical function ''T''(''F''_''n''), corresponding to the statistical functional : T(F) = \sum_{i=1}^k \frac{(\Pr(A_i) - p_i)^2}{p_i}, where \Pr(A_i) is the probability of cell ''A''_''i'' under ''F'', the ''A''_''i'' are the ''k'' cells, and the ''p''_''i'' are the specified probabilities of the cells under the null hypothesis.
  3. The Cramér–von-Mises and Anderson–Darling goodness-of-fit statistics are based on the functional : T(F) = \int (F(x) - F_0(x))^2 \, w(x;F_0) \, dF_0(x), where ''w''(''x''; ''F''_0) is a specified weight function and ''F''_0 is a specified null distribution. If ''w''(''x''; ''F''_0) \equiv 1 then ''T''(''F''_''n'') is the well-known Cramér–von-Mises goodness-of-fit statistic; if w(x;F_0)= [F_0(x)(1-F_0(x))]^{-1} then ''T''(''F''_''n'') is the Anderson–Darling statistic. A plug-in computation of this functional and of the sample central moment in Example 1 is sketched after this list.
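A minimal numerical sketch of the plug-in principle for Examples 1 and 3, assuming Python with NumPy and SciPy; the helper names (kth_central_moment, cramer_von_mises_T) are illustrative and not part of any standard library.

    import numpy as np
    from scipy.stats import norm  # norm.cdf serves as the null distribution F0 below

    def kth_central_moment(x, k):
        """Example 1: T(F_n) = (1/n) * sum((x_i - xbar)**k)."""
        x = np.asarray(x, dtype=float)
        return np.mean((x - x.mean()) ** k)

    def cramer_von_mises_T(x, F0=norm.cdf):
        """Example 3 with w = 1: T(F_n) = integral of (F_n - F0)^2 dF0,
        evaluated with the standard order-statistic computing formula."""
        u = F0(np.sort(np.asarray(x, dtype=float)))
        n = len(u)
        i = np.arange(1, n + 1)
        return 1.0 / (12 * n**2) + np.mean((u - (2 * i - 1) / (2 * n)) ** 2)

    x = np.random.default_rng(0).normal(size=100)
    print(kth_central_moment(x, 2), cramer_von_mises_T(x))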


Representation as a V-statistic

Suppose ''x''_1, ..., ''x''_''n'' is a sample. In typical applications the statistical function has a representation as the V-statistic : V_{mn} = \frac{1}{n^m} \sum_{i_1=1}^n \cdots \sum_{i_m=1}^n h(x_{i_1}, x_{i_2}, \dots, x_{i_m}), where ''h'' is a symmetric kernel function. Serfling (1980, Section 6.5) discusses how to find the kernel in practice. ''V''_{''mn''} is called a V-statistic of degree ''m''. A symmetric kernel of degree 2 is a function ''h''(''x'', ''y'') such that ''h''(''x'', ''y'') = ''h''(''y'', ''x'') for all ''x'' and ''y'' in the domain of ''h''. For samples ''x''_1, ..., ''x''_''n'', the corresponding V-statistic is defined by : V_{2n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n h(x_i, x_j).
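A brief computational sketch of this definition, assuming Python with NumPy (the function name v_statistic is illustrative): the V-statistic averages the kernel over all n^m index tuples, with repeated indices allowed.

    import itertools
    import numpy as np

    def v_statistic(x, h, m):
        """Degree-m V-statistic: V_mn = n**(-m) * sum of h over ALL n**m
        ordered index tuples (i_1, ..., i_m), repeats included."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        total = sum(h(*(x[i] for i in idx))
                    for idx in itertools.product(range(n), repeat=m))
        return total / n**m

    # Quick check with the degree-3 symmetric kernel h(x, y, z) = x*y*z,
    # whose V-statistic collapses to (sample mean)**3:
    x = np.random.default_rng(0).normal(size=15)
    v = v_statistic(x, lambda a, b, c: a * b * c, m=3)
    print(np.isclose(v, x.mean() ** 3))  # True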


Example of a V-statistic

  1. An example of a degree-2 V-statistic is the second central moment ''m''_2. If ''h''(''x'', ''y'') = (''x'' − ''y'')^2/2, the corresponding V-statistic is : V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2, which is the maximum likelihood estimator of variance. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance: : s^2 = {n \choose 2}^{-1} \sum_{i<j} \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2. A numerical comparison of the two estimators is sketched after this example.
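A minimal comparison of the two estimators, assuming Python with NumPy (variable names are illustrative):

    import itertools
    import numpy as np

    def h(a, b):
        # the symmetric degree-2 kernel from the example: h(x, y) = (x - y)^2 / 2
        return (a - b) ** 2 / 2

    x = np.random.default_rng(1).normal(size=10)
    n = len(x)

    # V-statistic: average over all n^2 ordered pairs, including i == j
    V = sum(h(x[i], x[j]) for i in range(n) for j in range(n)) / n**2

    # U-statistic: average over the C(n, 2) unordered pairs with i < j
    U = sum(h(x[i], x[j]) for i, j in itertools.combinations(range(n), 2)) / (n * (n - 1) / 2)

    print(np.isclose(V, np.mean((x - x.mean()) ** 2)))  # True: biased, MLE-type variance
    print(np.isclose(U, np.var(x, ddof=1)))             # True: unbiased sample variance s^2
    print(np.isclose(U, V * n / (n - 1)))               # True: they differ by the factor n/(n-1)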


Asymptotic distribution

In examples 1–3, the asymptotic distribution of the statistic is different: in (1) it is normal, in (2) it is chi-squared, and in (3) it is a weighted sum of chi-squared variables. Von Mises' approach is a unifying theory that covers all of the cases above. Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by the first non-vanishing term in the Taylor expansion of the functional ''T''. If it is the linear term, the limit distribution is normal; otherwise higher-order types of distributions arise (under suitable conditions such that a central limit theorem holds). There is a hierarchy of cases parallel to the asymptotic theory of U-statistics. Let ''A''(''m'') be the property defined by:
:''A''(''m''):
  1. Var(''h''(''X''_1, ..., ''X''_''k'')) = 0 for ''k'' < ''m'', and Var(''h''(''X''_1, ..., ''X''_''k'')) > 0 for ''k'' = ''m'';
  2. ''n''^{''m''/2} ''R''_{''mn''} tends to zero (in probability). (''R''_{''mn''} is the remainder term in the Taylor series for ''T''.)
Case ''m'' = 1 (Non-degenerate kernel): If ''A''(1) is true, the statistic is a sample mean and the central limit theorem implies that ''T''(''F''_''n'') is asymptotically normal. In the variance example above, ''m''_2 is asymptotically normal with mean \sigma^2 and variance (\mu_4 - \sigma^4)/n, where \mu_4 = E[(X - E(X))^4].

Case ''m'' = 2 (Degenerate kernel): Suppose ''A''(2) is true, and E[h^2(X_1,X_2)] < \infty, E|h(X_1,X_1)| < \infty, and E[h(x,X_1)] \equiv 0 for every ''x''. Then ''nV''_{2,''n''} converges in distribution to a weighted sum of independent chi-squared variables: : n V_{2,n} \xrightarrow{d} \sum_{k=1}^\infty \lambda_k Z^2_k, where the Z_k are independent standard normal variables and the \lambda_k are constants that depend on the distribution ''F'' and the functional ''T''. In this case the asymptotic distribution is called a ''quadratic form of centered Gaussian random variables'', and the statistic ''V''_{2,''n''} is called a ''degenerate kernel V-statistic''. The V-statistic associated with the Cramér–von Mises functional (Example 3) is an example of a degenerate kernel V-statistic; see Lee (1990, p. 160) for the kernel function.
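A small simulation sketch of the degenerate case, assuming Python with NumPy. The kernel h(x, y) = xy with E[X] = 0 is a textbook degenerate kernel (E[h(x, X_1)] = 0 for every x), not one of the examples above; for it, n V_{2,n} = n \bar x^2, whose limit is \sigma^2 times a chi-squared variable with one degree of freedom, i.e. a weighted sum with the single nonzero weight \lambda_1 = \sigma^2.

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 200, 5000

    # For h(x, y) = x*y, V_{2,n} = xbar**2, so n * V_{2,n} = (sqrt(n) * xbar)**2.
    samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
    nV = n * samples.mean(axis=1) ** 2

    print(np.mean(nV))           # approximately 1 = E[chi2_1], since sigma = 1
    print(np.mean(nV > 3.8415))  # approximately 0.05, the chi2_1 upper 5% tail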


See also

* U-statistic
* Asymptotic distribution
* Asymptotic theory (statistics)


References

* Lee, A. J. (1990). ''U-Statistics: Theory and Practice''. New York: Marcel Dekker.
* Serfling, R. J. (1980). ''Approximation Theorems of Mathematical Statistics''. New York: Wiley.
* von Mises, R. (1947). "On the asymptotic distribution of differentiable statistical functions". ''Annals of Mathematical Statistics'', 18, 309–348.