Khmaladze Transformation
In statistics, the Khmaladze transformation is a mathematical tool used in constructing convenient goodness of fit tests for hypothetical distribution functions. More precisely, suppose X_1,\ldots,X_n are i.i.d.
, possibly multi-dimensional, random observations generated from an unknown probability distribution. A classical problem in statistics is to decide how well a given hypothetical distribution function F, or a given hypothetical parametric family of distribution functions \{F_\theta : \theta \in \Theta\}, fits the set of observations. The Khmaladze transformation allows us to construct goodness of fit tests with desirable properties. It is named after Estate V. Khmaladze.

Consider the sequence of empirical distribution functions F_n based on a sequence of i.i.d. random variables X_1,\ldots,X_n, as n increases. Suppose F is the hypothetical distribution function of each X_i. To test whether the choice of F is correct or not, statisticians use the normalized difference

: v_n(x)=\sqrt{n}\,[F_n(x)-F(x)].

This v_n, as a random process in x, is called the empirical process. Various functionals of v_n are used as test statistics. The change of variable v_n(x)=u_n(t), t=F(x), transforms v_n into the so-called uniform empirical process u_n. The latter is the empirical process based on the independent random variables U_i=F(X_i), which are uniformly distributed on [0,1] if the X_i do indeed have distribution function F.

This fact was discovered and first utilized by Kolmogorov (1933), Wald and Wolfowitz (1936) and Smirnov (1937) and, especially after Doob (1949) and Anderson and Darling (1952), it led to the standard rule of choosing test statistics based on v_n. That is, test statistics \psi(v_n,F) are defined (possibly depending on the F being tested) in such a way that there exists another statistic \varphi(u_n), derived from the uniform empirical process, such that \psi(v_n,F)=\varphi(u_n). Examples are

: \sup_x |v_n(x)| = \sup_t |u_n(t)|, \qquad \sup_x \frac{|v_n(x)|}{a(F(x))} = \sup_t \frac{|u_n(t)|}{a(t)}

and

: \int_{-\infty}^\infty v_n^2(x)\,dF(x) = \int_0^1 u_n^2(t)\,dt,

where a(\cdot) is a weight function. For all such functionals, their
null distribution
(under the hypothetical F) does not depend on F, and can be calculated once and then used to test any F. However, it is only rarely that one needs to test a simple hypothesis, in which a fixed F is completely specified. Much more often one needs to verify parametric hypotheses, where the hypothetical F = F_{\theta_n} depends on some parameters \theta_n which the hypothesis does not specify and which have to be estimated from the sample X_1,\ldots,X_n itself. Although the estimators \hat\theta_n most commonly converge to the true value of \theta, it was discovered (Gikhman 1954) that the parametric, or estimated, empirical process

: \hat v_n(x)=\sqrt{n}\,[F_n(x)-F_{\hat\theta_n}(x)]

differs significantly from v_n, and that the transformed process \hat u_n(t)=\hat v_n(x), t=F_{\hat\theta_n}(x), has a limit distribution, as n\to\infty, which depends on the parametric form of F_\theta and on the particular estimator \hat\theta_n and, in general, within one
parametric family
, on the value of \theta. From the mid-1950s to the late 1980s, much work was done to clarify the situation and understand the nature of the process \hat v_n. In 1981, and then in 1987 and 1993, Khmaladze suggested replacing the parametric empirical process \hat v_n by its martingale part w_n only,

: \hat v_n(x) - K_n(x) = w_n(x),

where K_n(x) is the compensator of \hat v_n(x). The following properties of w_n were then established:

* Although the form of K_n, and therefore of w_n, depends on F_{\hat\theta_n}(x), as a function of both x and \theta_n, the limit distribution of the time-transformed process

:: \omega_n(t)=w_n(x), \quad t=F_{\hat\theta_n}(x)

: is that of standard Brownian motion on [0,1], i.e., it is again standard and independent of the choice of F_{\hat\theta_n}.

* The relationship between \hat v_n and w_n, and between their limits, is one to one, so that statistical inference based on \hat v_n or on w_n is equivalent, and nothing is lost in w_n compared to \hat v_n.

* The construction of the innovation martingale w_n can be carried over to the case of vector-valued X_1,\ldots,X_n, giving rise to the definition of so-called scanning martingales in \mathbb{R}^d.

For a long time the transformation was, although known, still not used. Later, the work of researchers like Koenker, Stute, Bai,
Koul
, Koening, and others made it popular in econometrics and other fields of statistics.
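The reduction of v_n to the uniform empirical process can be checked numerically. The following sketch (an illustration, not part of the original article; it assumes NumPy is available and takes Exp(1) as the hypothetical F) computes \sup_x|v_n(x)| on the original scale and \sup_t|u_n(t)| after the change of variable U_i = F(X_i), and confirms that the two suprema coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.exponential(size=n)            # observations; null hypothesis: F = Exp(1)
F = lambda s: 1.0 - np.exp(-s)         # hypothetical distribution function

# sup_x |v_n(x)|: for a continuous F, the supremum of |F_n - F| is attained
# at the order statistics, either at the jump of F_n or just before it.
xs = np.sort(x)
above = np.arange(1, n + 1) / n - F(xs)    # F_n(x_(i)) - F(x_(i))
below = F(xs) - np.arange(0, n) / n        # F(x_(i)) - F_n(x_(i)-)
sup_x = np.sqrt(n) * max(above.max(), below.max())

# sup_t |u_n(t)|: the same functional of the uniform empirical process
# built from U_i = F(X_i); since F is monotone, sorting U equals F of sorted X.
u = np.sort(F(x))
sup_t = np.sqrt(n) * max((np.arange(1, n + 1) / n - u).max(),
                         (u - np.arange(0, n) / n).max())

assert sup_x == sup_t                  # the two suprema coincide exactly
```

Because F is monotone, the computation on the x-scale and on the t-scale runs over exactly the same set of numbers, which is why the equality is exact and the null distribution is free of F.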
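The effect of estimating the parameter from the same sample can also be seen by simulation. This sketch (again an illustration under stated assumptions: NumPy, an exponential scale family, and the sample-mean MLE as \hat\theta_n) compares the Kolmogorov–Smirnov functional of v_n with that of \hat v_n; the estimated version is systematically smaller, so critical values computed for the simple hypothesis no longer apply.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000

def ks(u_sorted):
    """sqrt(n) * sup|F_n - F| from the sorted CDF values of a sample."""
    k = len(u_sorted)
    return np.sqrt(k) * max((np.arange(1, k + 1) / k - u_sorted).max(),
                            (u_sorted - np.arange(0, k) / k).max())

ks_known, ks_estimated = [], []
for _ in range(reps):
    x = np.sort(rng.exponential(scale=2.0, size=n))
    ks_known.append(ks(1.0 - np.exp(-x / 2.0)))           # F known: functional of v_n
    ks_estimated.append(ks(1.0 - np.exp(-x / x.mean())))  # scale replaced by MLE: hat v_n

# Estimation pulls the fitted F toward the data, so hat v_n is stochastically
# smaller than v_n and has a different null distribution.
print(np.mean(ks_known) > np.mean(ks_estimated))  # True
```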
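The martingale decomposition is easiest to see in the limit of the simple-hypothesis case, where u_n converges to a Brownian bridge b and the innovation-martingale part takes the explicit form w(t) = b(t) + \int_0^t b(s)/(1-s)\,ds, a standard Brownian motion. The following sketch (a numerical illustration under these stated assumptions, using NumPy) simulates bridge paths and checks that the transformed process at t = 1/2 has the Brownian-motion variance t = 1/2 rather than the bridge variance t(1-t) = 1/4.

```python
import numpy as np

rng = np.random.default_rng(2)
reps, m, T = 10000, 200, 0.5
dt = T / m
t = np.linspace(0.0, T, m + 1)

# Brownian motion on [0, T], extended by an independent increment to W(1)
dW = rng.normal(0.0, np.sqrt(dt), size=(reps, m))
W = np.concatenate([np.zeros((reps, 1)), np.cumsum(dW, axis=1)], axis=1)
W1 = W[:, -1] + rng.normal(0.0, np.sqrt(1.0 - T), size=reps)   # W(1)
b = W - t * W1[:, None]                # Brownian bridge: b(t) = W(t) - t W(1)

# Martingale part: w(T) = b(T) + int_0^T b(s)/(1-s) ds (trapezoidal rule)
g = b / (1.0 - t)
integral = 0.5 * dt * (g[:, :-1] + g[:, 1:]).sum(axis=1)
w_T = b[:, -1] + integral

# Var b(1/2) should be near 1/4; Var w(1/2) should be near 1/2.
print(np.var(b[:, -1]), np.var(w_T))
```

Subtracting the compensator thus trades the F-dependent (in the parametric case, \theta-dependent) limit of \hat v_n for a process whose limit is again distribution-free.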


See also

* Empirical process

