Wald Test
In statistics, the Wald test (named after Abraham Wald) assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is the precision of the estimate. Intuitively, the larger this weighted distance, the less likely it is that the constraint is true. While the finite-sample distributions of Wald statistics are generally unknown, the statistic has an asymptotic χ²-distribution under the null hypothesis, a fact that can be used to determine statistical significance. Together with the Lagrange multiplier test and the likelihood-ratio test, the Wald test is one of three classical approaches to hypothesis testing. An advantage of the Wald test over the other two is that it requires only the estimation of the unrestricted model, which lowers the computational burden compared to the likelihood-ratio test. However, a major disadvantage is that (in finite samples) it is not invariant to changes in the representation of the null hypothesis: algebraically equivalent expressions of a non-linear parameter restriction can lead to different values of the test statistic. This is because the Wald statistic is derived from a Taylor expansion, and different ways of writing equivalent nonlinear expressions lead to nontrivial differences in the corresponding Taylor coefficients. Another aberration, known as the Hauck–Donner effect, can occur in binomial models when the estimated (unconstrained) parameter is close to the boundary of the parameter space (for instance, a fitted probability extremely close to zero or one), which results in the Wald statistic no longer increasing monotonically in the distance between the unconstrained and constrained parameter.
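
A minimal numerical sketch of this effect, under assumed values not taken from the article (an intercept-only binomial model with sample size n = 50): on the log-odds scale, the standard error of logit(\hat p) is approximately 1/\sqrt{n\hat p(1-\hat p)}, which blows up near the boundary, so the Wald ''z''-ratio first rises and then falls as \hat p approaches 1.

    # Hypothetical illustration of the Hauck-Donner effect; n = 50 and the
    # probabilities below are assumptions chosen for illustration.
    import math

    n = 50
    for p_hat in [0.6, 0.8, 0.95, 0.99, 0.999, 0.99999]:
        logit = math.log(p_hat / (1 - p_hat))          # estimated log-odds
        se = 1.0 / math.sqrt(n * p_hat * (1 - p_hat))  # approximate standard error
        print(f"p_hat = {p_hat:<8}  Wald z = {logit / se:6.2f}")
    # The z-ratio rises to about 4.5 near p_hat = 0.95 and then falls back
    # toward zero, even though p_hat keeps moving away from the null.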


Mathematical details

Under the Wald test, the estimate \hat\theta that was found as the maximizing argument of the unconstrained likelihood function is compared with a hypothesized value \theta_0. In particular, the squared difference (\hat\theta - \theta_0)^2 is weighted by the curvature of the log-likelihood function.


Test on a single parameter

If the hypothesis involves only a single parameter restriction, then the Wald statistic takes the following form:

: W = \frac{(\hat\theta - \theta_0)^2}{\operatorname{var}(\hat\theta)}

which under the null hypothesis follows an asymptotic χ²-distribution with one degree of freedom. The square root of the single-restriction Wald statistic can be understood as a (pseudo) ''t''-ratio that is, however, not actually ''t''-distributed except for the special case of linear regression with normally distributed errors. In general, it follows an asymptotic ''z'' distribution:

: \sqrt{W} = \frac{\hat\theta - \theta_0}{\operatorname{se}(\hat\theta)}

where \operatorname{se}(\hat\theta) is the standard error (SE) of the maximum likelihood estimate (MLE), the square root of the variance. There are several ways to consistently estimate the variance matrix, which in finite samples leads to alternative estimates of standard errors and associated test statistics and ''p''-values. The validity of still obtaining an asymptotically normal distribution after plugging the MLE of \theta into the SE relies on Slutsky's theorem.
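
As a concrete illustration, here is a minimal Python sketch of this computation; theta_hat, se_hat, and theta_0 are hypothetical placeholder values, not taken from any real model.

    # Single-restriction Wald test under assumed inputs.
    from scipy import stats

    theta_hat = 0.82   # unrestricted MLE (assumed)
    se_hat = 0.21      # estimated standard error of theta_hat (assumed)
    theta_0 = 0.50     # hypothesized value under the null

    W = (theta_hat - theta_0) ** 2 / se_hat ** 2  # Wald statistic
    p_chi2 = stats.chi2.sf(W, df=1)               # asymptotic chi-squared p-value
    z = (theta_hat - theta_0) / se_hat            # equivalent z-ratio
    p_z = 2 * stats.norm.sf(abs(z))               # same p-value in z form
    print(W, p_chi2, z, p_z)

The two p-values coincide because the square of a standard normal variable is χ²-distributed with one degree of freedom.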


Test(s) on multiple parameters

The Wald test can be used to test a single hypothesis on multiple parameters, as well as to test jointly multiple hypotheses on single or multiple parameters. Let \hat\theta_n be our sample estimator of ''P'' parameters (i.e., \hat\theta_n is a P \times 1 vector), which is supposed to follow asymptotically a normal distribution with covariance matrix ''V'':

: \sqrt{n}(\hat\theta_n - \theta) \,\xrightarrow{d}\, N(0, V).

The test of ''Q'' hypotheses on the ''P'' parameters is expressed with a Q \times P matrix ''R'':

: H_0: R\theta = r
: H_1: R\theta \neq r

The distribution of the test statistic under the null hypothesis is

: (R\hat\theta_n - r)' \left[ R(\hat V_n/n) R' \right]^{-1} (R\hat\theta_n - r)/Q \quad \xrightarrow{d}\quad F(Q, n - P) \quad \xrightarrow[n \to \infty]{d}\quad \chi^2_Q / Q,

which in turn implies

: (R\hat\theta_n - r)' \left[ R(\hat V_n/n) R' \right]^{-1} (R\hat\theta_n - r) \quad \xrightarrow[n \to \infty]{d}\quad \chi^2_Q,

where \hat V_n is an estimator of the covariance matrix.

Suppose \sqrt{n}(\hat\theta_n - \theta) \,\xrightarrow{d}\, N(0, V). Then, by Slutsky's theorem and by the properties of the normal distribution, multiplying by ''R'' gives a vector with distribution

: R\sqrt{n}(\hat\theta_n - \theta) = \sqrt{n}(R\hat\theta_n - r) \,\xrightarrow{d}\, N(0, RVR').

Recalling that a quadratic form of a normal distribution has a chi-squared distribution,

: \sqrt{n}(R\hat\theta_n - r)' \left[ RVR' \right]^{-1} \sqrt{n}(R\hat\theta_n - r) \,\xrightarrow{d}\, \chi^2_Q.

Rearranging ''n'' finally gives

: (R\hat\theta_n - r)' \left[ R(V/n) R' \right]^{-1} (R\hat\theta_n - r) \quad \xrightarrow{d}\quad \chi^2_Q.

What if the covariance matrix is not known a priori and needs to be estimated from the data? If we have a consistent estimator \hat V_n of ''V'' such that V^{-1}\hat V_n has a determinant that is distributed \chi^2_{n-P}, then by the independence of the covariance estimator and the equation above, we have:

: (R\hat\theta_n - r)' \left[ R(\hat V_n/n) R' \right]^{-1} (R\hat\theta_n - r)/Q \quad \xrightarrow{d}\quad F(Q, n - P).
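
The following Python sketch implements the quadratic form above for assumed values of \hat\theta_n, \hat V_n, ''R'' and ''r'' (all placeholders chosen for illustration):

    # Joint Wald test W = (R th - r)' [R (V/n) R']^{-1} (R th - r).
    import numpy as np
    from scipy import stats

    theta_hat = np.array([1.2, -0.4, 0.9])  # P = 3 estimated parameters (assumed)
    V_hat = np.array([[0.8, 0.1, 0.0],      # estimated asymptotic covariance of
                      [0.1, 0.5, 0.2],      # sqrt(n)(theta_hat - theta) (assumed)
                      [0.0, 0.2, 0.9]])
    n = 200                                 # sample size (assumed)

    # Q = 2 linear restrictions: theta_2 = 0 and theta_2 = theta_3
    R = np.array([[0.0, 1.0, 0.0],
                  [0.0, 1.0, -1.0]])
    r = np.zeros(2)

    diff = R @ theta_hat - r
    middle = R @ (V_hat / n) @ R.T
    W = diff @ np.linalg.solve(middle, diff)  # quadratic form, no explicit inverse
    p_value = stats.chi2.sf(W, df=R.shape[0])
    print(W, p_value)

Using np.linalg.solve rather than inverting the Q × Q matrix directly is the standard numerically stable way to evaluate such quadratic forms.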


Nonlinear hypothesis

In the standard form, the Wald test is used to test linear hypotheses that can be represented by a single matrix ''R''. If one wishes to test a non-linear hypothesis of the form

: H_0: c(\theta) = 0
: H_1: c(\theta) \neq 0,

the test statistic becomes

: c(\hat\theta_n)' \left[ c'(\hat\theta_n) (\hat V_n/n)\, c'(\hat\theta_n)' \right]^{-1} c(\hat\theta_n) \quad \xrightarrow{d}\quad \chi^2_Q

where c'(\hat\theta_n) is the derivative of ''c'' evaluated at the sample estimator. This result is obtained using the delta method, which uses a first-order approximation of the variance.
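
A sketch of this delta-method statistic, with a hypothetical restriction c(\theta) = \theta_1\theta_2 - 0.3 and made-up estimates:

    # Wald test of a nonlinear restriction via the delta method.
    import numpy as np
    from scipy import stats

    theta_hat = np.array([0.7, 0.5])  # estimated parameters (assumed)
    V_hat = np.array([[0.4, 0.1],     # estimated covariance of
                      [0.1, 0.3]])    # sqrt(n)(theta_hat - theta) (assumed)
    n = 500                           # sample size (assumed)

    def c(theta):                     # single restriction c(theta) = 0
        return np.array([theta[0] * theta[1] - 0.3])

    def c_jacobian(theta):            # analytic derivative c'(theta), shape Q x P
        return np.array([[theta[1], theta[0]]])

    C = c_jacobian(theta_hat)
    middle = C @ (V_hat / n) @ C.T    # first-order variance of c(theta_hat)
    W = c(theta_hat) @ np.linalg.solve(middle, c(theta_hat))
    p_value = stats.chi2.sf(W, df=1)
    print(W, p_value)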


Non-invariance to re-parameterisations

The fact that one uses an approximation of the variance has the drawback that the Wald statistic is not invariant under a non-linear transformation or reparametrisation of the hypothesis: it can give different answers to the same question, depending on how the question is phrased. For example, asking whether ''R'' = 1 is the same as asking whether log ''R'' = 0; but the Wald statistic for ''R'' = 1 is not the same as the Wald statistic for log ''R'' = 0 (because there is in general no neat relationship between the standard errors of ''R'' and log ''R'', so the latter must itself be approximated).
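
A quick numeric illustration with made-up values \hat R = 1.5 and \operatorname{se}(\hat R) = 0.4, where the standard error of log \hat R is approximated by the delta method as \operatorname{se}(\hat R)/\hat R:

    # Same hypothesis, two parameterisations, two different Wald statistics.
    import math

    R_hat = 1.5   # estimated ratio (assumed)
    se_R = 0.4    # its standard error (assumed)

    W_level = ((R_hat - 1.0) / se_R) ** 2            # test of R = 1
    W_log = (math.log(R_hat) / (se_R / R_hat)) ** 2  # test of log R = 0
    print(W_level, W_log)  # 1.5625 vs about 2.31: same question, different answers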


Alternatives to the Wald test

There exist several alternatives to the Wald test, namely the likelihood-ratio test and the Lagrange multiplier test (also known as the score test). Robert F. Engle showed that these three tests, the Wald test, the likelihood-ratio test and the Lagrange multiplier test, are asymptotically equivalent. Although they are asymptotically equivalent, in finite samples they could disagree enough to lead to different conclusions, as illustrated in the sketch after this list. There are several reasons to prefer the likelihood-ratio test or the Lagrange multiplier test over the Wald test:

* Non-invariance: As argued above, the Wald test is not invariant under reparametrization, while the likelihood-ratio test will give exactly the same answer whether we work with ''R'', log ''R'' or any other monotonic transformation of ''R''.
* The other reason is that the Wald test uses two approximations (that we know the standard error or Fisher information and the maximum likelihood estimate), whereas the likelihood-ratio test depends only on the ratio of likelihood functions under the null hypothesis and alternative hypothesis.
* The Wald test requires an estimate using the maximizing argument, corresponding to the "full" model. In some cases, the model is simpler under the null hypothesis, so that one might prefer to use the score test (also called the Lagrange multiplier test), which has the advantage that it can be formulated in situations where the variability of the maximizing element is difficult to estimate, or where computing the estimate according to the maximum likelihood estimator is difficult; e.g., the Cochran–Mantel–Haenszel test is a score test.
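
The following sketch compares the Wald and likelihood-ratio statistics for H_0: p = 0.5 in a simple binomial model with hypothetical data (38 successes in 50 trials); both reject here, but the two statistics differ noticeably in this finite sample:

    # Wald vs likelihood-ratio test of p = 0.5 in a binomial model (assumed data).
    import math
    from scipy import stats

    k, n, p0 = 38, 50, 0.5  # assumed: 38 successes in 50 trials
    p_hat = k / n           # MLE of the success probability

    # Wald: squared distance weighted by the variance estimated at the MLE
    W = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)

    # Likelihood ratio: twice the log-likelihood gap between MLE and null
    def loglik(p):
        return k * math.log(p) + (n - k) * math.log(1 - p)

    LR = 2 * (loglik(p_hat) - loglik(p0))

    print(W, stats.chi2.sf(W, df=1))    # Wald statistic and p-value
    print(LR, stats.chi2.sf(LR, df=1))  # LR statistic and p-value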


See also

* Chow test
* Sequential probability ratio test
* Sup-Wald test
* Student's ''t''-test
* Welch's ''t''-test

