The Durbin–Wu–Hausman test (also called the Hausman specification test) is a statistical hypothesis test in econometrics named after James Durbin, De-Min Wu, and Jerry A. Hausman. The test evaluates the consistency of an estimator when compared to an alternative, less efficient estimator which is already known to be consistent. It helps one evaluate if a statistical model corresponds to the data.

Details

Consider the linear model ''y'' = ''Xb'' + ''e'', where ''y'' is the dependent variable, ''X'' is a vector of regressors, ''b'' is a vector of coefficients and ''e'' is the error term. We have two estimators for ''b'': ''b''₀ and ''b''₁. Under the null hypothesis, both of these estimators are consistent, but ''b''₁ is efficient (has the smallest asymptotic variance), at least in the class of estimators containing ''b''₀. Under the alternative hypothesis, ''b''₀ is consistent, whereas ''b''₁ is not. Then the Wu–Hausman statistic is:

H=(b_1-b_0)'\big(\operatorname{Var}(b_0)-\operatorname{Var}(b_1)\big)^\dagger(b_1-b_0),

where † denotes the Moore–Penrose pseudoinverse. Under the null hypothesis, this statistic asymptotically has the chi-squared distribution with the number of degrees of freedom equal to the rank of the matrix Var(''b''₀) − Var(''b''₁). If we reject the null hypothesis, it means that ''b''₁ is inconsistent.

This test can be used to check for the endogeneity of a variable (by comparing instrumental variable (IV) estimates to ordinary least squares (OLS) estimates). It can also be used to check the validity of extra instruments by comparing IV estimates using a full set of instruments ''Z'' to IV estimates that use a proper subset of ''Z''. Note that in order for the test to work in the latter case, we must be certain of the validity of the subset of ''Z'', and that subset must have enough instruments to identify the parameters of the equation. Hausman also showed that the covariance between an efficient estimator and the difference of an efficient and inefficient estimator is zero.
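The endogeneity check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hausman_statistic`, the simulated data-generating process, and the choice to use the OLS residual variance in both variance estimates (one common convention that keeps the variance difference non-negative) are not part of the original text. Here IV plays the role of ''b''₀ and OLS the role of ''b''₁.

```python
import numpy as np

def hausman_statistic(b0, b1, var_b0, var_b1):
    """Hausman contrast H and its degrees of freedom.

    b0 is consistent under both hypotheses; b1 is efficient under the null.
    """
    diff = np.atleast_1d(b1 - b0)
    var_diff = np.atleast_2d(var_b0 - var_b1)
    # Moore-Penrose pseudoinverse of Var(b0) - Var(b1), as in the formula for H
    H = float(diff @ np.linalg.pinv(var_diff) @ diff)
    df = int(np.linalg.matrix_rank(var_diff))
    return H, df

# Hypothetical simulated data: x is endogenous because it shares u with the error.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                  # instrument, independent of the error
u = rng.normal(size=n)                  # unobserved confounder
x = z + u + rng.normal(size=n)          # single regressor (no intercept, zero means)
y = 2.0 * x + u + rng.normal(size=n)    # true coefficient is 2

b_ols = (x @ y) / (x @ x)               # efficient under the null, inconsistent here
b_iv = (z @ y) / (z @ x)                # consistent under both hypotheses

# Assumption: one common error-variance estimate (from OLS residuals) for both.
resid = y - b_ols * x
s2 = (resid @ resid) / (n - 1)
var_ols = s2 / (x @ x)
x_hat = z * (z @ x) / (z @ z)           # first-stage fitted values
var_iv = s2 / (x_hat @ x_hat)

H, df = hausman_statistic(b_iv, b_ols, var_iv, var_ols)
print(f"OLS = {b_ols:.3f}, IV = {b_iv:.3f}, H = {H:.1f}, df = {df}")
```

With this design the statistic should lie far above 3.84, the 5% critical value of the chi-squared distribution with one degree of freedom, so the exogeneity of `x` would be rejected.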

Derivation

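Assuming joint normality of the estimators (with ''N'' denoting the sample size):

\sqrt{N} \begin{bmatrix} b_1 - b \\ b_0 - b \end{bmatrix} \xrightarrow{d} \mathcal{N} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \operatorname{Var}(b_1) & \operatorname{Cov}(b_1,b_0) \\ \operatorname{Cov}(b_1,b_0) & \operatorname{Var}(b_0) \end{bmatrix} \right)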

Consider the function:

q = b_0 - b_1 \Rightarrow \operatorname{plim} q = 0

By the delta method, ''q'' is also asymptotically normal, with variance:

\operatorname{Var}(q)=\operatorname{Var}(b_1)+\operatorname{Var}(b_0)-2\operatorname{Cov}(b_1,b_0)

Using the well-known result, shown by Hausman, that the covariance of an efficient estimator with its difference from an inefficient estimator is zero yields:

\operatorname{Var}(q)=\operatorname{Var}(b_0)-\operatorname{Var}(b_1)
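The intermediate step can be written out explicitly. The lines below are a short worked derivation, assuming only Hausman's zero-covariance result with ''b''₁ as the efficient estimator and ''q'' = ''b''₀ − ''b''₁ as the difference:

```latex
\begin{align}
\operatorname{Cov}(b_1, q) &= \operatorname{Cov}(b_1, b_0) - \operatorname{Var}(b_1) = 0
  \;\Rightarrow\; \operatorname{Cov}(b_1, b_0) = \operatorname{Var}(b_1) \\
\operatorname{Var}(q) &= \operatorname{Var}(b_1) + \operatorname{Var}(b_0) - 2\operatorname{Cov}(b_1, b_0) \\
  &= \operatorname{Var}(b_1) + \operatorname{Var}(b_0) - 2\operatorname{Var}(b_1) \\
  &= \operatorname{Var}(b_0) - \operatorname{Var}(b_1)
\end{align}
```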

The chi-squared test is based on the Wald criterion:

H=(b_1-b_0)'\big(\operatorname{Var}(b_0)-\operatorname{Var}(b_1)\big)^\dagger(b_1-b_0),

where † denotes the Moore–Penrose pseudoinverse and ''K'' denotes the dimension of vector ''b'', which bounds the degrees of freedom of the test from above.
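The zero-covariance result used in the derivation can be checked numerically. The following Monte Carlo sketch is an illustration only: the simulated design, the sample sizes, and all variable names are assumptions, with OLS as the efficient estimator and a just-identified IV estimator as the inefficient one, in a setting where the null hypothesis holds.

```python
import numpy as np

rng = np.random.default_rng(42)
R, n, beta = 2000, 200, 1.0          # replications, sample size, true slope (hypothetical)

z = rng.normal(size=(R, n))          # instrument
x = z + rng.normal(size=(R, n))      # regressor, exogenous here (the null holds)
e = rng.normal(size=(R, n))          # error term, independent of x and z
y = beta * x + e

# One OLS and one IV estimate per replication (no intercept, single regressor).
b_ols = (x * y).sum(axis=1) / (x * x).sum(axis=1)
b_iv = (z * y).sum(axis=1) / (z * x).sum(axis=1)
q = b_iv - b_ols                     # inefficient minus efficient estimator

# Sample covariance and correlation between the efficient estimator and q.
cov_eff_q = np.cov(b_ols, q)[0, 1]
corr_eff_q = np.corrcoef(b_ols, q)[0, 1]

print("Cov(b_ols, q):          ", cov_eff_q)
print("Corr(b_ols, q):         ", corr_eff_q)
print("Var(q):                 ", q.var(ddof=1))
print("Var(b_iv) - Var(b_ols): ", b_iv.var(ddof=1) - b_ols.var(ddof=1))
```

Across replications the correlation should be close to zero, and the sample variance of `q` should be close to the difference of the two sample variances, up to simulation noise.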

Panel data

The Hausman test can be used to differentiate between the fixed effects model and the random effects model in panel analysis. In this case, random effects (RE) is preferred under the null hypothesis due to higher efficiency, while under the alternative fixed effects (FE) remains consistent, whereas RE does not, and FE is thus preferred.
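As a rough illustration of the panel case, the sketch below simulates a balanced panel in which the regressor is correlated with the unit effects, so the alternative hypothesis holds. Everything in it is an assumption made for the example: the data-generating process, the variable names, and the particular feasible-GLS construction of the RE estimator (quasi-demeaning with variance components taken from the within and between regressions). Only the slope coefficient is compared.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 500, 5, 1.5              # units, periods, true slope (all hypothetical)

a = rng.normal(size=(N, 1))           # unit effects
x = a + rng.normal(size=(N, T))       # regressor correlated with the unit effects
y = beta * x + a + rng.normal(size=(N, T))

x_bar = x.mean(axis=1, keepdims=True)
y_bar = y.mean(axis=1, keepdims=True)

# Fixed effects (within) estimator: demean each unit's series.
xw, yw = x - x_bar, y - y_bar
b_fe = (xw * yw).sum() / (xw ** 2).sum()
sigma2_e = ((yw - b_fe * xw) ** 2).sum() / (N * (T - 1) - 1)

# Between regression on unit means, used only to estimate the unit-effect variance.
xb, yb = x_bar - x_bar.mean(), y_bar - y_bar.mean()
b_between = (xb * yb).sum() / (xb ** 2).sum()
sigma2_between = ((yb - b_between * xb) ** 2).sum() / (N - 2)
sigma2_a = max(sigma2_between - sigma2_e / T, 0.0)

# Random effects estimator: OLS on quasi-demeaned data (feasible GLS).
theta = 1.0 - np.sqrt(sigma2_e / (sigma2_e + T * sigma2_a))
xq = (x - theta * x_bar).ravel()
yq = (y - theta * y_bar).ravel()
Xq = np.column_stack([np.full(N * T, 1.0 - theta), xq])   # transformed intercept, slope
b_re = np.linalg.solve(Xq.T @ Xq, Xq.T @ yq)[1]

# Variance estimates for the slope, using the same sigma2_e in both.
var_fe = sigma2_e / (xw ** 2).sum()
var_re = sigma2_e * np.linalg.inv(Xq.T @ Xq)[1, 1]

# Hausman statistic for the slope: FE is b0 (consistent), RE is b1 (efficient under the null).
H = (b_re - b_fe) ** 2 / (var_fe - var_re)
print(f"FE = {b_fe:.3f}, RE = {b_re:.3f}, H = {H:.1f}")
```

In this design the FE estimate should stay near the true slope while the RE estimate is pulled away from it, and the statistic should be well above 3.84 (the 5% critical value for one degree of freedom), pointing to the fixed effects model.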
