Panel (data) analysis is a statistical method, widely used in social science, epidemiology, and econometrics to analyze two-dimensional (typically cross-sectional and longitudinal) panel data. The data are usually collected over time and over the same individuals and then a regression is run over these two dimensions. Multidimensional analysis is an econometric method in which data are collected over more than two dimensions (typically, time, individuals, and some third dimension).
A common panel data regression model looks like <math>y_{it} = a + b x_{it} + \varepsilon_{it}</math>, where <math>y</math> is the dependent variable, <math>x</math> is the independent variable, <math>a</math> and <math>b</math> are coefficients, and <math>i</math> and <math>t</math> are indices for individuals and time. The error <math>\varepsilon_{it}</math> is very important in this analysis. Assumptions about the error term determine whether we speak of fixed effects or random effects. In a fixed effects model, <math>\varepsilon_{it}</math> is assumed to vary non-stochastically over <math>i</math> or <math>t</math>, making the fixed effects model analogous to a dummy variable model in one dimension. In a random effects model, <math>\varepsilon_{it}</math> is assumed to vary stochastically over <math>i</math> or <math>t</math>, requiring special treatment of the error variance matrix.
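The dummy variable analogy can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the panel dimensions, the true slope of 2.0 and all variable names are hypothetical choices, not part of the article. It fits the same simulated data twice, once with one indicator column per individual and once after subtracting each individual's time average, and the two slope estimates coincide up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 50, 6                      # hypothetical panel dimensions
unit = np.repeat(np.arange(n_units), n_periods) # individual index for each row

alpha = rng.normal(size=n_units)                # individual-specific intercepts
x = rng.normal(size=unit.size) + alpha[unit]    # regressor correlated with them
b_true = 2.0                                    # assumed slope
y = alpha[unit] + b_true * x + rng.normal(scale=0.5, size=unit.size)

# (1) Dummy variable regression: one indicator column per individual, plus x.
dummies = (unit[:, None] == np.arange(n_units)[None, :]).astype(float)
X_lsdv = np.column_stack([dummies, x])
coef_lsdv, *_ = np.linalg.lstsq(X_lsdv, y, rcond=None)
b_lsdv = coef_lsdv[-1]

# (2) Within transformation: subtract each individual's time average, then OLS.
def demean(v):
    means = np.bincount(unit, weights=v) / n_periods  # balanced panel assumed
    return v - means[unit]

x_w, y_w = demean(x), demean(y)
b_within = (x_w @ y_w) / (x_w @ x_w)

print("dummy variable slope:", b_lsdv)
print("within slope:        ", b_within)
```

The within form avoids building the indicator matrix, which matters when the number of individuals is large.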
Panel data analysis has three more-or-less independent approaches:
* independently pooled panels;
* random effects models;
* fixed effects models or first differenced models.
The selection between these methods depends upon the objective of the analysis, and the problems concerning the exogeneity of the explanatory variables.
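As a rough illustration of the first differenced alternative listed above, the following sketch simulates a two-period panel and compares the first-difference slope with the within (fixed effects) slope. The sample size, the true slope of 0.7 and the variable names are assumptions made for this example. With exactly two periods the two estimators are algebraically identical; with more periods they generally differ.

```python
import numpy as np

rng = np.random.default_rng(5)
N, b_true = 300, 0.7                       # two periods per individual (T = 2)
c = rng.normal(size=N)                     # time-invariant individual effect
x = c[:, None] + rng.normal(size=(N, 2))   # regressor correlated with the effect
y = 1.0 + b_true * x + c[:, None] + rng.normal(scale=0.4, size=(N, 2))

# First differences: period 2 minus period 1 removes the intercept and c.
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# Within (fixed effects) estimator on the same data.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
b_within = (xw * yw).sum() / (xw * xw).sum()

print("first-difference slope:", b_fd)
print("within slope:          ", b_within)
```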
Independently pooled panels
''Key assumption:''
There are no unique attributes of individuals within the measurement set, and no universal effects across time.
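A minimal sketch of the pooled case, assuming simulated data in which the key assumption holds by construction (no individual attributes and no time effects). The coefficient values and dimensions are hypothetical. All individual-period observations are stacked into one sample and fitted by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 100, 5                # hypothetical panel dimensions
a_true, b_true = 1.0, 0.5                  # assumed intercept and slope

x = rng.normal(size=(n_units, n_periods))
eps = rng.normal(scale=0.3, size=(n_units, n_periods))  # no unit or time effects
y = a_true + b_true * x + eps

# Stack the panel into one long sample and run a single OLS regression.
X = np.column_stack([np.ones(x.size), x.ravel()])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)

print("intercept:", a_hat, "slope:", b_hat)
```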
Fixed effect models
''Key assumption:''
There are unique attributes of individuals that do not vary over time. That is, the unique attributes for a given individual <math>i</math> are invariant over time <math>t</math>. These attributes may or may not be correlated with the individual regressors. To test whether fixed effects, rather than random effects, is needed, the Durbin–Wu–Hausman test can be used.
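The idea behind the test can be sketched for the simplest case of one regressor and a balanced panel. This is a simplified illustration, not a full implementation: the simulated data, the moment-based variance component estimates and all names are assumptions made for the example. The statistic compares the fixed effects and random effects slopes, scaled by the difference of their estimated variances, and is referred to a chi-square distribution with one degree of freedom.

```python
import math
import numpy as np

rng = np.random.default_rng(42)
N, T = 200, 5                                   # hypothetical panel size
c = rng.normal(size=N)                          # unobserved time-invariant attributes
x = c[:, None] + rng.normal(size=(N, T))        # regressor correlated with c
y = 2.0 * x + c[:, None] + rng.normal(scale=0.5, size=(N, T))

# Within (fixed effects) pieces: deviations from each individual's time mean.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
Wxx, Wxy = (xw * xw).sum(), (xw * yw).sum()
b_fe = Wxy / Wxx
sigma_u2 = ((yw - b_fe * xw) ** 2).sum() / (N * (T - 1) - 1)

# Between pieces: individual means in deviation from the grand mean.
xb = x.mean(axis=1) - x.mean()
yb = y.mean(axis=1) - y.mean()
Bxx, Bxy = T * (xb * xb).sum(), T * (xb * yb).sum()
b_between = Bxy / Bxx
s2_between = ((yb - b_between * xb) ** 2).sum() / (N - 2)
sigma_c2 = max(s2_between - sigma_u2 / T, 0.0)  # variance of the individual effect

# Random effects (feasible GLS) slope as a weighted mix of within and between.
psi = sigma_u2 / (sigma_u2 + T * sigma_c2)
b_re = (Wxy + psi * Bxy) / (Wxx + psi * Bxx)

var_fe = sigma_u2 / Wxx
var_re = sigma_u2 / (Wxx + psi * Bxx)
hausman = (b_fe - b_re) ** 2 / (var_fe - var_re)
p_value = math.erfc(math.sqrt(hausman / 2.0))   # chi-square(1) upper tail

print("FE slope:", b_fe, "RE slope:", b_re)
print("Hausman statistic:", hausman, "p-value:", p_value)
```

Because the simulated attributes are correlated with the regressor, the two slopes differ systematically and the statistic is large, which points toward fixed effects in this example.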
Random effect models
''Key assumption:''
There are unique, time constant attributes of individuals that are not correlated with the individual regressors. Pooled OLS can be used to derive unbiased and consistent estimates of parameters even when time constant attributes are present, but random effects will be more efficient.
Random effects is a feasible generalised least squares technique which is asymptotically more efficient than pooled OLS when time-constant attributes are present. Random effects adjusts for the serial correlation which is induced by unobserved time-constant attributes.
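The efficiency claim can be checked informally with a small Monte Carlo experiment. The sketch below assumes a balanced panel, a single regressor and individual effects drawn independently of the regressor; the function names, variance values and replication count are illustrative choices. The random effects slope is computed by quasi-demeaning, that is, by subtracting an estimated fraction of each individual's mean before running OLS.

```python
import numpy as np

def pooled_ols_slope(x, y):
    """OLS slope with an intercept, computed on all observations at once."""
    xd, yd = x - x.mean(), y - y.mean()
    return (xd * yd).sum() / (xd * xd).sum()

def random_effects_slope(x, y):
    """Feasible GLS slope for a balanced panel with a single regressor."""
    N, T = x.shape
    # Idiosyncratic variance from the within (demeaned) regression.
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    b_w = (xw * yw).sum() / (xw * xw).sum()
    sigma_u2 = ((yw - b_w * xw) ** 2).sum() / (N * (T - 1) - 1)
    # Individual-effect variance from the regression on individual means.
    xb, yb = x.mean(axis=1), y.mean(axis=1)
    b_b = pooled_ols_slope(xb, yb)
    resid_b = (yb - yb.mean()) - b_b * (xb - xb.mean())
    sigma_c2 = max((resid_b ** 2).sum() / (N - 2) - sigma_u2 / T, 0.0)
    # Quasi-demeaning: subtract a fraction theta of each individual's mean.
    theta = 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_c2))
    x_star = x - theta * x.mean(axis=1, keepdims=True)
    y_star = y - theta * y.mean(axis=1, keepdims=True)
    return pooled_ols_slope(x_star, y_star)

rng = np.random.default_rng(7)
N, T, b_true = 50, 5, 1.0                       # hypothetical design
pooled, re = [], []
for _ in range(300):
    c = rng.normal(scale=2.0, size=(N, 1))      # effects independent of x
    x = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
    y = b_true * x + c + rng.normal(size=(N, T))
    pooled.append(pooled_ols_slope(x, y))
    re.append(random_effects_slope(x, y))

print("mean pooled, RE:", np.mean(pooled), np.mean(re))
print("std  pooled, RE:", np.std(pooled), np.std(re))
```

Both estimators centre on the assumed slope, while the spread of the random effects estimates is smaller in this design.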
Models with instrumental variables
In the standard random effects (RE) and fixed effects (FE) models, independent variables are assumed to be uncorrelated with error terms. Provided that valid instruments are available, RE and FE methods extend to the case where some of the explanatory variables are allowed to be endogenous. As in the exogenous setting, the RE model with instrumental variables (REIV) requires more stringent assumptions than the FE model with instrumental variables (FEIV), but it tends to be more efficient under appropriate conditions.
[Wooldridge, J.M., Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.]
To fix ideas, consider the following model:
:<math>y_{it} = x_{it}\beta + c_i + u_{it}</math>
where <math>c_i</math> is an unobserved unit-specific time-invariant effect (call it the unobserved effect) and <math>x_{it}</math> can be correlated with <math>u_{is}</math> for ''s'' possibly different from ''t''. Suppose there exists a set of valid instruments <math>z_i = (z_{i1}, \ldots, z_{iT})</math>.
In the REIV setting, key assumptions include that <math>z_i</math> is uncorrelated with <math>c_i</math> as well as <math>u_{it}</math> for <math>t = 1, \ldots, T</math>. In fact, for the REIV estimator to be efficient, conditions stronger than uncorrelatedness between instruments and unobserved effect are necessary.
On the other hand, the FEIV estimator only requires that instruments be exogenous with error terms after conditioning on the unobserved effect, i.e. <math>E[u_{it} \mid z_i, c_i] = 0</math>.
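A minimal sketch of the FEIV idea, under assumed simulated data with one endogenous regressor and one instrument (the names x, z, c, u, the coefficient 1.5 and the panel size are hypothetical choices for this example). The instrument is deliberately correlated with the unobserved effect but not with the error term. All variables are demeaned within individuals, and a just-identified instrumental variables estimate is computed on the demeaned data.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 500, 4, 1.5                  # hypothetical design
c = rng.normal(size=(N, 1))               # unobserved effect
z = 0.5 * c + rng.normal(size=(N, T))     # instrument: correlated with c, not with u
v = rng.normal(size=(N, T))
u = 0.8 * v + rng.normal(scale=0.5, size=(N, T))  # error shares v with x
x = z + c + v                             # endogenous regressor
y = beta * x + c + u

def within(a):
    """Subtract each individual's time mean (removes c)."""
    return a - a.mean(axis=1, keepdims=True)

xd, yd, zd = within(x), within(y), within(z)

b_fe = (xd * yd).sum() / (xd * xd).sum()    # FE without instruments: biased here
b_feiv = (zd * yd).sum() / (zd * xd).sum()  # IV on demeaned data

print("FE slope without instruments:", b_fe)
print("FEIV slope:                  ", b_feiv)
```

In this design the plain fixed effects slope is pulled away from the assumed coefficient by the endogeneity, while the instrumented version stays close to it even though the instrument is correlated with the unobserved effect.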