In statistics and econometrics, set identification (or partial identification) extends the concept of identifiability (or "point identification") in statistical models to situations where the distribution of observable variables does not pin down the exact value of a parameter, but instead constrains the parameter to lie in a strict subset of the parameter space. Statistical models that are set identified arise in a variety of settings in economics, including game theory and the Rubin causal model.
Though the use of set identification dates to a 1934 article by Ragnar Frisch, the methods were significantly developed and promoted by Charles Manski starting in the 1990s. Manski developed a method of worst-case bounds to account for selection bias. Unlike methods that make additional statistical assumptions, such as the Heckman correction, the worst-case bounds rely only on the data to generate a range of supported parameter values.
Definition
Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a statistical model where the parameter space $\Theta$ is either finite- or infinite-dimensional. Suppose $\theta_0$ is the true parameter value. We say that $\theta_0$ is set identified if there exists $\theta \in \Theta$ such that $P_\theta \neq P_{\theta_0}$; that is, some parameter values in $\Theta$ are not observationally equivalent to $\theta_0$. In that case, the identified set is the set of parameter values that are observationally equivalent to $\theta_0$.
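The verbal definition can be written compactly. The display below is a sketch under assumed notation: $P_\theta$ denotes the distribution of the observables implied by a parameter value $\theta$ in the parameter space $\Theta$, $\theta_0$ is the true value, and $\Theta_I$ is a label chosen here for the identified set.

```latex
\Theta_I = \{\theta \in \Theta : P_\theta = P_{\theta_0}\}, \qquad
\text{point identification: } \Theta_I = \{\theta_0\}, \qquad
\text{set identification: } \Theta_I \subsetneq \Theta .
```

Read this way, point identification is the special case in which the identified set collapses to a single value.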
Example: missing data
This example is due to . Suppose there are two binary random variables, $Y$ and $Z$. The econometrician is interested in $\mathrm{P}(Y = 1)$. There is a missing data problem, however: $Y$ can only be observed if $Z = 1$.
By the law of total probability,
: $\mathrm{P}(Y = 1) = \mathrm{P}(Y = 1 \mid Z = 1)\,\mathrm{P}(Z = 1) + \mathrm{P}(Y = 1 \mid Z = 0)\,\mathrm{P}(Z = 0).$
The only unknown object is $\mathrm{P}(Y = 1 \mid Z = 0)$, which is constrained to lie between 0 and 1. Therefore, the identified set is
: $\Theta_I = \{p \in [0, 1] : p = \mathrm{P}(Y = 1 \mid Z = 1)\,\mathrm{P}(Z = 1) + q\,\mathrm{P}(Z = 0) \text{ for some } q \in [0, 1]\}.$
Given the missing data constraint, the econometrician can only say that $\mathrm{P}(Y = 1) \in \Theta_I$. This makes use of all available information.
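As a rough illustration, the bounds in this example can be estimated from a sample by replacing each probability with its sample frequency. The sketch below assumes the outcome is labelled Y and that a missing outcome is encoded as `None`; the function name `worst_case_bounds` and the sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def worst_case_bounds(y: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Sample analogue of the worst-case bounds on P(Y = 1).

    `y` holds 0/1 outcomes, with None marking an unobserved outcome.
    """
    n = len(y)
    if n == 0:
        raise ValueError("need at least one observation")
    n_missing = sum(v is None for v in y)
    n_ones = sum(v for v in y if v is not None)
    # Lower bound: treat every missing outcome as 0.
    lower = n_ones / n
    # Upper bound: treat every missing outcome as 1.
    upper = (n_ones + n_missing) / n
    return lower, upper


sample = [1, 0, None, 1, None, 1, 0, 1, None, 1]  # hypothetical data
print(worst_case_bounds(sample))  # (0.5, 0.8)
```

The width of the interval equals the fraction of missing observations, so the bounds shrink to a point only when nothing is missing.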
Statistical inference
Set estimation cannot rely on the usual tools for statistical inference developed for point estimation. A literature in statistics and econometrics studies methods for statistical inference in the context of set-identified models, focusing on constructing confidence intervals or confidence regions with appropriate properties. For example, a method developed by (and which describes as complicated) constructs confidence regions that cover the identified set with a given probability.
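To make the idea of covering the identified set concrete, here is a minimal sketch for the missing-data example. It is not the method referred to above. It is a simple construction assumed for illustration: each estimated bound is a sample proportion, a normal approximation gives its standard error, and a Bonferroni split of the error probability across the two ends yields an interval whose asymptotic coverage of the whole identified set is at least the nominal level. The function name `set_confidence_interval` and the encoding of missing outcomes as `None` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional, Sequence, Tuple


def set_confidence_interval(
    y: Sequence[Optional[int]], level: float = 0.95
) -> Tuple[float, float]:
    """Interval intended to cover the whole identified set for P(Y = 1).

    `y` holds 0/1 outcomes, with None marking an unobserved outcome.
    """
    n = len(y)
    if n == 0:
        raise ValueError("need at least one observation")
    n_missing = sum(v is None for v in y)
    n_ones = sum(v for v in y if v is not None)
    lower = n_ones / n                # bound if every missing outcome is 0
    upper = (n_ones + n_missing) / n  # bound if every missing outcome is 1
    # Each bound is a sample proportion, so use the binomial standard error.
    se_lower = sqrt(lower * (1.0 - lower) / n)
    se_upper = sqrt(upper * (1.0 - upper) / n)
    # Bonferroni: allow error probability (1 - level) / 2 at each end.
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    # Clip to [0, 1], since the target is a probability.
    return max(0.0, lower - z * se_lower), min(1.0, upper + z * se_upper)


# Hypothetical sample: 50 ones, 20 zeros, 30 missing outcomes.
print(set_confidence_interval([1] * 50 + [0] * 20 + [None] * 30))
```

The normal approximation is unreliable when an estimated bound is at or near 0 or 1, where the standard error collapses. The construction is also conservative compared with methods designed specifically for set-identified models.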
Further reading
* Manski, Charles F. (2003). ''Partial Identification of Probability Distributions''. New York: Springer-Verlag. ISBN 978-0-387-00454-9.