In statistics, a linear probability model (LPM) is a special case of a binary regression model. Here the dependent variable for each observation takes values which are either 0 or 1. The probability of observing a 0 or 1 in any one case is treated as depending on one or more explanatory variables. For the "linear probability model", this relationship is a particularly simple one (the probability is linear in the explanatory variables), which allows the model to be fitted by linear regression.
The model assumes that, for a binary outcome (Bernoulli trial) $Y$ and its associated vector of explanatory variables $X$,
: $\Pr(Y = 1 \mid X = x) = x^\top \beta.$
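For this model,
: $E[Y \mid X] = 0 \cdot \Pr(Y = 0 \mid X) + 1 \cdot \Pr(Y = 1 \mid X) = \Pr(Y = 1 \mid X) = X^\top \beta,$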
and hence the vector of parameters $\beta$ can be estimated using least squares. This method of fitting is inefficient because the errors are heteroscedastic: since $Y$ is a Bernoulli variable, $\operatorname{Var}(Y \mid X = x) = x^\top\beta\,(1 - x^\top\beta)$, which varies between observations. The fit can be improved by adopting an iterative scheme based on weighted least squares, in which the model from the previous iteration is used to supply estimates of these conditional variances. This approach can be related to fitting the model by maximum likelihood.
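The two fitting schemes described above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names `fit_lpm_ols` and `fit_lpm_wls`, the fixed number of iterations, and the clipping constant `eps` are all hypothetical choices. Clipping fitted probabilities into $(0, 1)$ before forming the weights is an ad hoc assumption made here so that $p(1 - p)$ stays positive; the text above does not prescribe how to handle fitted values outside the unit interval.

```python
import numpy as np

def fit_lpm_ols(X, y):
    """Ordinary least squares fit of a linear probability model.
    X is an (n, k) design matrix that already contains a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_lpm_wls(X, y, n_iter=10, eps=1e-3):
    """Iterative (feasible) weighted least squares for the LPM (hypothetical helper).
    Each pass estimates Var(Y | X = x) = p (1 - p) from the previous fit and
    reweights every observation by the inverse of that estimated variance."""
    beta = fit_lpm_ols(X, y)
    for _ in range(n_iter):
        p = X @ beta
        # Assumption: clip so the variance estimate p (1 - p) is strictly positive.
        p = np.clip(p, eps, 1 - eps)
        w = 1.0 / (p * (1 - p))
        sw = np.sqrt(w)
        # WLS is OLS on rows scaled by the square root of their weight.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Simulated data from an LPM with hypothetical true parameters (0.2, 0.5);
# with x in [0, 1] every true probability lies in [0.2, 0.7].
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.2, 0.5])
y = (rng.uniform(size=n) < X @ beta_true).astype(float)

print("OLS:", fit_lpm_ols(X, y))
print("WLS:", fit_lpm_wls(X, y))
```

Both estimates should land close to the true parameters; the point of the reweighting is a smaller sampling variance, not a different target.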
A drawback of this model is that, unless restrictions are placed on $\beta$, the estimated coefficients can imply probabilities outside the unit interval $[0, 1]$. For this reason, models such as the logit model or the probit model are more commonly used.
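The drawback can be seen on a tiny data set. The ten observations below are hypothetical, chosen only to illustrate the point; an unrestricted OLS fit of the LPM then produces fitted "probabilities" below 0 and above 1 at the ends of the data.

```python
import numpy as np

# Hypothetical data: one regressor x = 0, 1, ..., 9 and a binary outcome y.
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

# Unrestricted least-squares fit of Pr(y = 1 | x) = beta0 + beta1 * x.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ beta

# Flag fitted values that are not valid probabilities.
outside = (p_hat < 0) | (p_hat > 1)
print("intercept, slope:", beta)
print("fitted probabilities outside [0, 1] at x =", x[outside])
```

With these numbers the fitted line is roughly $-0.127 + 0.139x$, so the fitted values at $x = 0$ and $x = 9$ fall below 0 and above 1, respectively, while those at $x = 1, \dots, 8$ stay inside the unit interval.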
Latent-variable formulation
More formally, the LPM can arise from a latent-variable formulation (usually to be found in the econometrics literature), as follows: assume the following regression model with a latent (unobservable) dependent variable:
: $y^* = b_0 + x^\top b + \varepsilon, \qquad \varepsilon \mid x \sim U(-a, a).$
The critical assumption here is that the error term of this regression is a uniform random variable, symmetric around zero and hence of mean zero. The cumulative distribution function of $\varepsilon$ here is
: $F_{\varepsilon \mid x}(\varepsilon \mid x) = \dfrac{\varepsilon + a}{2a}, \qquad -a \le \varepsilon \le a.$
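Define the indicator variable $y = 1$ if $y^* > 0$, and zero otherwise, and consider the conditional probability
: $\Pr(y = 1 \mid x) = \Pr(y^* > 0 \mid x) = \Pr(b_0 + x^\top b + \varepsilon > 0 \mid x)$
: $= \Pr(\varepsilon > -b_0 - x^\top b \mid x) = 1 - \Pr(\varepsilon \le -b_0 - x^\top b \mid x)$
: $= 1 - F_{\varepsilon \mid x}(-b_0 - x^\top b \mid x) = 1 - \dfrac{-b_0 - x^\top b + a}{2a} = \dfrac{b_0 + a}{2a} + \dfrac{x^\top b}{2a},$
where the last line uses the uniform CDF and therefore holds for $-a \le b_0 + x^\top b \le a$, that is, whenever the implied probability lies in $[0, 1]$.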
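But this is the linear probability model,
: $\Pr(y = 1 \mid x) = \beta_0 + x^\top \beta$
with the mapping
: $\beta_0 = \dfrac{b_0 + a}{2a}, \qquad \beta = \dfrac{b}{2a}.$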
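The mapping can be checked by simulation: draw uniform errors, form the latent variable, and compare the share of positive $y^*$ at a fixed $x$ with $\beta_0 + x^\top\beta$. The latent parameters $b_0 = 0.5$, $b = 1$, $a = 2$ and the helper name `lpm_from_latent` are hypothetical; they are chosen so that $b_0 + x^\top b$ stays inside $[-a, a]$ for the test points, where the derivation above applies.

```python
import numpy as np

def lpm_from_latent(b0, b, a):
    """Hypothetical helper: map latent-model parameters (b0, b, a) to LPM
    parameters via beta0 = (b0 + a) / (2a) and beta = b / (2a)."""
    return (b0 + a) / (2 * a), np.asarray(b, dtype=float) / (2 * a)

# Hypothetical latent parameters with a single regressor.
b0, b, a = 0.5, np.array([1.0]), 2.0
beta0, beta = lpm_from_latent(b0, b, a)   # here beta0 = 0.625, beta = [0.25]

rng = np.random.default_rng(1)
n = 200_000
results = []
for x0 in (-1.0, 0.0, 1.0):
    eps = rng.uniform(-a, a, size=n)       # eps | x ~ U(-a, a)
    y_star = b0 + b[0] * x0 + eps          # latent dependent variable
    share = np.mean(y_star > 0)            # simulated Pr(y = 1 | x = x0)
    formula = beta0 + beta[0] * x0         # LPM probability from the mapping
    results.append((x0, share, formula))
    print(f"x = {x0:+.1f}: simulated {share:.4f}, LPM {formula:.4f}")
```

The simulated shares should agree with the LPM probabilities up to Monte Carlo error, which is about 0.001 for this sample size.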
This method is a general device to obtain a conditional probability model of a binary variable: if we assume that the distribution of the error term is logistic, we obtain the logit model; if we assume that it is normal, we obtain the probit model; and if we assume that it is the logarithm of a Weibull distribution, we obtain the complementary log-log model.
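The general device amounts to one formula: with index $\eta = b_0 + x^\top b$ and $y = 1$ iff $\eta + \varepsilon > 0$, one has $\Pr(y = 1 \mid x) = 1 - F(-\eta)$, where $F$ is the CDF of the error. The sketch below applies it to the three symmetric cases (uniform with $a = 1$, standard logistic, standard normal) using only the standard library; the function names are hypothetical, and the complementary log-log case is left out.

```python
import math

def uniform_cdf(e, a=1.0):
    """CDF of U(-a, a), equal to 0 below -a and 1 above a."""
    return min(max((e + a) / (2 * a), 0.0), 1.0)

def logistic_cdf(e):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-e))

def normal_cdf(e):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(e / math.sqrt(2.0)))

def prob_one(eta, cdf):
    """General device: Pr(y = 1 | x) = Pr(eps > -eta) = 1 - F(-eta)."""
    return 1.0 - cdf(-eta)

for eta in (-2.0, 0.0, 0.5, 2.0):
    print(f"eta = {eta:+.1f}:",
          f"LPM {prob_one(eta, uniform_cdf):.4f}",     # linear in eta on [-1, 1]
          f"logit {prob_one(eta, logistic_cdf):.4f}",
          f"probit {prob_one(eta, normal_cdf):.4f}")
```

Because these three distributions are symmetric around zero, $1 - F(-\eta) = F(\eta)$. For the uniform case the probability is linear in $\eta$ only on $[-a, a]$; outside that range the CDF saturates at 0 or 1, which is exactly where the linear formula would leave the unit interval.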
See also
* Linear approximation
Further reading
* Horrace, William C., and Ronald L. Oaxaca (2006). "Results on the Bias and Inconsistency of Ordinary Least Squares for the Linear Probability Model." Economics Letters, vol. 90, pp. 321–327.