In statistics, specifically

regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...

, a binary regression estimates a relationship between one or more explanatory variables and a single output binary variable. Generally the probability of the two alternatives is modeled, instead of simply outputting a single value, as in

linear regression In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is ...

. Binary regression is usually analyzed as a special case of

binomial regression In statistics, binomial regression is a regression analysis technique in which the response (often referred to as ''Y'') has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial ha ...

, with a single outcome (

n = 1

), and one of the two alternatives considered as "success" and coded as 1: the value is the

count Count (feminine: countess) is a historical title of nobility in certain European countries, varying in relative status, generally of middling rank in the hierarchy of nobility. Pine, L. G. ''Titles: How the King Became His Majesty''. New Yor ...

of successes in 1 trial, either 0 or 1. The most common binary regression models are the logit model (

logistic regression In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear function (calculus), linear combination of one or more independent var ...

) and the probit model ( probit regression).

Applications

Binary regression is principally applied either for prediction ( binary classification), or for estimating the association between the explanatory variables and the output. In economics, binary regressions are used to model

binary choice In economics, discrete choice models, or qualitative choice models, describe, explain, and predict choices between two or more discrete alternatives, such as entering or not entering the labor market, or choosing between modes of transport. Su ...

Interpretations

Binary regression models can be interpreted as latent variable models, together with a measurement model; or as probabilistic models, directly modeling the probability.

Latent variable model

The latent variable interpretation has traditionally been used in bioassay, yielding the probit model, where normal variance and a cutoff are assumed. The latent variable interpretation is also used in item response theory (IRT). Formally, the latent variable interpretation posits that the outcome ''y'' is related to a vector of explanatory variables ''x'' by :

y=1^*>0 /math>

where y^*=x\beta +\varepsilon and \varepsilon \mid x\sim G,  is a vector of parameters and ''G'' is a

probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...

. This model can be applied in many economic contexts. For instance, the outcome can be the decision of a manager whether invest to a program,

y^*

is the expected net discounted cash flow and ''x'' is a vector of variables which can affect the cash flow of this program. Then the manager will invest only when she expects the net discounted cash flow to be positive. Often, the

error term In mathematics and statistics, an error term is an additive type of error. Common examples include: * errors and residuals in statistics, e.g. in linear regression In statistics, linear regression is a linear approach for modelling the relati ...

\varepsilon

is assumed to follow a

normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu i ...

conditional on the explanatory variables ''x''. This generates the standard probit model.Bliss, C. I. (1934). "The Method of Probits". Science 79 (2037): 38–39.

Probabilistic model

The simplest direct probabilistic model is the logit model, which models the log-odds as a linear function of the explanatory variable or variables. The logit model is "simplest" in the sense of generalized linear models (GLIM): the log-odds are the natural parameter for the exponential family of the Bernoulli distribution, and thus it is the simplest to use for computations. Another direct probabilistic model is the linear probability model, which models the probability itself as a linear function of the explanatory variables. A drawback of the linear probability model is that, for some values of the explanatory variables, the model will predict probabilities less than zero or greater than one.

References

* * {{statistics-stub Regression analysis

Applications

Interpretations

Latent variable model

Probabilistic model

See also

References