Binomial regression

In statistics, binomial regression is a regression analysis technique in which the response (often referred to as ''Y'') has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial has probability of success ''p''. In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

Binomial regression is closely related to binary regression: a binary regression can be considered a binomial regression with n = 1, or a regression on ungrouped binary data, while a binomial regression can be considered a regression on grouped binary data (see comparison). Binomial regression models are essentially the same as binary choice models, one type of discrete choice model: the primary difference is in the theoretical motivation (see comparison). In machine learning, binomial regression is considered a special case of probabilistic classification, and thus a generalization of binary classification.


Example application

In one published example of an application of binomial regression (Cox & Snell 1981, Example H, p. 91), the details were as follows. The observed outcome variable was whether or not a fault occurred in an industrial process. There were two explanatory variables: the first was a simple two-case factor representing whether or not a modified version of the process was used, and the second was an ordinary quantitative variable measuring the purity of the material being supplied for the process.


Specification of model

The response variable ''Y'' is assumed to be binomially distributed conditional on the explanatory variables ''X''. The number of trials ''n'' is known, and the probability of success for each trial ''p'' is specified as a function ''θ(X)''. This implies that the conditional expectation and conditional variance of the observed fraction of successes, ''Y/n'', are

:E(Y/n \mid X) = \theta(X)
:\operatorname{Var}(Y/n \mid X) = \theta(X)\,(1 - \theta(X)) / n

The goal of binomial regression is to estimate the function ''θ(X)''. Typically the statistician assumes \theta(X) = m(\beta^{\mathrm T} X), for a known function ''m'', and estimates ''β''. Common choices for ''m'' include the logistic function.

The data are often fitted as a generalised linear model where the predicted values ''μ'' are the probabilities that any individual event will result in a success. The likelihood of the predictions is then given by

:L(\boldsymbol\mu \mid Y)=\prod_{i=1}^n \left( 1_{y_i=1}(\mu_i) + 1_{y_i=0}(1-\mu_i) \right),

where 1_A is the indicator function which takes on the value one when the event ''A'' occurs, and zero otherwise: in this formulation, for any given observation ''yi'', only one of the two terms inside the product contributes, according to whether ''yi'' = 0 or 1. The likelihood function is more fully specified by defining the formal parameters ''μi'' as parameterised functions of the explanatory variables: this defines the likelihood in terms of a much reduced number of parameters. Fitting of the model is usually achieved by employing the method of maximum likelihood to determine these parameters. In practice, the use of a formulation as a generalised linear model allows advantage to be taken of certain algorithmic ideas which are applicable across the whole class of more general models but which do not apply to all maximum likelihood problems. Models used in binomial regression can often be extended to multinomial data. There are many methods of generating the values of ''μ'' in systematic ways that allow for interpretation of the model; they are discussed below.
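As an illustration of the maximum-likelihood fitting described above, the following is a minimal Python sketch, not taken from the source: it assumes a logistic choice for ''m'' and uses synthetic grouped data, and all variable names (X, n_trials, y, beta_true) are invented for the example.

<pre>
# Minimal sketch of binomial regression by maximum likelihood,
# assuming a logistic mean function and synthetic grouped data.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic function: expit(eta) = 1 / (1 + exp(-eta))

rng = np.random.default_rng(0)

# Synthetic grouped data: each row has covariates x_i, a known number of
# trials n_i, and an observed number of successes y_i.
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # intercept + one covariate
beta_true = np.array([-0.5, 1.2])
n_trials = rng.integers(5, 20, size=50)
y = rng.binomial(n_trials, expit(X @ beta_true))

def neg_log_likelihood(beta):
    """Negative binomial log-likelihood with theta(X) = expit(X @ beta)."""
    theta = expit(X @ beta)
    # Binomial coefficients are constant in beta and omitted.
    return -np.sum(y * np.log(theta) + (n_trials - y) * np.log(1 - theta))

fit = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1]), method="BFGS")
print("estimated beta:", fit.x)   # should be close to beta_true
</pre>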


Link functions

There is a requirement that the modelling linking the probabilities ''μ'' to the explanatory variables should be of a form which only produces values in the range 0 to 1. Many models can be fitted into the form

:\boldsymbol\mu = g(\boldsymbol\eta)\,.

Here ''η'' is an intermediate variable representing a linear combination, containing the regression parameters, of the explanatory variables. The function ''g'' is the cumulative distribution function (cdf) of some probability distribution. Usually this probability distribution has support from minus infinity to plus infinity, so that any finite value of ''η'' is transformed by the function ''g'' to a value inside the range 0 to 1.

In the case of logistic regression, the link function is the log of the odds (the logit), and ''g'' is its inverse, the logistic function. In the case of probit, ''g'' is the cdf of the normal distribution, so the link is the probit (inverse normal cdf) function. The linear probability model is not a proper binomial regression specification because predictions need not be in the range of zero to one; it is sometimes used for this type of data when the probability scale is where interpretation occurs, or when the analyst lacks sufficient sophistication to fit or calculate approximate linearizations of probabilities for interpretation.
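A small sketch, under the assumption that scipy is available and with an arbitrarily chosen grid of ''η'' values, showing how two common choices of ''g'' map a linear predictor into the interval (0, 1):

<pre>
# Compare the logistic and probit inverse links on the same linear predictor.
import numpy as np
from scipy.special import expit      # inverse of the logit link
from scipy.stats import norm         # norm.cdf is the inverse of the probit link

eta = np.linspace(-3, 3, 7)
print("eta:     ", eta)
print("logistic:", expit(eta))       # g(eta) for the logit link
print("probit:  ", norm.cdf(eta))    # g(eta) for the probit link
</pre>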


Comparison with binary regression

Binomial regression is closely connected with binary regression. If the response is a binary variable (two possible outcomes), then these alternatives can be coded as 0 or 1 by considering one of the outcomes as "success" and the other as "failure" and considering these as count data: "success" is 1 success out of 1 trial, while "failure" is 0 successes out of 1 trial. This can now be considered a binomial distribution with n = 1 trial, so a binary regression is a special case of a binomial regression. If these data are grouped (by adding counts), they are no longer binary data, but are count data for each group, and can still be modeled by a binomial regression; the individual binary outcomes are then referred to as "ungrouped data". An advantage of working with grouped data is that one can test the goodness of fit of the model; for example, grouped data may exhibit overdispersion relative to the variance estimated from the ungrouped data.
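The following is a minimal sketch, with invented column names and data, of how ungrouped binary outcomes can be aggregated into grouped binomial counts per covariate pattern, which a binomial regression can then model directly:

<pre>
# Convert ungrouped binary outcomes into grouped (successes, trials) counts.
import pandas as pd

ungrouped = pd.DataFrame({
    "modified_process": [0, 0, 0, 1, 1, 1, 1],
    "outcome":          [1, 0, 1, 0, 0, 1, 0],   # 1 = success, 0 = failure
})

grouped = (ungrouped
           .groupby("modified_process")["outcome"]
           .agg(successes="sum", trials="count")
           .reset_index())
grouped["failures"] = grouped["trials"] - grouped["successes"]
print(grouped)
</pre>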


Comparison with binary choice models

A binary choice model assumes a latent variable ''Un'', the utility (or net benefit) that person ''n'' obtains from taking an action (as opposed to not taking the action). The utility the person obtains from taking the action depends on the characteristics of the person, some of which are observed by the researcher and some are not:

: U_n = \boldsymbol\beta \cdot \mathbf{s_n} + \varepsilon_n

where \boldsymbol\beta is a set of regression coefficients and \mathbf{s_n} is a set of independent variables (also known as "features") describing person ''n'', which may be either discrete "dummy variables" or regular continuous variables. \varepsilon_n is a random variable specifying "noise" or "error" in the prediction, assumed to be distributed according to some distribution. Normally, if there is a mean or variance parameter in the distribution, it cannot be identified, so the parameters are set to convenient values — by convention usually mean 0, variance 1. The person takes the action, ''yn'' = 1, if ''Un'' > 0. The unobserved term, ''εn'', is assumed to have a logistic distribution.

The specification is written succinctly as:

: U_n = \boldsymbol\beta \cdot \mathbf{s_n} + \varepsilon_n
: Y_n = \begin{cases} 1, & \text{if } U_n > 0, \\ 0, & \text{if } U_n \le 0 \end{cases}
: \varepsilon \sim logistic, standard normal, etc.

Let us write it slightly differently:

: U_n = \boldsymbol\beta \cdot \mathbf{s_n} - e_n
: Y_n = \begin{cases} 1, & \text{if } U_n > 0, \\ 0, & \text{if } U_n \le 0 \end{cases}
: e \sim logistic, standard normal, etc.

Here we have made the substitution ''en'' = −''εn''. This changes a random variable into a slightly different one, defined over a negated domain. As it happens, the error distributions we usually consider (e.g. the logistic distribution, the standard normal distribution, the standard Student's ''t''-distribution, etc.) are symmetric about 0, and hence the distribution over ''en'' is identical to the distribution over ''εn''.

Denote the cumulative distribution function (CDF) of e as F_e, and the quantile function (inverse CDF) of e as F_e^{-1}. Note that

:: \begin{align}
\Pr(Y_n=1) &= \Pr(U_n > 0) \\
&= \Pr(\boldsymbol\beta \cdot \mathbf{s_n} - e_n > 0) \\
&= \Pr(-e_n > -\boldsymbol\beta \cdot \mathbf{s_n}) \\
&= \Pr(e_n \le \boldsymbol\beta \cdot \mathbf{s_n}) \\
&= F_e(\boldsymbol\beta \cdot \mathbf{s_n})
\end{align}

Since Y_n is a Bernoulli trial, where \mathbb{E}[Y_n] = \Pr(Y_n = 1), we have

:\mathbb{E}[Y_n] = F_e(\boldsymbol\beta \cdot \mathbf{s_n})

or equivalently

:F_e^{-1}(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n}.

Note that this is exactly equivalent to the binomial regression model expressed in the formalism of the generalized linear model.

If e_n \sim \mathcal{N}(0,1), i.e. distributed as a standard normal distribution, then

:\Phi^{-1}(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n}

which is exactly a probit model.

If e_n \sim \operatorname{Logistic}(0,1), i.e. distributed as a standard logistic distribution with mean 0 and scale parameter 1, then the corresponding quantile function is the logit function, and

:\operatorname{logit}(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n}

which is exactly a logit model.

Note that the two different formalisms — generalized linear models (GLM's) and discrete choice models — are equivalent in the case of simple binary choice models, but can be extended in differing ways:
*GLM's can easily handle arbitrarily distributed response variables (dependent variables), not just categorical variables or ordinal variables, which discrete choice models are limited to by their nature. GLM's are also not limited to link functions that are quantile functions of some distribution, unlike the use of an error variable, which must by assumption have a probability distribution.
*On the other hand, because discrete choice models are described as types of generative models, it is conceptually easier to extend them to complicated situations with multiple, possibly correlated, choices for each person, or other variations. A simulation sketch of this latent-utility formulation is given after this list.
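The following is a hedged Python sketch of the latent-utility derivation above, using synthetic data and invented variable names: it simulates U_n = \boldsymbol\beta \cdot \mathbf{s_n} + \varepsilon_n with standard normal errors and checks that the empirical Pr(Y_n = 1) matches the probit form \Phi(\boldsymbol\beta \cdot \mathbf{s_n}).

<pre>
# Simulate the latent-utility model with standard normal errors and
# compare the empirical choice probability with the probit prediction.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
beta = np.array([0.4, -0.8])
s = np.array([1.0, 0.5])                      # covariates of one person (incl. intercept)
eta = beta @ s

eps = rng.normal(size=100_000)                # standard normal errors
U = eta + eps                                 # latent utilities
y = (U > 0).astype(int)                       # observed binary choices

print("empirical Pr(Y=1):", y.mean())
print("probit  Phi(eta): ", norm.cdf(eta))    # should agree closely
</pre>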


Latent variable interpretation / derivation

A latent variable model involving a binomial observed variable ''Y'' can be constructed such that ''Y'' is related to the latent variable ''Y*'' via

:Y = \begin{cases} 0, & \mbox{if } Y^* > 0 \\ 1, & \mbox{if } Y^* < 0. \end{cases}

The latent variable ''Y*'' is then related to a set of regression variables ''X'' by the model

:Y^* = X\beta + \epsilon\,.

This results in a binomial regression model. The variance of ''ϵ'' can not be identified and when it is not of interest is often assumed to be equal to one. If ''ϵ'' is normally distributed, then a probit is the appropriate model and if ''ϵ'' is log-Weibull distributed, then a logit is appropriate. If ''ϵ'' is uniformly distributed, then a linear probability model is appropriate.
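A brief sketch, assuming scipy is available, an invented grid of linear-predictor values, and the symmetric-error sign convention of the previous section (so that Pr(Y = 1) = F_ϵ(Xβ)), showing how the assumed error distribution determines the implied model: normal errors give a probit curve, logistic errors a logit curve, and uniform errors a linear probability curve (clipped to [0, 1]).

<pre>
# Implied success probabilities under different error distributions.
import numpy as np
from scipy.stats import norm, logistic, uniform

eta = np.linspace(-2, 2, 5)                              # linear predictor X @ beta
print("probit :", norm.cdf(eta))                         # normal errors
print("logit  :", logistic.cdf(eta))                     # logistic errors
print("linear :", uniform(loc=-0.5, scale=1).cdf(eta))   # uniform errors on [-0.5, 0.5]
</pre>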


See also

* Linear probability model
* Poisson regression
* Predictive modelling


References

* Cox, D. R.; Snell, E. J. (1981). ''Applied Statistics: Principles and Examples''. Chapman and Hall.

