Separation (statistics)

In statistics, separation is a phenomenon associated with models for dichotomous or categorical outcomes, including logistic and probit regression. Separation occurs if the predictor (or a linear combination of some subset of the predictors) is associated with only one outcome value when the predictor range is split at a certain value.


The phenomenon

For example, suppose the predictor ''X'' is continuous and the outcome ''y'' = 1 is observed for all ''x'' > 2. If the outcome values are thus (seemingly) perfectly determined by the predictor (e.g., ''y'' = 0 whenever ''x'' ≤ 2), then the condition "complete separation" is said to occur. If instead there is some overlap (e.g., ''y'' = 0 when ''x'' < 2, but ''y'' takes observed values of both 0 and 1 when ''x'' = 2), then "quasi-complete separation" occurs. A 2 × 2 table with an empty (zero) cell is an example of quasi-complete separation.
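The distinction can be illustrated with a small numerical sketch. The data values and the helper function below are made up for illustration and are not taken from any published example:

```python
# Illustrative sketch with made-up data: complete vs. quasi-complete
# separation for a single continuous predictor x split at the value 2.
import numpy as np

# Complete separation: y = 1 exactly when x > 2, y = 0 otherwise.
x_complete = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y_complete = (x_complete > 2).astype(int)

# Quasi-complete separation: both outcome values are observed at x = 2,
# but the outcome is still perfectly predicted away from that boundary.
x_quasi = np.array([0.5, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0])
y_quasi = np.array([0,   0,   0,   0,   1,   1,   1])

def is_completely_separated(x, y, threshold):
    """Hypothetical helper: True if splitting x at `threshold` sends all
    y = 1 observations to one side and all y = 0 to the other."""
    return (y[x > threshold] == 1).all() and (y[x <= threshold] == 0).all()

print(is_completely_separated(x_complete, y_complete, 2.0))  # True
print(is_completely_separated(x_quasi, y_quasi, 2.0))        # False: overlap at x = 2
```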


The problem

This observed form of the data is important because it sometimes causes problems with the estimation of regression coefficients. For example, maximum likelihood (ML) estimation relies on maximization of the likelihood function; in the case of a logistic regression with completely separated data, the maximum appears at the margin of the parameter space, leading to "infinite" estimates and, along with that, to problems with providing sensible standard errors. Statistical software will often output an arbitrarily large parameter estimate with a very large standard error.
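The unbounded behaviour can be seen in a minimal sketch (toy data assumed for illustration): fitting the logistic log-likelihood by plain gradient ascent on completely separated data, the slope estimate keeps growing with more iterations, while the log-likelihood only approaches its supremum of 0 without ever attaining it.

```python
# Sketch with made-up, completely separated data: the logistic
# log-likelihood has no finite maximizer, so gradient ascent keeps
# pushing the slope estimate toward infinity.
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = (x > 2).astype(int)                       # y = 1 exactly when x > 2
X = np.column_stack([np.ones_like(x), x])     # intercept and slope columns

def log_likelihood(beta):
    eta = X @ beta
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

beta = np.zeros(2)
for _ in range(20000):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    beta += 0.1 * (X.T @ (y - p))             # ascend the log-likelihood

# The slope estimate is large and still growing; running more iterations
# makes it larger, while the log-likelihood creeps toward 0 from below.
print("estimates:", beta)
print("log-likelihood:", log_likelihood(beta))
```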


Possible remedies

An approach to "fix" problems with ML estimation is the use of regularization (or "continuity corrections"). In particular, in the case of a logistic regression problem, the use of ''exact logistic regression'' or ''Firth logistic regression'', a bias-reduction method based on a penalized likelihood, may be an option. Alternatively, one may avoid the problems associated with likelihood maximization by switching to a Bayesian approach to inference. Within a Bayesian framework, the pathologies arising from likelihood maximization are avoided by the use of integration rather than maximization, as well as by the use of sensible prior probability distributions.
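As a concrete, hedged illustration of the regularization idea, the sketch below uses an ordinary L2 (ridge) penalty via scikit-learn on the same toy data; this is not Firth's bias-reduction method or exact logistic regression, but it shows how penalizing the likelihood keeps the estimates finite where unpenalized maximum likelihood would diverge.

```python
# Sketch of a regularization remedy on made-up, completely separated data.
# An L2 (ridge) penalty is used here for illustration; it is not Firth's
# method or exact logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]).reshape(-1, 1)
y = (x.ravel() > 2).astype(int)               # completely separated outcome

# C is the inverse penalty strength; any finite C bounds the coefficients.
model = LogisticRegression(penalty="l2", C=1.0).fit(x, y)
print("intercept:", model.intercept_, "slope:", model.coef_)

# Weakening the penalty (larger C) lets the slope grow again, moving back
# toward the unbounded maximum-likelihood behaviour.
weak = LogisticRegression(penalty="l2", C=100.0, max_iter=10_000).fit(x, y)
print("weaker penalty slope:", weak.coef_)
```

In Bayesian terms, the same ridge penalty corresponds to placing a zero-mean Gaussian prior on the coefficients and reporting the posterior mode; full Bayesian inference would instead integrate over the coefficients, which likewise avoids the unbounded estimates.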

