Applications
General
Logistic regression is used in various fields, including machine learning, most medical fields, and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression. Many other medical scales used to assess the severity of a patient have been developed using logistic regression. Logistic regression may be used to predict the risk of developing a given disease, based on observed characteristics of the patient.
Supervised machine learning
Logistic regression is a supervised machine learning algorithm widely used for binary classification tasks, such as identifying whether an email is spam or diagnosing a disease from the presence or absence of specific conditions in patient test results. The approach uses the logistic (or sigmoid) function to transform a linear combination of input features into a probability between 0 and 1, interpreted as the likelihood that a given input belongs to one of two predefined categories. With its distinctive S-shaped curve, the logistic function maps any real-valued number into the interval (0, 1), which makes it particularly suitable for binary classification tasks such as sorting emails into "spam" or "not spam". By estimating the probability that the dependent variable falls into a specific group, logistic regression provides a probabilistic framework that supports informed decision-making.
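The mechanism described above fits in a few lines of code. The sketch below is a minimal illustration, not a production classifier; the two email features, their weights, and the 0.5 threshold are all made up for the example.

```python
import math

def sigmoid(t: float) -> float:
    """Standard logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def predict_proba(features, weights, bias):
    """Probability that an input belongs to the positive class ("spam")."""
    t = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(t)

# Illustrative (made-up) weights for two email features:
# fraction of capitalized words, and count of the word "free".
weights, bias = [2.0, 1.5], -3.0
p = predict_proba([0.8, 2.0], weights, bias)
label = "spam" if p >= 0.5 else "not spam"  # 0.5 decision threshold
print(f"P(spam) = {p:.3f} -> {label}")
```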
Example
Problem
As a simple example, we can use a logistic regression with one explanatory variable and two categories to answer the following question:

A group of 20 students spends between 0 and 6 hours studying for an exam. How does the number of hours spent studying affect the probability of the student passing the exam?

The reason for using logistic regression for this problem is that the values of the dependent variable, pass and fail, while represented by "1" and "0", are not cardinal numbers: the outcome is categorical, so an ordinary linear regression of the outcome on hours studied is not appropriate.
Model
Fit
The usual measure of goodness of fit for a logistic regression uses logistic loss (or log loss), the negative log-likelihood.
Parameter estimation
Since ''ℓ'' is nonlinear in $\beta_0$ and $\beta_1$, determining their optimum values will require numerical methods. One method of maximizing ''ℓ'' is to require the derivatives of ''ℓ'' with respect to $\beta_0$ and $\beta_1$ to be zero:

$$\frac{\partial \ell}{\partial \beta_0} = \sum_{k=1}^{K} \left( y_k - p(x_k) \right) = 0$$

$$\frac{\partial \ell}{\partial \beta_1} = \sum_{k=1}^{K} x_k \left( y_k - p(x_k) \right) = 0$$

and the maximization procedure can be accomplished by solving the above two equations for $\beta_0$ and $\beta_1$, which, again, will generally require the use of numerical methods. The values of $\beta_0$ and $\beta_1$ which maximize ''ℓ'' and ''L'' using the above data are found to be approximately:

$$\beta_0 \approx -4.1, \qquad \beta_1 \approx 1.5$$

which yields a value for ''μ'' and ''s'' of:

$$\mu = -\beta_0/\beta_1 \approx 2.7, \qquad s = 1/\beta_1 \approx 0.67$$
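As a concrete instance of the numerical methods mentioned above, here is a minimal Newton-Raphson sketch in Python with NumPy that solves the two score equations. The ten-point dataset is illustrative only, not the actual 20-student data of the example.

```python
import numpy as np

def fit_logistic_newton(x, y, iters=25):
    """Maximize the Bernoulli log-likelihood l(b0, b1) by Newton's method.

    Solves dl/db0 = sum(y_k - p_k) = 0 and dl/db1 = sum(x_k*(y_k - p_k)) = 0.
    """
    X = np.column_stack([np.ones_like(x), x])       # design matrix [1, x]
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))         # current probabilities
        grad = X.T @ (y - p)                        # score (gradient of l)
        hess = -X.T @ (X * (p * (1 - p))[:, None])  # Hessian of l
        beta -= np.linalg.solve(hess, grad)         # Newton step
    return beta

# Illustrative data in the spirit of the example (hours studied, pass/fail);
# these are NOT the exact values from the 20-student study above.
hours  = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
passed = np.array([0,   0,   0,   0,   1,   0,   1,   1,   1,   1])
b0, b1 = fit_logistic_newton(hours, passed)
print(f"beta0={b0:.2f}, beta1={b1:.2f}, mu={-b0/b1:.2f}, s={1/b1:.2f}")
```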
Predictions
The $\beta_0$ and $\beta_1$ coefficients may be entered into the logistic regression equation to estimate the probability of passing the exam. For example, for a student who studies 2 hours, entering the value $x = 2$ into the equation gives an estimated probability of passing the exam of 0.25:

$$t = \beta_0 + 2\beta_1 \approx -1.1, \qquad p = \frac{1}{1 + e^{-t}} \approx 0.25$$

Similarly, for a student who studies 4 hours, the estimated probability of passing the exam is 0.87:

$$t = \beta_0 + 4\beta_1 \approx 1.9, \qquad p = \frac{1}{1 + e^{-t}} \approx 0.87$$

This table shows the estimated probability of passing the exam for several values of hours studying:

| Hours of study | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Probability of passing | 0.07 | 0.25 | 0.60 | 0.87 | 0.97 |
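The prediction step is a one-line computation. The snippet below reuses the approximate coefficients quoted above to reproduce the table:

```python
import math

b0, b1 = -4.1, 1.5   # approximate fitted coefficients from above

def p_pass(hours: float) -> float:
    """Estimated probability of passing given hours of study."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * hours)))

for h in (1, 2, 3, 4, 5):
    print(f"{h} h -> {p_pass(h):.2f}")
# 1 h -> 0.07, 2 h -> 0.25, 3 h -> 0.60, 4 h -> 0.87, 5 h -> 0.97
```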
Model evaluation
The logistic regression analysis gives the following output. By the Wald test, the output indicates that hours studying is significantly associated with the probability of passing the exam. Rather than the Wald method, the recommended method to calculate the ''p''-value for logistic regression is the likelihood-ratio test (LRT), which for these data gives a smaller ''p''-value (see below).
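For intuition on how the LRT ''p''-value is obtained, here is a hedged sketch in Python with SciPy: it compares the log-likelihood of the fitted model against an intercept-only model and refers the doubled difference to a chi-squared distribution. `beta_full` is assumed to come from a fit such as the Newton sketch earlier; the helper names are hypothetical, and the sketch assumes both outcomes occur in `y`.

```python
import numpy as np
from scipy.stats import chi2

def log_lik(x, y, beta):
    """Bernoulli log-likelihood for coefficients beta = (b0, b1)."""
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def lrt_pvalue(x, y, beta_full):
    """Likelihood-ratio test of b1 = 0 (one restricted parameter)."""
    p_null = y.mean()                  # MLE of the intercept-only model
    ll_null = np.sum(y * np.log(p_null) + (1 - y) * np.log(1 - p_null))
    stat = 2.0 * (log_lik(x, y, beta_full) - ll_null)
    return chi2.sf(stat, df=1)         # upper tail, 1 degree of freedom
```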
Generalizations
This simple model is an example of binary logistic regression: it has one explanatory variable and a binary dependent variable which can assume one of two categorical values. Multinomial logistic regression is the generalization of binary logistic regression to include any number of explanatory variables and any number of categories.
Background
Definition of the logistic function
An explanation of logistic regression can begin with an explanation of the standard logistic function. The logistic function is a sigmoid function, which takes any real input $t$ and outputs a value between zero and one:

$$\sigma(t) = \frac{e^t}{e^t + 1} = \frac{1}{1 + e^{-t}}$$

Definition of the inverse of the logistic function
We can now define the logit (log-odds) function as the inverse $g = \sigma^{-1}$ of the standard logistic function:

$$g(p) = \sigma^{-1}(p) = \ln\frac{p}{1-p} \quad \text{for } p \in (0, 1)$$

Interpretation of these terms
In the above equations, the terms are as follows:
* $g$ is the logit function: the equation for $g(p(x))$ illustrates that the logit (i.e., the log-odds, or natural logarithm of the odds) is equivalent to the linear regression expression $\beta_0 + \beta_1 x$.
* $\ln$ denotes the natural logarithm.
* $p(x)$ is the probability that the dependent variable equals a case, given some linear combination of the predictors.
* $\beta_0$ is the intercept from the linear regression equation.
* $\beta_1 x$ is the regression coefficient multiplied by some value of the predictor.
Definition of the odds
The odds of the dependent variable equaling a case (given some linear combination $\beta_0 + \beta_1 x$ of the predictors) is equivalent to the exponential function of the linear regression expression:

$$\text{odds} = \frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}$$

This illustrates how the logit serves as a link function between the probability and the linear regression expression: since the logit ranges over the whole real line, it provides a suitable scale on which to carry out linear regression, and it is easily converted back into odds.
The odds ratio
For a continuous independent variable, the odds ratio can be defined as:

$$\mathrm{OR} = \frac{\operatorname{odds}(x+1)}{\operatorname{odds}(x)} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}$$

This exponential relationship provides an interpretation for $\beta_1$: the odds multiply by $e^{\beta_1}$ for every one-unit increase in $x$.
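A two-line illustration of this interpretation, using the approximate slope from the exam example above:

```python
import math

b1 = 1.5                   # approximate slope from the exam example
odds_ratio = math.exp(b1)  # odds(x + 1) / odds(x), independent of x
print(f"Each additional hour multiplies the odds of passing by {odds_ratio:.2f}")
# ~4.48
```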
Multiple explanatory variables
If there are multiple explanatory variables, the above expression $\beta_0 + \beta_1 x$ can be revised to $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$. Then when this is used in the equation relating the log odds of a success to the values of the predictors, the linear regression will be a multiple regression with ''m'' explanators; the parameters $\beta_i$ for all $i = 0, 1, 2, \dots, m$ are all estimated. Again, the more traditional equations are:

$$\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$$

and

$$p = \frac{1}{1 + b^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m)}}$$

where usually $b = e$.
Definition
A dataset contains ''N'' points. Each point ''i'' consists of a set of ''m'' input variables $x_{1,i} \dots x_{m,i}$ (also called independent variables, explanatory variables, predictor variables, features, or attributes), and a binary outcome variable $Y_i$ (also known as a dependent variable, response variable, output variable, or class).
Many explanatory variables, two categories
The above example of binary logistic regression on one explanatory variable can be generalized to binary logistic regression on any number of explanatory variables $x_1, x_2, \dots$ and any number of categorical values $y = 0, 1, 2, \dots$. To begin with, we may consider a logistic model with ''M'' explanatory variables, $x_1, x_2, \dots, x_M$ and, as in the example above, two categorical values ($y$ = 0 and 1). For the simple binary logistic regression model, we assumed a linear relationship between the predictor variable and the log-odds (also called the logit) of the event that $y = 1$.
Multinomial logistic regression: Many explanatory variables and many categories
In the above cases of two categories (binomial logistic regression), the categories were indexed by "0" and "1", and we had two probabilities: the probability that the outcome was in category 1 was given by $p(x)$ and the probability that the outcome was in category 0 was given by $1 - p(x)$. The sum of these probabilities equals 1, which must be true, since "0" and "1" are the only possible categories in this setup.

In general, if we have $M + 1$ explanatory variables (including $x_0$) and $N + 1$ categories, we will need $N + 1$ separate probabilities, one for each category, indexed by ''n'', which describe the probability that the categorical outcome ''y'' will be in category $y = n$, conditional on the vector of covariates '''x'''. The sum of these probabilities over all categories must equal 1. Using the mathematically convenient base ''e'', these probabilities are:

$$p_n(\mathbf{x}) = \frac{e^{\boldsymbol{\beta}_n \cdot \mathbf{x}}}{1 + \sum_{u=1}^{N} e^{\boldsymbol{\beta}_u \cdot \mathbf{x}}} \quad \text{for } n = 1, 2, \dots, N$$

$$p_0(\mathbf{x}) = 1 - \sum_{n=1}^{N} p_n(\mathbf{x}) = \frac{1}{1 + \sum_{u=1}^{N} e^{\boldsymbol{\beta}_u \cdot \mathbf{x}}}$$

Each of the probabilities except $p_0(\mathbf{x})$ will have their own set of regression coefficients $\boldsymbol{\beta}_n$. It can be seen that, as required, the sum of the $p_n(\mathbf{x})$ over all categories ''n'' is 1. The selection of $p_0(\mathbf{x})$ to be defined in terms of the other probabilities is artificial. Any of the probabilities could have been selected to be so defined. This special value of ''n'' is termed the "pivot index", and the log-odds ($t_n$) are expressed in terms of the pivot probability and are again expressed as a linear combination of the explanatory variables:

$$t_n = \ln\frac{p_n(\mathbf{x})}{p_0(\mathbf{x})} = \boldsymbol{\beta}_n \cdot \mathbf{x}$$

Note also that for the simple case of $N = 1$, the two-category case is recovered, with $p_1(\mathbf{x}) = p(\mathbf{x})$ and $p_0(\mathbf{x}) = 1 - p(\mathbf{x})$.

The log-likelihood that a particular set of ''K'' measurements or data points will be generated by the above probabilities can now be calculated. Indexing each measurement by ''k'', let the ''k''-th set of measured explanatory variables be denoted by $\mathbf{x}_k$ and their categorical outcomes be denoted by $y_k$, which can be equal to any integer in $[0, N]$. The log-likelihood is then:

$$\ell = \sum_{k=1}^{K} \sum_{n=0}^{N} \Delta(n, y_k) \, \ln p_n(\mathbf{x}_k)$$

where $\Delta(n, y_k)$ is an indicator function which equals 1 if $y_k = n$ and zero otherwise.
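The pivot-index parameterization above translates directly into code. This is a minimal sketch with NumPy; the coefficient matrix and covariate values are made up for illustration.

```python
import numpy as np

def multinomial_probs(B, x):
    """Category probabilities with category 0 as the pivot.

    B holds one row of coefficients per non-pivot category (shape (N, M+1));
    x is the covariate vector including the constant x0 = 1.
    """
    t = B @ x                           # log-odds t_n = ln(p_n / p_0)
    expt = np.exp(t)
    p0 = 1.0 / (1.0 + expt.sum())       # pivot probability
    return np.concatenate([[p0], p0 * expt])

# Illustrative coefficients: N = 2 non-pivot categories, M = 1 covariate.
B = np.array([[ 0.5, 1.0],
              [-0.2, 0.3]])
p = multinomial_probs(B, np.array([1.0, 2.0]))  # x0 = 1, x1 = 2.0
print(p, p.sum())  # probabilities for categories 0..2; they sum to 1
```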
Interpretations
There are various equivalent specifications and interpretations of logistic regression, which fit into different types of more general models, and allow different generalizations.
As a generalized linear model
The particular model used by logistic regression, which distinguishes it from standard linear regression and from other types of regression analysis used for binary-valued outcomes, is the way the probability of a particular outcome is linked to the linear predictor function.
As a latent-variable model
The logistic model has an equivalent formulation as a latent-variable model. This formulation is common in the theory of discrete choice models and makes it easier to extend to certain more complicated models with multiple, correlated choices, as well as to compare logistic regression to the closely related probit model.

Imagine that, for each trial ''i'', there is a continuous latent variable $Y_i^*$ (i.e. an unobserved random variable) determined by the linear predictor plus an error term with a standard logistic distribution:

$$Y_i^* = \boldsymbol{\beta} \cdot \mathbf{X}_i + \varepsilon_i, \qquad \varepsilon_i \sim \operatorname{Logistic}(0, 1)$$

The observed outcome is then $Y_i = 1$ if $Y_i^* > 0$ and $Y_i = 0$ otherwise.
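The equivalence between the latent-variable formulation and the direct sigmoid formulation is easy to check by simulation. A minimal sketch, reusing the approximate coefficients of the exam example:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_latent(beta0, beta1, x, n=200_000):
    """Draw Y* = b0 + b1*x + eps with logistic noise; observe Y = 1{Y* > 0}."""
    eps = rng.logistic(loc=0.0, scale=1.0, size=n)  # standard logistic errors
    return np.mean(beta0 + beta1 * x + eps > 0)

beta0, beta1, x = -4.1, 1.5, 2.0
sim = simulate_latent(beta0, beta1, x)
direct = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
print(f"simulated {sim:.3f} vs sigmoid {direct:.3f}")  # both ~0.25
```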
Two-way latent-variable model
Yet another formulation uses two separate latent variables:

$$Y_i^{0*} = \boldsymbol{\beta}_0 \cdot \mathbf{X}_i + \varepsilon_0, \qquad Y_i^{1*} = \boldsymbol{\beta}_1 \cdot \mathbf{X}_i + \varepsilon_1$$

where

$$\varepsilon_0, \varepsilon_1 \sim \operatorname{EV}_1(0, 1)$$

and $\operatorname{EV}_1(0, 1)$ is a standard type-1 extreme value distribution, i.e.

$$f(x) = e^{-x} e^{-e^{-x}}$$

Then

$$Y_i = \begin{cases} 1 & \text{if } Y_i^{1*} > Y_i^{0*}, \\ 0 & \text{otherwise.} \end{cases}$$

This model has a separate latent variable and a separate set of regression coefficients for each possible outcome of the dependent variable. The reason for this separation is that it makes it easy to extend logistic regression to multi-outcome categorical variables, as in the multinomial logit model. In such a model, it is natural to model each possible outcome using a different set of regression coefficients. It is also possible to motivate each of the separate latent variables as the theoretical utility associated with making the associated choice, and thus motivate logistic regression in terms of utility theory.
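The reason this two-way formulation reproduces the same model is that the difference of two independent standard type-1 extreme value (Gumbel) variables follows a standard logistic distribution. A short simulation sketch, with an illustrative value for the linear-predictor difference:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200_000
eps0 = rng.gumbel(size=n)        # noise on the y = 0 latent variable
eps1 = rng.gumbel(size=n)        # noise on the y = 1 latent variable
t = 1.0                          # illustrative value of (b1 - b0) . x
sim = np.mean(t + eps1 > eps0)   # P(Y1* > Y0*)
print(sim, 1 / (1 + np.exp(-t)))  # both ~0.731
```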
Example
As an example, consider a province-level election where the choice is between a right-of-center party, a left-of-center party, and a secessionist party (e.g. the Parti Québécois, which wants Quebec to secede from Canada). We can then model each voter's choice with one latent utility variable per party and predict the vote as the option with the highest utility.
As a "log-linear" model
Yet another formulation combines the two-way latent variable formulation above with the original formulation higher up without latent variables, and in the process provides a link to one of the standard formulations of the multinomial logit. Here, instead of writing the logit of the probabilities $p_i$ as a linear predictor, we separate the linear predictor into two, one for each of the two outcomes.
As a single-layer perceptron
The model has an equivalent formulation

$$p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i})}}$$

This functional form is commonly called a single-layer perceptron or single-layer artificial neural network with a logistic activation function.
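Viewed this way, a fitted logistic regression is a one-unit network: the coefficients are the weights and the intercept is the bias. A minimal sketch, reusing the approximate coefficients of the exam example:

```python
import numpy as np

def neuron(x, w, b):
    """One logistic unit: sigmoid applied to a weighted sum of inputs."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

# A logistic-regression "network": weights = coefficients, bias = intercept.
w, b = np.array([1.5]), -4.1
print(neuron(np.array([2.0]), w, b))  # ~0.25, same as the earlier prediction
```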
In terms of binomial data
A closely related model assumes that each ''i'' is associated not with a single Bernoulli trial but with $n_i$ independent identically distributed trials, where the observation $Y_i$ is the number of successes observed (the sum of the individual Bernoulli-distributed random variables), and hence follows a binomial distribution:

$$Y_i \sim \operatorname{Bin}(n_i, p_i)$$

Model fitting
Maximum likelihood estimation (MLE)
The regression coefficients are usually estimated using maximum likelihood estimation. Unlike linear regression with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximize the likelihood function, so an iterative process must be used instead; for example, Newton's method.
Iteratively reweighted least squares (IRLS)
Binary logistic regression ($y = 0$ or $y = 1$) can, for example, be calculated using ''iteratively reweighted least squares'' (IRLS), which is equivalent to maximizing the log-likelihood of a Bernoulli distributed process using Newton's method. If the problem is written in vector matrix form, with parameters $\mathbf{w}$, regressor matrix $\mathbf{X}$, and expected values of the Bernoulli distribution $\boldsymbol{\mu}_k = \sigma(\mathbf{X}\mathbf{w}_k)$, the parameters can be found using the following iterative algorithm:

$$\mathbf{w}_{k+1} = \left(\mathbf{X}^T \mathbf{S}_k \mathbf{X}\right)^{-1} \mathbf{X}^T \left(\mathbf{S}_k \mathbf{X} \mathbf{w}_k + \mathbf{y} - \boldsymbol{\mu}_k\right)$$

where $\mathbf{S}_k = \operatorname{diag}\!\left(\mu_k(i)\,(1 - \mu_k(i))\right)$ is a diagonal weighting matrix.
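A minimal sketch of this update in Python with NumPy, assuming the data are not perfectly separable (otherwise the weights diverge and the working weights underflow); the six-point dataset is illustrative only.

```python
import numpy as np

def fit_irls(X, y, iters=25):
    """Iteratively reweighted least squares for binary logistic regression.

    Implements w_{k+1} = (X^T S_k X)^{-1} X^T (S_k X w_k + y - mu_k),
    where mu_k = sigmoid(X w_k) and S_k = diag(mu_k * (1 - mu_k)).
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ w))   # current fitted means
        s = mu * (1.0 - mu)                 # diagonal of S_k
        z = X @ w + (y - mu) / s            # working response
        # Weighted least-squares solve of (X^T S X) w = X^T S z:
        w = np.linalg.solve(X.T @ (X * s[:, None]), X.T @ (s * z))
    return w

# Usage sketch: first column of X is the intercept term.
X = np.column_stack([np.ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
y = np.array([0, 0, 1, 0, 1, 1])
print(fit_irls(X, y))
```

Each IRLS step is exactly one Newton step on the Bernoulli log-likelihood, expressed as a weighted least-squares problem on the working response `z`; this is why the two methods converge to the same coefficients.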