In statistics, Poisson regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable ''Y'' has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables.

Negative binomial regression is a popular generalization of Poisson regression because it loosens the highly restrictive assumption, made by the Poisson model, that the variance is equal to the mean. The traditional negative binomial regression model is based on the Poisson-gamma mixture distribution. This model is popular because it models the Poisson heterogeneity with a gamma distribution.

Poisson regression models are generalized linear models with the logarithm as the (canonical) link function, and the Poisson distribution function as the assumed probability distribution of the response.

Regression models

If $\mathbf{x} \in \mathbb{R}^n$ is a vector of independent variables, then the model takes the form

$$\log(\operatorname{E}(Y\mid\mathbf{x})) = \alpha + \boldsymbol{\beta}' \mathbf{x},$$

where $\alpha \in \mathbb{R}$ and $\boldsymbol{\beta} \in \mathbb{R}^n$. Sometimes this is written more compactly as

$$\log(\operatorname{E}(Y\mid\mathbf{x})) = \boldsymbol{\theta}' \mathbf{x},$$

where x is now an (''n'' + 1)-dimensional vector consisting of ''n'' independent variables concatenated to the number one. Here ''θ'' is simply ''α'' concatenated to β. Thus, when given a Poisson regression model ''θ'' and an input vector x, the predicted mean of the associated Poisson distribution is given by

$$\operatorname{E}(Y\mid\mathbf{x}) = e^{\boldsymbol{\theta}' \mathbf{x}}.$$
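As a minimal sketch of the predicted-mean formula, the following Python function prepends the constant one to the input vector and exponentiates the linear combination. The function name `predicted_mean` and the numeric values of ''θ'' and x are hypothetical, chosen only for illustration.

```python
import math

def predicted_mean(theta, x):
    """Return E(Y | x) = exp(theta' x) for a Poisson regression model.

    theta holds alpha followed by the n coefficients beta;
    x holds the n raw independent variables (the constant 1 is prepended here).
    """
    x_aug = [1.0] + list(x)
    if len(theta) != len(x_aug):
        raise ValueError("theta must have one more entry than x")
    linear_predictor = sum(t * v for t, v in zip(theta, x_aug))
    return math.exp(linear_predictor)

# Hypothetical model: alpha = 0.5, beta = (0.2, -0.1).
theta = [0.5, 0.2, -0.1]
x = [3.0, 1.0]
# The linear predictor is 0.5 + 0.2*3 - 0.1*1 = 1.0, so the mean is about e.
print(predicted_mean(theta, x))
```

Because the mean is an exponential of the linear predictor, it is always positive, which is what a Poisson mean requires.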

If ''Y''_''i'' are independent observations with corresponding values x_''i'' of the predictor variables, then ''θ'' can be estimated by maximum likelihood. The maximum-likelihood estimates lack a closed-form expression and must be found by numerical methods. The log-likelihood surface for Poisson regression is always concave, making Newton–Raphson or other gradient-based methods appropriate estimation techniques.
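The Newton–Raphson approach can be sketched for the simplest case of an intercept and a single predictor, so that the linear system in each step is only 2×2 and can be solved by hand. This is an illustrative sketch, not a reference implementation: the function name `fit_poisson_newton`, the starting values, the tolerance, and the data are all assumptions.

```python
import math

def fit_poisson_newton(xs, ys, max_iter=50, tol=1e-10):
    """Fit log E(Y | x) = a + b*x by Newton-Raphson on the Poisson log-likelihood."""
    a = math.log(sum(ys) / len(ys))  # start at the intercept-only fit (needs mean(y) > 0)
    b = 0.0
    for _ in range(max_iter):
        mus = [math.exp(a + b * x) for x in xs]
        # Gradient of the log-likelihood with respect to (a, b).
        g_a = sum(y - mu for y, mu in zip(ys, mus))
        g_b = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Entries of the negated Hessian, sum_i mu_i * [1, x_i][1, x_i]'.
        s0 = sum(mus)
        s1 = sum(mu * x for x, mu in zip(xs, mus))
        s2 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = s0 * s2 - s1 * s1  # positive unless all x values coincide
        # Newton step: solve (negated Hessian) * step = gradient.
        step_a = (s2 * g_a - s1 * g_b) / det
        step_b = (s0 * g_b - s1 * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

# Hypothetical count data that grow roughly exponentially in x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1, 1, 3, 4, 8, 12]
a_hat, b_hat = fit_poisson_newton(xs, ys)
print(a_hat, b_hat)
```

At the solution the gradient vanishes, so the fitted means sum to the observed total count, a property of any Poisson regression with an intercept and the log link.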

Maximum likelihood-based parameter estimation

Given a set of parameters ''θ'' and an input vector ''x'', the mean of the predicted Poisson distribution, as stated above, is given by

$$\lambda := \operatorname{E}(Y\mid x) = e^{\theta' x},$$

and thus, the Poisson distribution's probability mass function is given by

$$p(y\mid x;\theta) = \frac{\lambda^y}{y!} e^{-\lambda} = \frac{e^{y \theta' x} e^{-e^{\theta' x}}}{y!}.$$

Now suppose we are given a data set consisting of ''m'' vectors $x_i \in \mathbb{R}^{n+1}, \, i = 1,\ldots,m$, along with a set of ''m'' values $y_1,\ldots,y_m \in \mathbb{N}$. Then, for a given set of parameters ''θ'', the probability of attaining this particular set of data is given by

$$p(y_1,\ldots,y_m\mid x_1,\ldots,x_m;\theta) = \prod_{i=1}^m \frac{e^{y_i \theta' x_i} e^{-e^{\theta' x_i}}}{y_i!}.$$

By the method of maximum likelihood, we wish to find the set of parameters ''θ'' that makes this probability as large as possible. To do this, the equation is first rewritten as a likelihood function in terms of ''θ'':

$$L(\theta\mid X,Y) = \prod_{i=1}^m \frac{e^{y_i \theta' x_i} e^{-e^{\theta' x_i}}}{y_i!}.$$

Note that the expression on the right hand side has not actually changed. A product of this form is awkward to work with, so one instead uses the log-likelihood:

$$\ell(\theta\mid X,Y) = \log L(\theta\mid X,Y) = \sum_{i=1}^m \left( y_i \theta' x_i - e^{\theta' x_i} - \log(y_i!) \right).$$

The parameters ''θ'' appear only in the first two terms of each summand, so the term $\log(y_i!)$ can be dropped when searching for the best value of ''θ''. Setting the derivative of the log-likelihood to zero gives an equation with no closed-form solution. However, the negative log-likelihood is a convex function of ''θ'', so standard convex optimization techniques such as gradient descent can be applied to find the optimal value of ''θ''.
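The gradient-based approach can be made concrete with a short derivation, offered here as a sketch. It writes $\ell(\theta)$ for the logarithm of the likelihood $L(\theta\mid X,Y)$ above; the step size $\eta > 0$ is an assumed, user-chosen constant and is not part of the model.

```latex
\begin{aligned}
\ell(\theta) &= \sum_{i=1}^m \left( y_i\,\theta' x_i - e^{\theta' x_i} - \log(y_i!) \right), \\
\nabla_\theta\, \ell(\theta) &= \sum_{i=1}^m \left( y_i - e^{\theta' x_i} \right) x_i, \\
\nabla_\theta^2\, \ell(\theta) &= -\sum_{i=1}^m e^{\theta' x_i}\, x_i x_i' \;\preceq\; 0, \\
\theta^{(t+1)} &= \theta^{(t)} + \eta\, \nabla_\theta\, \ell\big(\theta^{(t)}\big).
\end{aligned}
```

The Hessian is a negatively weighted sum of outer products and is therefore negative semidefinite, which is why $-\ell$ is convex. The last line is one step of gradient descent on $-\ell$.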

Poisson regression in practice

Poisson regression may be appropriate when the dependent variable is a count, for instance of events such as the arrival of a telephone call at a call centre. The events must be independent in the sense that the arrival of one call will not make another more or less likely, but the probability per unit time of events is understood to be related to covariates such as time of day.

"Exposure" and offset

Poisson regression may also be appropriate for rate data, where the rate is a count of events divided by some measure of that unit's ''exposure'' (a particular unit of observation). For example, biologists may count the number of tree species in a forest: events would be tree observations, exposure would be unit area, and rate would be the number of species per unit area. Demographers may model death rates in geographic areas as the count of deaths divided by person-years. More generally, event rates can be calculated as events per unit time, which allows the observation window to vary for each unit. In these examples, exposure is respectively unit area, person-years and unit time.

In Poisson regression this is handled as an offset. If the rate is count/exposure, multiplying both sides of the equation by exposure moves it to the right side of the equation. When both sides of the equation are then logged, the final model contains log(exposure) as a term that is added to the linear combination of regression coefficients and covariates. This logged variable, log(exposure), is called the offset variable and enters on the right-hand side of the equation with a parameter estimate (for log(exposure)) constrained to 1:

$$\log(\operatorname{E}(Y\mid x)) = \log(\text{exposure}) + \theta' x$$

which implies

$$\log(\operatorname{E}(Y\mid x)) - \log(\text{exposure}) = \log\left(\frac{\operatorname{E}(Y\mid x)}{\text{exposure}}\right) = \theta' x$$

In R, an offset can be included in a GLM using the offset() function:

    glm(y ~ offset(log(exposure)) + x, family = poisson(link = log))
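To illustrate what the offset does, the following Python sketch fits the simplest possible rate model, one with an intercept and an offset but no covariates. In that special case the maximum-likelihood rate is known to equal total events divided by total exposure, which gives a direct check. The function name `fit_rate_with_offset` and the counts and person-years are hypothetical.

```python
import math

def fit_rate_with_offset(counts, exposures, max_iter=100, tol=1e-12):
    """Intercept-only Poisson model: log E(Y_i) = log(exposure_i) + alpha."""
    alpha = 0.0  # arbitrary starting value
    for _ in range(max_iter):
        # The offset log(exposure_i) enters with its coefficient fixed at 1.
        mus = [math.exp(math.log(t) + alpha) for t in exposures]
        gradient = sum(y - mu for y, mu in zip(counts, mus))
        curvature = sum(mus)  # negated second derivative of the log-likelihood
        step = gradient / curvature  # Newton-Raphson step for the single parameter
        alpha += step
        if abs(step) < tol:
            break
    return alpha

# Hypothetical deaths and person-years for three areas.
deaths = [12, 30, 3]
person_years = [1000.0, 2500.0, 400.0]
alpha_hat = fit_rate_with_offset(deaths, person_years)
# exp(alpha_hat) should match the pooled rate, total deaths / total person-years.
print(math.exp(alpha_hat), sum(deaths) / sum(person_years))
```

With covariates added, exp(''θ''′''x'') plays the same role: it is the modeled rate per unit of exposure rather than the modeled count.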

Overdispersion and zero inflation

A characteristic of the Poisson distribution is that its mean is equal to its variance. In certain circumstances, it will be found that the observed variance is greater than the mean; this is known as overdispersion and indicates that the model is not appropriate. Common reasons are the omission of relevant explanatory variables and dependence among observations. Under some circumstances, the problem of overdispersion can be solved by using quasi-likelihood estimation or a negative binomial distribution instead.

Ver Hoef and Boveng described the difference between quasi-Poisson (also called overdispersion with quasi-likelihood) and negative binomial (equivalent to gamma-Poisson) as follows: If ''E''(''Y'') = ''μ'', the quasi-Poisson model assumes var(''Y'') = ''θμ'' while the gamma-Poisson assumes var(''Y'') = ''μ''(1 + ''κμ''), where ''θ'' is the quasi-Poisson overdispersion parameter, and ''κ'' is the shape parameter of the negative binomial distribution. For both models, parameters are estimated using iteratively reweighted least squares. For quasi-Poisson, the weights are ''μ''/''θ''. For negative binomial, the weights are ''μ''/(1 + ''κμ''). With large ''μ'' and substantial extra-Poisson variation, the negative binomial weights are capped at 1/''κ''. Ver Hoef and Boveng discussed an example where they selected between the two by plotting mean squared residuals vs. the mean.

Another common problem with Poisson regression is excess zeros: if there are two processes at work, one determining whether there are zero events or any events, and a Poisson process determining how many events there are, there will be more zeros than a Poisson regression would predict. An example would be the distribution of cigarettes smoked in an hour by members of a group where some individuals are non-smokers. Other generalized linear models such as the negative binomial model or zero-inflated model may function better in these cases.

Conversely, underdispersion may pose an issue for parameter estimation.
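The quasi-Poisson and negative binomial variance functions and weights quoted above from Ver Hoef and Boveng can be compared numerically. The Python sketch below only evaluates those formulas; the function names and the values chosen for ''θ'' and ''κ'' are hypothetical.

```python
def quasi_poisson_variance(mu, theta):
    """var(Y) = theta * mu under the quasi-Poisson model."""
    return theta * mu

def negative_binomial_variance(mu, kappa):
    """var(Y) = mu * (1 + kappa * mu) under the gamma-Poisson model."""
    return mu * (1.0 + kappa * mu)

def quasi_poisson_weight(mu, theta):
    """Iteratively reweighted least squares weight mu / theta."""
    return mu / theta

def negative_binomial_weight(mu, kappa):
    """Iteratively reweighted least squares weight mu / (1 + kappa * mu)."""
    return mu / (1.0 + kappa * mu)

theta, kappa = 2.0, 0.5  # hypothetical overdispersion parameters
for mu in (1.0, 10.0, 100.0, 10000.0):
    print(mu,
          quasi_poisson_weight(mu, theta),
          negative_binomial_weight(mu, kappa))
# The quasi-Poisson weight grows linearly in mu, while the negative
# binomial weight levels off below 1 / kappa.
```

The practical consequence is that observations with large means dominate a quasi-Poisson fit far more than they dominate a negative binomial fit.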

Use in survival analysis

Poisson regression creates proportional hazards models, one class of survival analysis: see proportional hazards models for descriptions of Cox models.

Extensions

Regularized Poisson regression

When estimating the parameters for Poisson regression, one typically tries to find values for ''θ'' that maximize a log-likelihood of the form

$$\sum_{i=1}^m \log(p(y_i; e^{\theta' x_i})),$$

where ''m'' is the number of examples in the data set, and $p(y_i; e^{\theta' x_i})$ is the probability mass function of the Poisson distribution with the mean set to $e^{\theta' x_i}$. Regularization can be added to this optimization problem by instead maximizing

$$\sum_{i=1}^m \log(p(y_i; e^{\theta' x_i})) - \lambda \left\|\theta\right\|_2^2,$$

for some positive constant $\lambda$. This technique, similar to ridge regression, can reduce overfitting.
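The regularized objective can be written out directly, as in the following Python sketch. It follows the formula above literally, so the penalty covers every entry of ''θ'', including the intercept; whether the intercept should be penalized in practice is a separate modeling choice. The function name `penalized_log_likelihood` and the example data are hypothetical.

```python
import math

def penalized_log_likelihood(theta, xs, ys, lam):
    """Poisson log-likelihood minus lam * ||theta||_2^2.

    Each row of xs already includes the leading constant 1.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        eta = sum(t * v for t, v in zip(theta, x))  # theta' x_i
        # log p(y; e^eta) = y * eta - e^eta - log(y!)
        total += y * eta - math.exp(eta) - math.lgamma(y + 1)
    penalty = lam * sum(t * t for t in theta)
    return total - penalty

# Hypothetical data: two observations, intercept plus one covariate.
xs = [[1.0, 0.0], [1.0, 1.0]]
ys = [1, 2]
theta = [0.0, 0.5]
print(penalized_log_likelihood(theta, xs, ys, lam=0.0))
print(penalized_log_likelihood(theta, xs, ys, lam=1.0))
```

Setting `lam` to zero recovers the unpenalized log-likelihood, and larger values pull the maximizing ''θ'' toward zero.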