In statistics, a zero-inflated model is a statistical model based on a zero-inflated probability distribution, i.e. a distribution that allows for frequent zero-valued observations.
Introduction to zero-inflated models
Zero-inflated models are commonly used in the analysis of count data, such as the number of visits a patient makes to the emergency room in one year, or the number of fish caught in one day in one lake. Count data can take values of 0, 1, 2, … (non-negative integer values). Other examples of count data are the number of hits recorded by a Geiger counter in one minute, patient days in the hospital, goals scored in a soccer game, and the number of episodes of hypoglycemia per year for a patient with diabetes.
For statistical analysis, the distribution of the counts is often represented using a Poisson distribution or a negative binomial distribution. Hilbe notes that "Poisson regression is traditionally conceived of as the basic count model upon which a variety of other count models are based." In a Poisson model, "… the random variable $y$ is the count response and parameter $\lambda$ (lambda) is the mean. Often, $\lambda$ is also called the rate or intensity parameter… In statistical literature, $\lambda$ is also expressed as $\mu$ (mu) when referring to Poisson and traditional negative binomial models."
In some data, the number of zeros is greater than would be expected using a Poisson distribution or a negative binomial distribution. Data with such an excess of zero counts are described as zero-inflated.
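As an informal illustration of what "more zeros than expected" means, the sketch below compares the observed proportion of zeros with the proportion a Poisson distribution would predict when its mean is set to the sample mean. The function name `zero_excess` and the data are hypothetical, and this is only a descriptive check, not a formal test of zero inflation.

```python
import math

def zero_excess(counts):
    """Return (observed, expected) proportions of zeros.

    The expected value assumes a plain Poisson model whose mean
    equals the sample mean, so P(Y = 0) = exp(-mean).
    """
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)
    return observed, expected

# Hypothetical counts: 6 zeros among 10 observations, sample mean 1.5
counts = [0, 0, 0, 0, 0, 0, 2, 3, 4, 6]
observed, expected = zero_excess(counts)
print(f"observed zeros: {observed:.2f}, Poisson-expected zeros: {expected:.2f}")
```

A large gap between the two proportions suggests, but does not prove, that a zero-inflated model may be worth considering.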
Example histograms of zero-inflated Poisson distributions with a mean of 5 or 10 and a proportion of zero inflation of 0.2 or 0.5 are shown below, based on the R program ZeroInflPoiDistPlots.R from Bilder and Laughlin.
Examples of zero-inflated count data
* Fish counts
"… suppose we recorded the number of fish caught on various lakes in 4-hour fishing trips to Minnesota. Some lakes in Minnesota are too shallow for fish to survive the winter, so fishing in those lakes will yield no catch. On the other hand, even on a lake where fish are plentiful, we may or may not catch any fish due to conditions or our own competence. Thus, the number of fish caught will be zero if the lake does not support fish, and will be zero, one or more if it does."
* Number of wisdom teeth extracted.
The number of wisdom teeth that a person has had extracted can range from 0 to 4. Some individuals, about one-third of the population, do not have any wisdom teeth. For these individuals, the number of wisdom teeth extracted will always be zero. For other individuals, the number extracted will be between 0 and 4, where a 0 indicates that the subject has not yet had, and may never have, any of their 4 wisdom teeth extracted.
* Publications by PhD candidates.
Long examined the number of publications by 915 doctoral candidates in biochemistry in the last three years of their PhD studies. The proportion of candidates with zero publications exceeded the proportion predicted by a Poisson model. "Long argued that the PhD candidates might fall into two distinct groups: 'publishers' (perhaps striving for an academic career) and 'non-publishers' (seeking other career paths). One reasonable form of explanation is that the observed zero counts reflect a mixture of the two latent classes – those who simply have not yet published and those who will likely never publish."
Zero-inflated data as a mixture of two distributions
As the examples above show, zero-inflated data can arise as a mixture of two distributions. The first distribution generates zeros. The second distribution, which may be a Poisson distribution, a negative binomial distribution or other count distribution, generates counts, some of which may be zeros.
In the statistical literature, different authors may use different names to distinguish zeros from the two distributions. Some authors describe zeros generated by the first (binary) distribution as "structural" and zeros generated by the second (count) distribution as "random".
Other authors use the terminology "immune" and "susceptible" for the binary and count zeros, respectively.
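The two-stage mechanism described above can be sketched as a small simulation. The following is a minimal illustration, assuming a Poisson count distribution and hypothetical parameter values (mixing probability 0.3, Poisson mean 2.0); the function names and the labels "structural", "random" and "positive" are chosen here for illustration only.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method
    (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_zero_inflated(pi, lam, rng):
    """Return (count, kind): a structural zero with probability pi,
    otherwise a Poisson count that may itself be a (random) zero."""
    if rng.random() < pi:
        return 0, "structural"
    y = sample_poisson(lam, rng)
    return y, ("random" if y == 0 else "positive")

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_zero_inflated(0.3, 2.0, rng) for _ in range(20000)]
structural = sum(1 for _, kind in draws if kind == "structural") / len(draws)
random_zeros = sum(1 for _, kind in draws if kind == "random") / len(draws)
print(f"structural zeros: {structural:.3f}, random zeros: {random_zeros:.3f}")
```

In real data the two kinds of zero are not labelled; only their sum is observed, which is why the mixture has to be estimated from the counts.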
Zero-inflated Poisson

One well-known zero-inflated model is Diane Lambert's zero-inflated Poisson model, which concerns a random event containing excess zero-count data in unit time. For example, the number of insurance claims within a population for a certain type of risk would be zero-inflated by those people who have not taken out insurance against the risk and thus are unable to claim. The zero-inflated Poisson (ZIP) model mixes two zero-generating processes. The first process generates zeros. The second process is governed by a Poisson distribution that generates counts, some of which may be zero. The mixture distribution is described as follows:
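: $\Pr(Y = 0) = \pi + (1 - \pi) e^{-\lambda}$
: $\Pr(Y = y_i) = (1 - \pi) \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}, \qquad y_i = 1, 2, 3, \ldots$
where the outcome variable $y_i$ has any non-negative integer value, $\lambda$ is the expected Poisson count for the $i$th individual; $\pi$ is the probability of extra zeros.
The mean is $(1 - \pi)\lambda$ and the variance is $\lambda(1 - \pi)(1 + \pi\lambda)$.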
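As a numerical sanity check, the sketch below evaluates the ZIP probability mass function, P(Y = 0) = π + (1 − π)e^(−λ) and P(Y = y) = (1 − π)λ^y e^(−λ)/y! for y ≥ 1, and compares the resulting mean and variance with the closed forms (1 − π)λ and λ(1 − π)(1 + πλ). The parameter values (π = 0.2, λ = 5) and the truncation of the support at 100 are choices made for this illustration.

```python
import math

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson probability of observing the count y."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi + (1 - pi) * poisson  # extra zeros plus Poisson zeros
    return (1 - pi) * poisson

pi, lam = 0.2, 5.0
support = range(100)  # the tail beyond 100 is negligible for lam = 5
probs = [zip_pmf(y, pi, lam) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"total probability: {total:.6f}")
print(f"mean: {mean:.6f}  closed form: {(1 - pi) * lam:.6f}")
print(f"variance: {var:.6f}  closed form: {lam * (1 - pi) * (1 + pi * lam):.6f}")
```

The variance exceeds the mean whenever π > 0, which is the overdispersion that zero inflation induces relative to a plain Poisson model.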
Estimators of ZIP parameters
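The method of moments estimators are given by
: $\hat{\lambda}_{mo} = \frac{s^2 + m^2}{m} - 1$
: $\hat{\pi}_{mo} = \frac{s^2 - m}{s^2 + m^2 - m}$
where $m$ is the sample mean and $s^2$ is the sample variance.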
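The moment estimators follow from equating the sample mean m and sample variance s² to the ZIP mean (1 − π)λ and variance λ(1 − π)(1 + πλ), which gives λ = (s² + m²)/m − 1 and π = (s² − m)/(s² + m² − m). A minimal sketch follows; the function name, the data, and the use of the n − 1 denominator for the sample variance are assumptions made here.

```python
def zip_moment_estimates(counts):
    """Method-of-moments estimates (lambda_hat, pi_hat) for a ZIP sample."""
    n = len(counts)
    m = sum(counts) / n
    # Sample variance with the n - 1 denominator (an assumption of this sketch)
    s2 = sum((c - m) ** 2 for c in counts) / (n - 1)
    lam_hat = (s2 + m * m) / m - 1
    pi_hat = (s2 - m) / (s2 + m * m - m)
    return lam_hat, pi_hat

# Hypothetical counts with many zeros
counts = [0, 0, 0, 0, 0, 0, 2, 3, 4, 6]
lam_hat, pi_hat = zip_moment_estimates(counts)
print(f"lambda_hat = {lam_hat:.4f}, pi_hat = {pi_hat:.4f}")
```

If the sample variance does not exceed the sample mean, the estimate of π is zero or negative, which indicates that the data show no evidence of zero inflation under this model.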
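The maximum likelihood estimator of $\lambda$ can be found by solving the following equation
: $m \left(1 - e^{-\lambda}\right) = \lambda \left(1 - \frac{n_0}{n}\right)$
where $m$ is the sample mean and $\frac{n_0}{n}$ is the observed proportion of zeros.
A closed-form solution of this equation is given by
: $\hat{\lambda}_{ml} = W_0\left(-s e^{-s}\right) + s$
with $W_0$ being the main branch of Lambert's W-function and
: $s = \frac{m}{1 - \frac{n_0}{n}}$.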
Alternatively, the equation can be solved by iteration.
The maximum likelihood estimator for $\pi$ is given by
: $\hat{\pi}_{ml} = 1 - \frac{m}{\hat{\lambda}_{ml}}$
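The iterative route can be sketched with a simple fixed-point scheme. Writing s = m/(1 − n₀/n) for the mean of the non-zero counts, the likelihood equation m(1 − e^(−λ)) = λ(1 − n₀/n) rearranges to λ = s(1 − e^(−λ)), which the code below iterates starting from λ = s; the estimate of π is then 1 − m/λ. The function name, tolerance and data are assumptions of this illustration, and the fixed-point scheme is one possible choice of iteration, not necessarily the one used in any particular package.

```python
import math

def zip_mle(counts, tol=1e-12, max_iter=10_000):
    """Maximum likelihood estimates (lambda_hat, pi_hat) for a ZIP sample,
    found by fixed-point iteration on lambda = s * (1 - exp(-lambda))."""
    n = len(counts)
    n0 = sum(1 for c in counts if c == 0)
    if n0 == n:
        raise ValueError("all counts are zero; lambda cannot be estimated")
    m = sum(counts) / n
    s = m / (1 - n0 / n)  # mean of the non-zero counts
    if s <= 1:
        raise ValueError("all non-zero counts equal 1; the solution degenerates to 0")
    lam = s  # start above the root; the iteration decreases towards it
    for _ in range(max_iter):
        new_lam = s * (1 - math.exp(-lam))
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam, 1 - m / lam

# Hypothetical counts with many zeros
counts = [0, 0, 0, 0, 0, 0, 2, 3, 4, 6]
lam_hat, pi_hat = zip_mle(counts)
print(f"lambda_hat = {lam_hat:.4f}, pi_hat = {pi_hat:.4f}")
```

When the sample contains fewer zeros than a Poisson distribution would produce, the resulting estimate of π is negative, so the value should be checked before it is interpreted as a probability.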
Related models
In 1994, Greene considered the zero-inflated negative binomial (ZINB) model. Daniel B. Hall adapted Lambert's methodology to an upper-bounded count situation, thereby obtaining a zero-inflated binomial (ZIB) model.
Discrete pseudo compound Poisson model
If the count data $Y$ is such that the probability of zero is larger than the probability of nonzero, namely
: $\Pr(Y = 0) > 0.5$
then the discrete data $Y$ obey a discrete pseudo compound Poisson distribution.
In fact, let $G(z) = \sum_{n=0}^{\infty} P(Y = n) z^n$ be the probability generating function of $Y$. If $p_0 = \Pr(Y = 0) > 0.5$, then $|G(z)| \geq p_0 - \sum_{i=1}^{\infty} p_i = 2p_0 - 1 > 0$ for $|z| \le 1$. Then from the Wiener–Lévy theorem, $G(z)$ has the probability generating function of the discrete pseudo compound Poisson distribution.
We say that the discrete random variable $X$ satisfying the probability generating function characterization
: $G_X(z) = \sum_{n=0}^{\infty} P(X = n) z^n = \exp\left(\sum_{k=1}^{\infty} \alpha_k \lambda (z^k - 1)\right), \quad (|z| \le 1)$
has a discrete pseudo compound Poisson distribution with parameters
: $(\lambda_1, \lambda_2, \ldots) = (\alpha_1 \lambda, \alpha_2 \lambda, \ldots) \in \mathbb{R}^{\infty} \left(\sum_{k=1}^{\infty} \alpha_k = 1,\ \sum_{k=1}^{\infty} |\alpha_k| < \infty,\ \alpha_k \in \mathbb{R},\ \lambda > 0\right).$
When all the $\alpha_k$ are non-negative, it is the discrete compound Poisson distribution (non-Poisson case) with the overdispersion property.
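To make the characterization more concrete, the sketch below recovers the coefficients of the logarithm of a probability generating function from the probabilities p₀, p₁, p₂, …, using the standard recurrence for the logarithm of a power series (obtained by differentiating G = exp(log G)). It assumes the notation log G(z) = log p₀ + Σ λ_k z^k, so that the λ_k play the role of the parameters and their sum equals −log p₀. The function names, the zero-inflated Poisson example (mixing probability 0.6, Poisson mean 1) and the truncation at 40 terms are choices made for this illustration.

```python
import math

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson probability of observing the count y."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi + (1 - pi) * poisson if y == 0 else (1 - pi) * poisson

def pseudo_cp_parameters(pmf, terms):
    """Coefficients lambda_1..lambda_terms of log G(z), where G is the
    probability generating function of the distribution given by pmf."""
    p = [pmf(n) for n in range(terms + 1)]
    lam = [0.0] * (terms + 1)
    for k in range(1, terms + 1):
        # From G'(z) = G(z) * (log G)'(z):
        #   k * p_k = sum_{j=1}^{k} j * lam_j * p_{k-j}
        conv = sum(j * lam[j] * p[k - j] for j in range(1, k))
        lam[k] = (p[k] - conv / k) / p[0]
    return lam[1:]

# Zero-inflated Poisson with P(Y = 0) > 0.5
params = pseudo_cp_parameters(lambda n: zip_pmf(n, 0.6, 1.0), 40)
p0 = zip_pmf(0, 0.6, 1.0)
print(f"sum of parameters: {sum(params):.6f}, -log(p0): {-math.log(p0):.6f}")
```

For a plain Poisson distribution the recurrence returns the Poisson mean as the first coefficient and zeros afterwards; negative coefficients, when they occur, are what distinguishes the pseudo compound case from the ordinary compound Poisson case.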
See also
* Poisson distribution
* Zero-truncated Poisson distribution
* Compound Poisson distribution
* Sparse approximation
* Hurdle model
Software
pscl and brms R packages