In probability theory and statistics, the negative binomial distribution, also called a Pascal distribution, is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (fixed) number of successes r occurs. For example, we can define rolling a 6 on a die as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success (r = 3). In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution. An alternative formulation is to model the number of total trials (instead of the number of failures). In fact, for a specified (non-random) number of successes r, the number of failures k is random because the number of total trials k + r is random. For example, we could use the negative binomial distribution to model the number of days k (random) a certain machine works (specified by r) before it breaks down.

The negative binomial distribution has a variance \mu/p, with the distribution becoming identical to Poisson in the limit p\to 1 for a given mean \mu (i.e. when the failures are increasingly rare). Here p\in(0,1] is the success probability of each Bernoulli trial. This can make the distribution a useful overdispersed alternative to the Poisson distribution, for example for a robust modification of Poisson regression. In epidemiology, it has been used to model disease transmission for infectious diseases where the likely number of onward infections may vary considerably from individual to individual and from setting to setting. More generally, it may be appropriate where events have positively correlated occurrences causing a larger variance than if the occurrences were independent, due to a positive covariance term.

The term "negative binomial" is likely due to the fact that a certain binomial coefficient that appears in the formula for the probability mass function of the distribution can be written more simply with negative numbers.


Definitions

Imagine a sequence of independent Bernoulli trials: each trial has two potential outcomes called "success" and "failure." In each trial the probability of success is p and of failure is 1 - p. We observe this sequence until a predefined number r of successes occurs. Then the random number of observed failures, X, follows the negative binomial distribution:

: X \sim \operatorname{NB}(r, p)
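
As a quick illustrative sketch (not part of the formal definition), the sampling scheme above can be simulated directly. The helper name sample_nb below is hypothetical; it draws Bernoulli(p) trials until r successes have occurred and returns the number of failures.

    import random

    def sample_nb(r, p, rng=random.Random(0)):
        """Draw from NB(r, p): run Bernoulli(p) trials until r successes,
        returning the number of failures observed along the way."""
        successes, failures = 0, 0
        while successes < r:
            if rng.random() < p:   # success with probability p
                successes += 1
            else:                  # failure with probability 1 - p
                failures += 1
        return failures

    # Rough check: the sample mean should approach r(1-p)/p, here 3*(2/3)/(1/3) = 6.
    draws = [sample_nb(3, 1/3) for _ in range(100_000)]
    print(sum(draws) / len(draws))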


Probability mass function

The probability mass function of the negative binomial distribution is

: f(k; r, p) \equiv \Pr(X = k) = \binom{k+r-1}{k} (1-p)^k p^r

where r is the number of successes, k is the number of failures, and p is the probability of success on each trial. Here, the quantity in parentheses is the binomial coefficient, and is equal to

: \binom{k+r-1}{k} = \frac{(k+r-1)!}{(r-1)!\,k!} = \frac{(k+r-1)(k+r-2)\dotsm r}{k!} = \frac{\Gamma(k+r)}{\Gamma(r)\,k!}.

Note that \Gamma is the Gamma function. There are k failures chosen from k+r-1 trials rather than k+r because the last of the k+r trials is by definition a success. This quantity can alternatively be written in the following manner, explaining the name "negative binomial":

: \begin{align} & \frac{(k+r-1)\dotsm r}{k!} \\ = {} & (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} = (-1)^k \binom{-r}{k}. \end{align}

Note that by the last expression and the binomial series, for every 0 < p \le 1 and q = 1-p,

: p^{-r} = (1-q)^{-r} = \sum_{k=0}^\infty \binom{-r}{k}(-q)^k = \sum_{k=0}^\infty \binom{k+r-1}{k} q^k,

hence the terms of the probability mass function indeed add up to one as below.

: \sum_{k=0}^\infty \binom{k+r-1}{k} (1-p)^k p^r = p^{-r} p^r = 1

To understand the above definition of the probability mass function, note that the probability for every specific sequence of r successes and k failures is p^r (1-p)^k, because the outcomes of the k+r trials are supposed to happen independently. Since the r-th success always comes last, it remains to choose the k trials with failures out of the remaining k+r-1 trials. The above binomial coefficient, due to its combinatorial interpretation, gives precisely the number of all these sequences of length k+r-1.
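
As a numerical sketch (assuming SciPy is available; scipy.stats.nbinom uses this same failures-before-the-r-th-success convention), the formula above can be checked directly:

    from math import comb
    from scipy.stats import nbinom

    def nb_pmf(k, r, p):
        # binom(k + r - 1, k) * (1-p)^k * p^r, exactly as in the formula above
        return comb(k + r - 1, k) * (1 - p) ** k * p ** r

    r, p = 3, 1 / 6   # e.g. failures of a fair die before the third 6
    for k in range(5):
        print(k, nb_pmf(k, r, p), nbinom.pmf(k, r, p))   # the two columns agree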


Cumulative distribution function

The cumulative distribution function can be expressed in terms of the regularized incomplete beta function:

: F(k; r, p) \equiv \Pr(X\le k) = I_p(r, k+1).

(This formula uses the same parameterization as the rest of this article, with r the number of successes, and p = r/(r+\mu) with \mu the mean.) It can also be expressed in terms of the cumulative distribution function of the binomial distribution:

: F(k; r, p) = F_\text{binomial}(k; n=k+r, 1-p).
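
A small sketch (again assuming SciPy) verifying both identities, using scipy.special.betainc for the regularized incomplete beta function I_x(a, b):

    from scipy.special import betainc
    from scipy.stats import binom, nbinom

    r, p, k = 5, 0.6, 7
    print(nbinom.cdf(k, r, p))         # Pr(X <= k)
    print(betainc(r, k + 1, p))        # I_p(r, k + 1)
    print(binom.cdf(k, k + r, 1 - p))  # binomial CDF with n = k + r, prob 1 - p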


Alternative formulations

Some sources may define the negative binomial distribution slightly differently from the primary one here. The most common variations are where the random variable X is counting different things. Each of the four definitions of the negative binomial distribution can be expressed in slightly different but equivalent ways. The first alternative formulation is simply an equivalent form of the binomial coefficient, that is:

: \binom{a}{b} = \binom{a}{a-b} \quad \text{for } 0 \leq b \leq a.

The second alternate formulation somewhat simplifies the expression by recognizing that the total number of trials is simply the number of successes and failures, that is: n = r + k. These second formulations may be more intuitive to understand, however they are perhaps less practical as they have more terms.

* The definition where X is the number n of trials that occur for a given number r of successes is similar to the primary definition, except that the number of trials is given instead of the number of failures. This adds r to the value of the random variable, shifting its support and mean.
* The definition where X is the number of successes (or trials) that occur for a given number r of failures is similar to the primary definition used in this article, except that numbers of failures and successes are switched when considering what is being counted and what is given. Note however, that p still refers to the probability of "success".
* The definition of the negative binomial distribution can be extended to the case where the parameter r can take on a positive real value. Although it is impossible to visualize a non-integer number of "failures", we can still formally define the distribution through its probability mass function. The problem of extending the definition to real-valued (positive) r boils down to extending the binomial coefficient to its real-valued counterpart, based on the gamma function:

:: \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm r}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}

: After substituting this expression in the original definition, we say that X has a negative binomial (or Pólya) distribution if it has a probability mass function:

:: f(k; r, p) \equiv \Pr(X = k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} (1-p)^k p^r \quad\text{for } k = 0, 1, 2, \dotsc

: Here r is a real, positive number.

In negative binomial regression, the distribution is specified in terms of its mean, m = \frac{r(1-p)}{p}, which is then related to explanatory variables as in linear regression or other generalized linear models. From the expression for the mean m, one can derive p = \frac{r}{m+r} and 1-p = \frac{m}{m+r}. Then, substituting these expressions in the one for the probability mass function when r is real-valued, yields this parametrization of the probability mass function in terms of m:

: \Pr(X = k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} \left(\frac{r}{m+r}\right)^r \left(\frac{m}{m+r}\right)^k \quad\text{for } k = 0, 1, 2, \dotsc

The variance can then be written as m + \frac{m^2}{r}. Some authors prefer to set \alpha = \frac{1}{r}, and express the variance as m + \alpha m^2. In this context, and depending on the author, either the parameter r or its reciprocal \alpha is referred to as the "dispersion parameter", "shape parameter" or "clustering coefficient", or the "heterogeneity" or "aggregation" parameter. The term "aggregation" is particularly used in ecology when describing counts of individual organisms. Decrease of the aggregation parameter r towards zero corresponds to increasing aggregation of the organisms; increase of r towards infinity corresponds to absence of aggregation, as can be described by Poisson regression.
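
For real-valued r the factorials must be replaced by gamma functions, exactly as above. A minimal numerical sketch (log-gamma is used for stability; the helper name is illustrative):

    from math import exp, lgamma, log

    def nb_pmf_real_r(k, r, p):
        """Gamma(k+r) / (k! Gamma(r)) * (1-p)^k * p^r, valid for real r > 0."""
        log_coef = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
        return exp(log_coef + k * log(1 - p) + r * log(p))

    # Mean parameterization: given mean m and dispersion r, p = r / (m + r).
    m, r = 4.0, 2.5
    p = r / (m + r)
    print(sum(k * nb_pmf_real_r(k, r, p) for k in range(400)))  # ~ 4.0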


Alternative parameterizations

Sometimes the distribution is parameterized in terms of its mean \mu and variance \sigma^2:

:: \begin{align} & p = \frac{\mu}{\sigma^2}, \\ & r = \frac{\mu^2}{\sigma^2 - \mu}, \\ & \Pr(X=k) = \binom{k+r-1}{k} \left(1-\frac{\mu}{\sigma^2}\right)^k \left(\frac{\mu}{\sigma^2}\right)^r, \\ & \operatorname{E}(X) = \mu, \\ & \operatorname{Var}(X) = \sigma^2. \end{align}

Another popular parameterization uses r and the failure odds \beta = \frac{1-p}{p}:

:: \begin{align} & p = \frac{1}{1+\beta}, \\ & \Pr(X=k) = \binom{k+r-1}{k} \left(\frac{\beta}{1+\beta}\right)^k \left(\frac{1}{1+\beta}\right)^r, \\ & \operatorname{E}(X) = r\beta, \\ & \operatorname{Var}(X) = r\beta(1+\beta). \end{align}
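
These relations translate directly into conversion helpers; a sketch (the function names are illustrative, and the first conversion requires overdispersion, \sigma^2 > \mu):

    def from_mean_variance(mu, sigma2):
        """Moment conversion (mu, sigma^2) -> (r, p); requires sigma2 > mu."""
        p = mu / sigma2
        r = mu ** 2 / (sigma2 - mu)
        return r, p

    def from_mean_dispersion(mu, r):
        """(mu, r) -> (beta, p) for the failure-odds parameterization."""
        beta = mu / r        # since E(X) = r * beta
        p = 1 / (1 + beta)
        return beta, p

    print(from_mean_variance(6.0, 18.0))   # (3.0, 1/3): NB(3, 1/3) has mean 6, variance 18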


Examples


Length of hospital stay

Hospital length of stay is an example of real-world data that can be modelled well with a negative binomial distribution via negative binomial regression.


Selling candy

Pat Collis is required to sell candy bars to raise money for the 6th grade field trip. Pat is (somewhat harshly) not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.6 probability of selling one candy bar and a 0.4 probability of selling nothing.

''What's the probability of selling the last candy bar at the n-th house?''

Successfully selling candy enough times is what defines our stopping criterion (as opposed to failing to sell it), so k in this case represents the number of failures and r represents the number of successes. Recall that the NB(r, p) distribution describes the probability of k failures and r successes in k + r trials with success on the last trial. Selling five candy bars means getting five successes. The number of trials (i.e. houses) this takes is therefore k + 5 = n. The random variable we are interested in is the number of houses, so we substitute k = n - 5 into the mass function and obtain the following mass function of the distribution of houses (for n \ge 5):

: f(n) = \binom{n-1}{n-5} \; 0.6^5 \; 0.4^{n-5} = \binom{n-1}{n-5} \; 3^5 \; \frac{2^{n-5}}{5^n}.

''What's the probability that Pat finishes on the tenth house?''

: f(10) = \binom{9}{5} \; 3^5 \; \frac{2^5}{5^{10}} \approx 0.10033.

''What's the probability that Pat finishes on or before reaching the eighth house?''

To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those probabilities:

: f(5) = \binom{4}{0} \; 3^5 \; \frac{2^0}{5^5} \approx 0.07776
: f(6) = \binom{5}{1} \; 3^5 \; \frac{2^1}{5^6} \approx 0.15552
: f(7) = \binom{6}{2} \; 3^5 \; \frac{2^2}{5^7} \approx 0.18662
: f(8) = \binom{7}{3} \; 3^5 \; \frac{2^3}{5^8} \approx 0.17418
: \sum_{j=5}^8 f(j) \approx 0.59409.

''What's the probability that Pat exhausts all 30 houses that happen to stand in the neighborhood?''

This can be expressed as the probability that Pat does not finish on the fifth through the thirtieth house:

: 1 - \sum_{j=5}^{30} f(j) = 1 - I_{0.6}(5, 30-5+1) \approx 1 - 0.999999823 = 0.000000177.

Because of the rather high probability that Pat will sell to each house (60 percent), the probability of her ''not'' fulfilling her quest is vanishingly slim.
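
The numbers above are easy to reproduce with SciPy's nbinom (a sketch; nbinom counts failures, i.e. houses minus the five successful ones):

    from scipy.stats import nbinom

    r, p = 5, 0.6                  # five sales needed, 0.6 chance of a sale per house
    houses = nbinom(r, p)          # distribution of failures = houses - 5

    print(houses.pmf(10 - 5))      # finish exactly at the 10th house: ~0.10033
    print(houses.cdf(8 - 5))       # finish on or before the 8th house: ~0.59409
    print(1 - houses.cdf(30 - 5))  # not done after all 30 houses: ~0.000000177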


Properties


Expectation

The expected total number of trials needed to see r successes is \frac{r}{p}. Thus, the expected number of ''failures'' would be this value, minus the successes:

: \operatorname{E}[\operatorname{NB}(r, p)] = \frac{r}{p} - r = \frac{r(1-p)}{p}


Expectation of failures

The expected total number of failures in a negative binomial distribution with parameters (r, p) is r(1-p)/p. To see this, imagine an experiment simulating the negative binomial is performed many times. That is, a set of trials is performed until r successes are obtained, then another set of trials, and then another, etc. Write down the number of trials performed in each experiment: a, b, c, \dots and set a + b + c + \dots = N. Now we would expect about Np successes in total. Say the experiment was performed n times. Then there are nr successes in total. So we would expect nr = Np, so N/n = r/p. See that N/n is just the average number of trials per experiment. That is what we mean by "expectation". The average number of failures per experiment is N/n - r = r/p - r = r(1-p)/p. This agrees with the mean given above.

A rigorous derivation can be done by representing the negative binomial distribution as the sum of waiting times. Let X_r \sim \operatorname{NB}(r, p) with the convention that X_r represents the number of failures observed before r successes with the probability of success being p. And let Y_i \sim \operatorname{Geom}(p) where Y_i represents the number of failures before seeing a success. We can think of Y_i as the waiting time (number of failures) between the i-th and (i-1)-th success. Thus

: X_r = Y_1 + Y_2 + \cdots + Y_r.

The mean is

: \operatorname{E}[X_r] = \operatorname{E}[Y_1] + \operatorname{E}[Y_2] + \cdots + \operatorname{E}[Y_r] = \frac{r(1-p)}{p},

which follows from the fact \operatorname{E}[Y_i] = (1-p)/p.
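
The waiting-time decomposition also lends itself to a quick Monte Carlo check (a sketch assuming NumPy; note that numpy's geometric counts trials up to and including the first success, so 1 is subtracted to count failures):

    import numpy as np

    rng = np.random.default_rng(42)
    r, p, n_sim = 4, 0.3, 200_000

    failures = rng.geometric(p, size=(n_sim, r)) - 1   # Y_1, ..., Y_r per row
    x = failures.sum(axis=1)                           # X_r = Y_1 + ... + Y_r

    print(x.mean(), r * (1 - p) / p)                   # both ~ 9.33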


Variance

When counting the number of failures before the r-th success, the variance is r(1-p)/p^2. When counting the number of successes before the r-th failure, as in the alternative formulations above where successes and failures are switched, the variance is rp/(1-p)^2.


Relation to the binomial theorem

Suppose Y is a random variable with a binomial distribution with parameters n and p. Assume p + q = 1, with p, q \ge 0, then

: 1 = 1^n = (p+q)^n.

Using Newton's binomial theorem, this can equally be written as:

: (p+q)^n = \sum_{k=0}^\infty \binom{n}{k} p^k q^{n-k},

in which the upper bound of summation is infinite. In this case, the binomial coefficient

: \binom{n}{k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}

is defined when n is a real number, instead of just a positive integer. But in our case of the binomial distribution it is zero when k > n. We can then say, for example,

: (p+q)^{8.3} = \sum_{k=0}^\infty \binom{8.3}{k} p^k q^{8.3-k}.

Now suppose r > 0 and we use a negative exponent:

: 1 = p^r \cdot p^{-r} = p^r (1-q)^{-r} = p^r \sum_{k=0}^\infty \binom{-r}{k} (-q)^k.

Then all of the terms are positive, and the term

: p^r \binom{-r}{k} (-q)^k = \binom{k+r-1}{k} p^r q^k

is just the probability that the number of failures before the r-th success is equal to k, provided r is an integer. (If r is a negative non-integer, so that the exponent is a positive non-integer, then some of the terms in the sum above are negative, so we do not have a probability distribution on the set of all nonnegative integers.)

Now we also allow non-integer values of r. Recall from above that

: The sum of independent negative-binomially distributed random variables r_1 and r_2 with the same value for parameter p is negative-binomially distributed with the same p but with r-value r_1 + r_2.

This property persists when the definition is thus generalized, and affords a quick way to see that the negative binomial distribution is infinitely divisible.


Recurrence relations

The following recurrence relations hold. For the probability mass function (in the parameterization of this article, where p is the success probability):

: \begin{cases} (k+1) \Pr(X=k+1) - (1-p)(k+r) \Pr(X=k) = 0, \\ \Pr(X=0) = p^r. \end{cases}

For the moments m_k = \operatorname{E}(X^k),

: m_{k+1} = rP\, m_k + (P^2 + P) \frac{dm_k}{dP}, \quad P := (1-p)/p, \quad m_0 = 1.

For the cumulants

: \kappa_{k+1} = (Q-1)Q \frac{d\kappa_k}{dQ}, \quad Q := 1/p, \quad \kappa_1 = r(Q-1).
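
A short sketch checking the mass-function recurrence against the direct formula (SciPy assumed; the recurrence is iterated from \Pr(X=0) = p^r):

    from scipy.stats import nbinom

    r, p = 3.5, 0.4
    pmf = [p ** r]                                   # Pr(X = 0)
    for k in range(20):                              # (k+1) f(k+1) = (1-p)(k+r) f(k)
        pmf.append((1 - p) * (k + r) / (k + 1) * pmf[-1])

    print(max(abs(pmf[k] - nbinom.pmf(k, r, p)) for k in range(21)))  # ~ 0 (machine precision)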


Related distributions

* The geometric distribution (on \{0, 1, 2, 3, \dots\}) is a special case of the negative binomial distribution, with

:: \operatorname{Geom}(p) = \operatorname{NB}(1,\, p).

* The negative binomial distribution is a special case of the discrete phase-type distribution.
* The negative binomial distribution is a special case of discrete compound Poisson distribution.


Poisson distribution

Consider a sequence of negative binomial random variables where the stopping parameter r goes to infinity, while the probability p of success in each trial goes to one, in such a way as to keep the mean of the distribution (i.e. the expected number of failures) constant. Denoting this mean as \lambda, the parameter p will be

: \begin{align} \text{Mean:} \quad & \lambda = \frac{r(1-p)}{p} \quad \Rightarrow \quad p = \frac{r}{r+\lambda}, \\ \text{Variance:} \quad & \lambda \left( 1 + \frac{\lambda}{r} \right) > \lambda, \quad \text{thus always overdispersed}. \end{align}

Under this parametrization the probability mass function will be

: f(k; r, p) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} (1-p)^k p^r = \frac{\lambda^k}{k!} \cdot \frac{\Gamma(r+k)}{\Gamma(r)\,(r+\lambda)^k} \cdot \frac{1}{\left(1+\frac{\lambda}{r}\right)^r}

Now if we consider the limit as r \to \infty, the second factor will converge to one, and the third to the exponent function:

: \lim_{r\to\infty} f(k; r, p) = \frac{\lambda^k}{k!} \cdot 1 \cdot \frac{1}{e^\lambda},

which is the mass function of a Poisson-distributed random variable with expected value \lambda.

In other words, the alternatively parameterized negative binomial distribution converges to the Poisson distribution and r controls the deviation from the Poisson. This makes the negative binomial distribution suitable as a robust alternative to the Poisson, which approaches the Poisson for large r, but which has larger variance than the Poisson for small r.

: \operatorname{Poisson}(\lambda) = \lim_{r \to \infty} \operatorname{NB}\left(r, \frac{r}{r+\lambda}\right).
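
The convergence is easy to see numerically (a sketch assuming SciPy):

    from scipy.stats import nbinom, poisson

    lam, k = 3.0, 4
    for r in (1, 10, 100, 1000, 10_000):
        p = r / (r + lam)              # keeps the mean fixed at lam
        print(r, nbinom.pmf(k, r, p))  # approaches the Poisson value below
    print(poisson.pmf(k, lam))         # limiting Poisson(3) probability at k = 4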


Gamma–Poisson mixture

The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a \operatorname{Poisson}(\lambda) distribution, where \lambda is itself a random variable, distributed as a gamma distribution with shape r and scale \theta = (1-p)/p, or correspondingly rate p/(1-p).

To display the intuition behind this statement, consider two independent Poisson processes, "Success" and "Failure", with intensities p and 1-p. Together, the Success and Failure processes are equivalent to a single Poisson process of intensity 1, where an occurrence of the process is a success if a corresponding independent coin toss comes up heads with probability p; otherwise, it is a failure. If r is a counting number, the coin tosses show that the count of successes before the r-th failure follows a negative binomial distribution with parameters r and p. The count is also, however, the count of the Success Poisson process at the random time T of the r-th occurrence in the Failure Poisson process. The Success count follows a Poisson distribution with mean pT, where T is the waiting time for r occurrences in a Poisson process of intensity 1-p, i.e., T is gamma-distributed with shape parameter r and intensity 1-p. Thus, the negative binomial distribution is equivalent to a Poisson distribution with mean pT, where the random variate T is gamma-distributed with shape parameter r and intensity 1-p. The preceding paragraph follows, because \lambda = pT is gamma-distributed with shape parameter r and intensity (1-p)/p.

The following formal derivation (which does not depend on r being a counting number) confirms the intuition.

: \begin{align} & \int_0^\infty f_{\operatorname{Poisson}(\lambda)}(k) \times f_{\operatorname{Gamma}\left(r,\,\frac{p}{1-p}\right)}(\lambda) \, \mathrm{d}\lambda \\ = {} & \int_0^\infty \frac{\lambda^k}{k!} e^{-\lambda} \times \frac{1}{\Gamma(r)} \left(\frac{p}{1-p} \lambda\right)^{r-1} e^{-\frac{p}{1-p}\lambda} \left(\frac{p}{1-p}\right) \mathrm{d}\lambda \\ = {} & \left(\frac{p}{1-p}\right)^r \frac{1}{k!\,\Gamma(r)} \int_0^\infty \lambda^{r+k-1} e^{-\lambda/(1-p)} \, \mathrm{d}\lambda \\ = {} & \left(\frac{p}{1-p}\right)^r \frac{1}{k!\,\Gamma(r)} \, \Gamma(r+k) \, (1-p)^{r+k} \int_0^\infty f_{\operatorname{Gamma}\left(r+k,\,\frac{1}{1-p}\right)}(\lambda) \, \mathrm{d}\lambda \\ = {} & \frac{\Gamma(r+k)}{k!\,\Gamma(r)} \; (1-p)^k \, p^r \\ = {} & f(k; r, p). \end{align}

Because of this, the negative binomial distribution is also known as the gamma–Poisson (mixture) distribution. The negative binomial distribution was originally derived as a limiting case of the gamma-Poisson distribution.
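
The mixture representation suggests an immediate two-stage sampler, sketched below with NumPy (the gamma is drawn with shape r and scale (1-p)/p, matching the rate p/(1-p) above):

    import numpy as np
    from scipy.stats import nbinom

    rng = np.random.default_rng(0)
    r, p, n_sim = 2.5, 0.4, 500_000

    lam = rng.gamma(shape=r, scale=(1 - p) / p, size=n_sim)  # random Poisson rates
    x = rng.poisson(lam)                                     # Poisson draw given each rate

    for k in range(5):
        print(k, (x == k).mean(), nbinom.pmf(k, r, p))       # empirical vs. exact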


Distribution of a sum of geometrically distributed random variables

If Y_r is a random variable following the negative binomial distribution with parameters r and p, and support \{0, 1, 2, \dots\}, then Y_r is a sum of r independent variables following the geometric distribution (on \{0, 1, 2, \dots\}) with parameter p. As a result of the central limit theorem, Y_r (properly scaled and shifted) is therefore approximately normal for sufficiently large r.

Furthermore, if B_{s+r} is a random variable following the binomial distribution with parameters s+r and p, then

: \begin{align} \Pr(Y_r \leq s) & = 1 - I_{1-p}(s+1, r) \\ & = 1 - I_{1-p}((s+r)-(r-1), (r-1)+1) \\ & = 1 - \Pr(B_{s+r} \leq r-1) \\ & = \Pr(B_{s+r} \geq r) \\ & = \Pr(\text{after } s+r \text{ trials, there are at least } r \text{ successes}). \end{align}

In this sense, the negative binomial distribution is the "inverse" of the binomial distribution.

The sum of independent negative-binomially distributed random variables r_1 and r_2 with the same value for parameter p is negative-binomially distributed with the same p but with r-value r_1 + r_2.

The negative binomial distribution is infinitely divisible, i.e., if Y has a negative binomial distribution, then for any positive integer n, there exist independent identically distributed random variables Y_1, \dots, Y_n whose sum has the same distribution that Y has.
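
A sketch confirming the binomial identity numerically (SciPy assumed; binom.sf(r - 1, ...) is \Pr(B_{s+r} \ge r)):

    from scipy.stats import binom, nbinom

    r, p, s = 6, 0.35, 10
    print(nbinom.cdf(s, r, p))        # Pr(Y_r <= s)
    print(binom.sf(r - 1, s + r, p))  # Pr(B_{s+r} >= r); the two values agree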


Representation as compound Poisson distribution

The negative binomial distribution can be represented as a compound Poisson distribution: Let (Y_n)_{n \in \mathbb{N}} denote a sequence of independent and identically distributed random variables, each one having the logarithmic series distribution \operatorname{Log}(1-p), with probability mass function

: f(k; p) = \frac{-(1-p)^k}{k \ln p}, \qquad k \in \mathbb{N}.

Let N be a random variable, independent of the sequence, and suppose that N has a Poisson distribution with mean \lambda = -r \ln p. Then the random sum

: X = \sum_{n=1}^N Y_n

is \operatorname{NB}(r, p)-distributed. To prove this, we calculate the probability generating function G_X of X, which is the composition of the probability generating functions G_N and G_{Y_1}. Using

: G_N(z) = \exp(\lambda(z-1)), \qquad z \in \mathbb{R},

and

: G_{Y_1}(z) = \frac{\ln(1-(1-p)z)}{\ln p}, \qquad |z| < \frac{1}{1-p},

we obtain

: \begin{align} G_X(z) & = G_N(G_{Y_1}(z)) \\ & = \exp\biggl(\lambda\biggl(\frac{\ln(1-(1-p)z)}{\ln p} - 1\biggr)\biggr) \\ & = \exp\bigl(-r(\ln(1-(1-p)z) - \ln p)\bigr) \\ & = \biggl(\frac{p}{1-(1-p)z}\biggr)^r, \qquad |z| < \frac{1}{1-p}, \end{align}

which is the probability generating function of the \operatorname{NB}(r, p) distribution.
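
The compound-Poisson construction can also be simulated directly; a sketch assuming SciPy's logser, whose shape parameter here is the 1 - p appearing in the logarithmic mass function above:

    import math
    import numpy as np
    from scipy.stats import logser, nbinom

    rng = np.random.default_rng(1)
    r, p, n_sim = 3, 0.6, 100_000

    lam = -r * math.log(p)                    # Poisson mean, lambda = -r ln p
    n = rng.poisson(lam, size=n_sim)          # number of logarithmic summands
    x = np.array([logser.rvs(1 - p, size=m, random_state=rng).sum() if m else 0
                  for m in n])                # X = Y_1 + ... + Y_N

    for k in range(4):
        print(k, (x == k).mean(), nbinom.pmf(k, r, p))  # empirical vs. exact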


(''a'',''b'',0) class of distributions

The negative binomial, along with the Poisson and binomial distributions, is a member of the (a, b, 0) class of distributions. All three of these distributions are special cases of the Panjer distribution. They are also members of a natural exponential family.


Statistical inference


Parameter estimation


MVUE for ''p''

Suppose p is unknown and an experiment is conducted where it is decided ahead of time that sampling will continue until r successes are found. A sufficient statistic for the experiment is k, the number of failures. In estimating p, the minimum variance unbiased estimator is

: \widehat{p} = \frac{r-1}{r+k-1}.


Maximum likelihood estimation

When r is known, the maximum likelihood estimate of p is

: \widetilde{p} = \frac{r}{r + \bar{k}},

where \bar{k} is the sample mean of the observed failure counts, but this is a biased estimate. Its inverse, (r + \bar{k})/r, is an unbiased estimate of 1/p, however.

When r is unknown, the maximum likelihood estimator for p and r together only exists for samples for which the sample variance is larger than the sample mean. The likelihood function for N iid observations (k_1, \dots, k_N) is

: L(r, p) = \prod_{i=1}^N f(k_i; r, p)

from which we calculate the log-likelihood function

: \ell(r, p) = \sum_{i=1}^N \ln(\Gamma(k_i + r)) - \sum_{i=1}^N \ln(k_i!) - N \ln(\Gamma(r)) + \sum_{i=1}^N k_i \ln(1-p) + N r \ln(p).

To find the maximum we take the partial derivatives with respect to r and p and set them equal to zero:

: \frac{\partial \ell(r,p)}{\partial p} = -\left[\sum_{i=1}^N k_i \frac{1}{1-p}\right] + N r \frac{1}{p} = 0

and

: \frac{\partial \ell(r,p)}{\partial r} = \left[\sum_{i=1}^N \psi(k_i + r)\right] - N \psi(r) + N \ln(p) = 0,

where

: \psi(k) = \frac{\Gamma'(k)}{\Gamma(k)}

is the digamma function. Solving the first equation for p gives:

: p = \frac{Nr}{Nr + \sum_{i=1}^N k_i}

Substituting this in the second equation gives:

: \frac{\partial \ell(r,p)}{\partial r} = \left[\sum_{i=1}^N \psi(k_i + r)\right] - N \psi(r) + N \ln\left(\frac{r}{r + \sum_{i=1}^N k_i / N}\right) = 0

This equation cannot be solved for r in closed form. If a numerical solution is desired, an iterative technique such as Newton's method can be used. Alternatively, the expectation–maximization algorithm can be used.
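
In practice the profile likelihood in r (with p replaced by its conditional maximizer from the first equation) can be handed to a one-dimensional optimizer; a sketch with SciPy:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import gammaln

    def profile_neg_log_lik(r, ks):
        """Negative log-likelihood in r, with p set to r / (r + mean(ks))."""
        n, kbar = len(ks), np.mean(ks)
        p = r / (r + kbar)
        ll = (np.sum(gammaln(ks + r)) - n * gammaln(r)
              + np.sum(ks) * np.log(1 - p) + n * r * np.log(p))
        return -ll   # the sum of ln(k_i!) terms is constant in r and omitted

    rng = np.random.default_rng(7)
    ks = rng.negative_binomial(4, 0.3, size=2000)   # synthetic data, true (r, p) = (4, 0.3)

    res = minimize_scalar(profile_neg_log_lik, bounds=(1e-3, 100),
                          args=(ks,), method="bounded")
    r_hat = res.x
    p_hat = r_hat / (r_hat + ks.mean())
    print(r_hat, p_hat)                             # close to (4, 0.3)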


Occurrence and applications


Waiting time in a Bernoulli process

Let k and r be integers, with k non-negative and r positive. In a sequence of independent Bernoulli trials with success probability p, the negative binomial gives the probability of k successes and r failures, with a failure on the last trial. Therefore, the negative binomial distribution represents the probability distribution of the number of successes before the r-th failure in a Bernoulli process, with probability p of successes on each trial.

Consider the following example. Suppose we repeatedly throw a die, and consider a 1 to be a failure. The probability of success on each trial is 5/6. The number of successes before the third failure belongs to the infinite set \{0, 1, 2, 3, \dots\}. That number of successes is a negative-binomially distributed random variable.

When r = 1 we get the probability distribution of the number of successes before the first failure (i.e. the probability of the first failure occurring on the (k+1)-st trial), which is a geometric distribution:

: f(k; 1, p) = (1-p) \cdot p^k


Overdispersed Poisson

The negative binomial distribution, especially in its alternative parameterization described above, can be used as an alternative to the Poisson distribution. It is especially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. In such cases, the observations are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. Hence a Poisson distribution is not an appropriate model. Since the negative binomial distribution has one more parameter than the Poisson, the second parameter can be used to adjust the variance independently of the mean. See Cumulants of some discrete probability distributions.

An application of this is to annual counts of tropical cyclones in the North Atlantic, or to monthly to 6-monthly counts of wintertime extratropical cyclones over Europe, for which the variance is greater than the mean. In the case of modest overdispersion, this may produce substantially similar results to an overdispersed Poisson distribution.

Negative binomial modeling is widely employed in ecology and biodiversity research for analyzing count data where overdispersion is very common. This is because overdispersion is indicative of biological aggregation, such as species or communities forming clusters. Ignoring overdispersion can lead to significantly inflated model parameters, resulting in misleading statistical inferences. The negative binomial distribution effectively addresses overdispersed counts by permitting the variance to vary quadratically with the mean. An additional dispersion parameter governs the slope of the quadratic term, determining the severity of overdispersion. The model's quadratic mean-variance relationship proves to be a realistic approach for handling overdispersion, as supported by empirical evidence from many studies. Overall, the NB model offers two attractive features: (1) the convenient interpretation of the dispersion parameter as an index of clustering or aggregation, and (2) its tractable form, featuring a closed expression for the probability mass function.

In genetics, the negative binomial distribution is commonly used to model data in the form of discrete sequence read counts from high-throughput RNA and DNA sequencing experiments. In epidemiology of infectious diseases, the negative binomial has been used as a better option than the Poisson distribution to model overdispersed counts of secondary infections from one infected case (super-spreading events).


Multiplicity observations (physics)

The negative binomial distribution has been the most effective statistical model for a broad range of multiplicity observations in particle collision experiments, e.g., p\bar{p},\ hh,\ hA,\ AA,\ e^+e^- (see for an overview), and is argued to be a scale-invariant property of matter, providing the best fit for astronomical observations, where it predicts the number of galaxies in a region of space. The phenomenological justification for the effectiveness of the negative binomial distribution in these contexts remained unknown for fifty years, since their first observation in 1973. In 2023, a proof from first principles was eventually demonstrated by Scott V. Tezlaf, where it was shown that the negative binomial distribution emerges from symmetries in the dynamical equations of a canonical ensemble of particles in Minkowski space. Roughly, given an expected number of trials \langle n \rangle and expected number of successes \langle r \rangle, where

: \langle n \rangle - \langle r \rangle = k, \qquad \langle p \rangle = \frac{\langle r \rangle}{\langle n \rangle} \qquad \implies \qquad \langle n \rangle = \frac{k}{1-\langle p \rangle}, \qquad \langle r \rangle = \frac{k \langle p \rangle}{1-\langle p \rangle},

an isomorphic set of equations can be identified with the parameters of a relativistic current density of a canonical ensemble of massive particles, via

: c^2\langle \rho^2 \rangle - \langle j^2 \rangle = c^2\rho_0^2, \qquad \langle \beta^2_v \rangle = \frac{\langle j^2 \rangle}{c^2 \langle \rho^2 \rangle} \qquad \implies \qquad c^2\langle \rho^2 \rangle = \frac{c^2 \rho_0^2}{1-\langle \beta^2_v \rangle}, \qquad \langle j^2 \rangle = \frac{c^2 \rho_0^2 \langle \beta^2_v \rangle}{1-\langle \beta^2_v \rangle},

where \rho_0 is the rest density, \langle \rho^2 \rangle is the relativistic mean square density, \langle j^2 \rangle is the relativistic mean square current density, and \langle \beta^2_v \rangle = \langle v^2 \rangle / c^2, where \langle v^2 \rangle is the mean square speed of the particle ensemble and c is the speed of light, such that one can establish the following bijective map:

: c^2\rho_0^2 \mapsto k, \qquad \langle \beta^2_v \rangle \mapsto \langle p \rangle, \qquad c^2\langle\rho^2 \rangle \mapsto \langle n \rangle, \qquad \langle j^2 \rangle \mapsto \langle r \rangle.

A rigorous alternative proof of the above correspondence has also been demonstrated through quantum mechanics via the Feynman path integral.


History

This distribution was first studied in 1713 by Pierre Remond de Montmort in his ''Essay d'analyse sur les jeux de hazard'', as the distribution of the number of trials required in an experiment to obtain a given number of successes (Montmort 1713). It had previously been mentioned by Pascal (Pascal 1679).


See also

* Coupon collector's problem
* Beta negative binomial distribution
* Extended negative binomial distribution
* Negative multinomial distribution
* Binomial distribution
* Poisson distribution
* Compound Poisson distribution
* Exponential family
* Negative binomial regression
* Vector generalized linear model


References

* Montmort, P. R. de (1713). ''Essay d'analyse sur les jeux de hazard''. 2nd ed. Paris: Quillau.
* Pascal, B. (1679). ''Varia Opera Mathematica. D. Petri de Fermat''. Tolosae.