In probability theory and statistics, the Dirichlet negative multinomial distribution is a multivariate distribution on the non-negative integers. It is a multivariate extension of the beta negative binomial distribution. It is also a generalization of the negative multinomial distribution (NM(k, p)) allowing for heterogeneity or overdispersion in the probability vector. It is used in quantitative marketing research to flexibly model the number of household transactions across multiple brands.

If the parameters of the Dirichlet distribution are (\alpha_0, \boldsymbol\alpha), and if

X \mid \mathbf{p} \sim \operatorname{NM}(x_0,\mathbf{p}),

where

\mathbf{p} \sim \operatorname{Dir}(\alpha_0,\boldsymbol\alpha),

then the marginal distribution of X is a Dirichlet negative multinomial distribution:

X \sim \operatorname{DNM}(x_0,\alpha_0,\boldsymbol\alpha).

In the above, \operatorname{NM}(x_0, \mathbf{p}) is the negative multinomial distribution and \operatorname{Dir}(\alpha_0,\boldsymbol\alpha) is the Dirichlet distribution.
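The two-stage definition translates directly into a sampler. The Python sketch below is illustrative only: it assumes an integer x_0, so that the negative multinomial stage can be simulated literally as categorical trials that stop at the x_0-th occurrence of category 0 (the stopping category, with probability p_0). The helper names `sample_dirichlet`, `sample_negative_multinomial` and `sample_dnm` are hypothetical. For x_0 = 2, \alpha_0 = 6 and \boldsymbol\alpha = (1, 2), the exact probability of observing no non-stopping counts is \mathrm{B}(2, 9)/\mathrm{B}(2, 6) = 42/90, which the empirical frequency should approach.

```python
import random


def sample_dirichlet(rng, concentrations):
    """Dirichlet draw obtained by normalising independent Gamma(a_i, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(g)
    return [v / total for v in g]


def sample_negative_multinomial(rng, x0, p):
    """Draw from NM(x0, p), assuming (for this sketch) that x0 is an integer.

    Categorical trials with probabilities p are repeated until category 0
    has occurred x0 times; the counts of categories 1..m are returned.
    """
    categories = range(len(p))
    counts = [0] * (len(p) - 1)
    stops = 0
    while stops < x0:
        k = rng.choices(categories, weights=p)[0]
        if k == 0:
            stops += 1
        else:
            counts[k - 1] += 1
    return counts


def sample_dnm(rng, x0, alpha0, alpha):
    """Two-stage draw: p ~ Dir(alpha0, alpha), then X | p ~ NM(x0, p)."""
    p = sample_dirichlet(rng, [alpha0] + list(alpha))
    return sample_negative_multinomial(rng, x0, p)


rng = random.Random(0)
draws = [sample_dnm(rng, 2, 6.0, [1.0, 2.0]) for _ in range(20000)]
# Empirical Pr(X = (0, 0)); the exact DNM value for these parameters is 42/90.
print(sum(d == [0, 0] for d in draws) / len(draws))
```

Because the distribution is heavy-tailed, p_0 can occasionally be very small, and the trial loop then runs for a long time; this literal simulation is only practical for moderate \alpha_0.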

Motivation

Dirichlet negative multinomial as a compound distribution

The Dirichlet distribution is a conjugate prior for the probability vector of the negative multinomial distribution. This fact leads to an analytically tractable compound distribution. For a random vector of category counts \mathbf{x}=(x_1,\dots,x_m), distributed according to a negative multinomial distribution, the compound distribution is obtained by integrating over the distribution of \mathbf{p}, which can be thought of as a random vector following a Dirichlet distribution:

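\Pr(\mathbf{x}\mid x_0, \alpha_0, \boldsymbol{\alpha})=\int_{\mathbf{p}}\mathrm{NM}(\mathbf{x}\mid x_0, \mathbf{p})\,\mathrm{Dir}(\mathbf{p}\mid\alpha_0,\boldsymbol{\alpha})\,\mathrm{d}\mathbf{p}

\Pr(\mathbf{x}\mid x_0, \alpha_0, \boldsymbol{\alpha})=\frac{\Gamma\left(\sum_{i=0}^m x_i\right)}{\Gamma(x_0)\prod_{i=1}^m x_i!}\,\frac{1}{\mathrm{B}(\boldsymbol\alpha_+)}\int_{\mathbf{p}}\prod_{i=0}^m p_i^{x_i+\alpha_i-1}\,\mathrm{d}\mathbf{p}

which results in the following formula:

\Pr(\mathbf{x}\mid x_0, \alpha_0, \boldsymbol{\alpha})=\frac{\Gamma\left(\sum_{i=0}^m x_i\right)}{\Gamma(x_0)\prod_{i=1}^m x_i!}\,\frac{\mathrm{B}(\mathbf{x}_+ + \boldsymbol\alpha_+)}{\mathrm{B}(\boldsymbol\alpha_+)}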

where \mathbf{x}_+ and \boldsymbol\alpha_+ are the (m+1)-dimensional vectors created by appending the scalars x_0 and \alpha_0 to the m-dimensional vectors \mathbf{x} and \boldsymbol\alpha, respectively, and \mathrm{B} is the multivariate version of the beta function. We can write this equation explicitly as

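\Pr(\mathbf{x}\mid x_0, \alpha_0, \boldsymbol{\alpha})=x_0\,\frac{\Gamma\left(\sum_{i=0}^m x_i\right)\Gamma\left(\sum_{i=0}^m \alpha_i\right)}{\Gamma\left(\sum_{i=0}^m (x_i+\alpha_i)\right)}\prod_{i=0}^m \frac{\Gamma(x_i+\alpha_i)}{\Gamma(x_i+1)\,\Gamma(\alpha_i)}.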
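As a numerical sanity check of the explicit formula, the sketch below evaluates it on the log scale (using `math.lgamma` to avoid overflow) and, for the single-category case m = 1, compares it with a midpoint-rule evaluation of the compound integral, where the Dirichlet reduces to a beta distribution on p_0. The function names and the quadrature resolution are illustrative assumptions, not part of any library. For x_0 = 2, \alpha_0 = 3, \alpha_1 = 2 and x_1 = 3, the closed form evaluates to 8/105.

```python
import math


def dnm_pmf_explicit(x, x0, alpha0, alpha):
    """Explicit DNM pmf; sums and products run over i = 0..m (x0 and alpha0 prepended)."""
    xs = [x0] + list(x)
    alphas = [alpha0] + list(alpha)
    log_p = (math.log(x0)
             + math.lgamma(sum(xs)) + math.lgamma(sum(alphas))
             - math.lgamma(sum(xs) + sum(alphas)))
    for xi, ai in zip(xs, alphas):
        log_p += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return math.exp(log_p)


def dnm_pmf_by_quadrature(x1, x0, alpha0, alpha1, n=100_000):
    """m = 1 only: integrate NM(x1 | x0, p) * Beta(p0 | alpha0, alpha1) over p0 in (0, 1)."""
    log_nm_coef = math.lgamma(x0 + x1) - math.lgamma(x0) - math.lgamma(x1 + 1)
    log_beta = math.lgamma(alpha0) + math.lgamma(alpha1) - math.lgamma(alpha0 + alpha1)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        p0 = (k + 0.5) * h  # midpoint of the k-th subinterval
        p1 = 1.0 - p0
        total += math.exp(log_nm_coef - log_beta
                          + (x0 + alpha0 - 1) * math.log(p0)
                          + (x1 + alpha1 - 1) * math.log(p1))
    return total * h


print(dnm_pmf_explicit([3], 2, 3.0, [2.0]), dnm_pmf_by_quadrature(3, 2, 3.0, 2.0))
```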

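Alternative formulations exist. One convenient representation (Farewell, Daniel and Farewell, Vernon (2012), "Dirichlet negative multinomial regression for overdispersed correlated count data", Biostatistics (Oxford, England) 14, doi:10.1093/biostatistics/kxs050) is

\Pr(\mathbf{x}\mid x_0, \alpha_0, \boldsymbol{\alpha})=\frac{\Gamma(x_\bullet)}{\Gamma(x_0)\prod_{i=1}^m x_i!}\times\frac{\Gamma(\alpha_\bullet)}{\prod_{i=0}^m \Gamma(\alpha_i)}\times\frac{\prod_{i=0}^m \Gamma(x_i+\alpha_i)}{\Gamma(x_\bullet+\alpha_\bullet)}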

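where

x_\bullet = x_0+x_1+\cdots+x_m

and

\alpha_\bullet = \alpha_0+\alpha_1+\cdots+\alpha_m.

This can also be written as

\Pr(\mathbf{x}\mid x_0, \alpha_0, \boldsymbol{\alpha})=\frac{\mathrm{B}(x_\bullet,\alpha_\bullet)}{\mathrm{B}(x_0,\alpha_0)}\prod_{i=1}^m \frac{\Gamma(x_i+\alpha_i)}{x_i!\,\Gamma(\alpha_i)}.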
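The three-factor form and the beta-function form can be checked against each other numerically. The sketch below uses hypothetical helper names and evaluates both on the log scale with `math.lgamma`. In the beta-function form each factor with x_i = 0 equals 1, so only the nonzero counts and two two-argument beta functions contribute.

```python
import math


def log_beta2(a, b):
    """log B(a, b) for the two-argument beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def pmf_three_factor(x, x0, alpha0, alpha):
    """Gamma(x_dot)/(Gamma(x0) prod x_i!) * Gamma(a_dot)/prod Gamma(a_i) * prod Gamma(x_i+a_i)/Gamma(x_dot+a_dot)."""
    xs = [x0] + list(x)
    alphas = [alpha0] + list(alpha)
    x_dot, a_dot = sum(xs), sum(alphas)
    f1 = math.lgamma(x_dot) - math.lgamma(x0) - sum(math.lgamma(xi + 1) for xi in x)
    f2 = math.lgamma(a_dot) - sum(math.lgamma(ai) for ai in alphas)
    f3 = sum(math.lgamma(xi + ai) for xi, ai in zip(xs, alphas)) - math.lgamma(x_dot + a_dot)
    return math.exp(f1 + f2 + f3)


def pmf_beta_form(x, x0, alpha0, alpha):
    """B(x_dot, a_dot) / B(x0, alpha0) * prod_{i=1}^m Gamma(x_i+a_i) / (x_i! Gamma(a_i))."""
    x_dot = x0 + sum(x)
    a_dot = alpha0 + sum(alpha)
    log_p = log_beta2(x_dot, a_dot) - log_beta2(x0, alpha0)
    for xi, ai in zip(x, alpha):
        # This factor is exactly 1 when x_i = 0.
        log_p += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return math.exp(log_p)


params = (2.5, 4.0, [0.5, 1.0, 3.0])
for x in ([0, 0, 0], [1, 4, 0], [7, 2, 3]):
    print(x, pmf_three_factor(x, *params), pmf_beta_form(x, *params))
```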

Properties

Marginal distributions

To obtain the marginal distribution over a subset of Dirichlet negative multinomial random variables, one only needs to drop the irrelevant \alpha_i's (those of the variables that one wants to marginalize out) from the \boldsymbol\alpha vector. The joint distribution of the remaining random variates is

\mathrm{DNM}(x_0,\alpha_0,\boldsymbol\alpha_{(-)})

where \boldsymbol\alpha_{(-)} is the vector with the removed \alpha_i's. The univariate marginals are beta negative binomially distributed.
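The marginalization rule can be checked numerically: summing the m = 2 joint pmf over x_2 should reproduce the DNM pmf with \alpha_2 dropped, that is, a beta negative binomial in x_1. The sketch below uses hypothetical names and truncates the infinite sum at an assumed cutoff; this is only adequate because, for the parameters chosen (\alpha_0 = 8), the neglected tail mass is negligible.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM pmf: B(x_dot, a_dot) / B(x0, alpha0) * prod Gamma(x_i+a_i) / (x_i! Gamma(a_i))."""
    def log_beta2(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_p = log_beta2(x0 + sum(x), alpha0 + sum(alpha)) - log_beta2(x0, alpha0)
    for xi, ai in zip(x, alpha):
        log_p += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return math.exp(log_p)


def marginal_by_summation(x1, x0, alpha0, alpha1, alpha2, cutoff=2000):
    """Sum the m = 2 joint pmf over x2 = 0..cutoff-1 (a truncated approximation)."""
    return sum(dnm_pmf([x1, x2], x0, alpha0, [alpha1, alpha2]) for x2 in range(cutoff))


x0, alpha0, alpha1, alpha2 = 3.0, 8.0, 1.5, 2.5
for x1 in range(5):
    # Left: joint pmf summed over x2; right: DNM(x0, alpha0, (alpha1,)) with alpha2 dropped.
    print(x1, marginal_by_summation(x1, x0, alpha0, alpha1, alpha2), dnm_pmf([x1], x0, alpha0, [alpha1]))
```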

Conditional distributions

If the m-dimensional vector \mathbf{x} is partitioned as follows

\mathbf{x} = \begin{bmatrix} \mathbf{x}^{(1)} \\ \mathbf{x}^{(2)} \end{bmatrix} \text{ with sizes } \begin{bmatrix} q \times 1 \\ (m-q) \times 1 \end{bmatrix}

and accordingly

\boldsymbol\alpha = \begin{bmatrix} \boldsymbol\alpha^{(1)} \\ \boldsymbol\alpha^{(2)} \end{bmatrix} \text{ with sizes } \begin{bmatrix} q \times 1 \\ (m-q) \times 1 \end{bmatrix}

then the conditional distribution of \mathbf{X}^{(1)} given \mathbf{X}^{(2)}=\mathbf{x}^{(2)} is

\mathrm{DNM}(x_0',\alpha_0',\boldsymbol\alpha^{(1)})

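where

x_0' = x_0 + \sum_{i=1}^{m-q} x_i^{(2)}

and

\alpha_0' = \alpha_0 + \sum_{i=1}^{m-q} \alpha_i^{(2)}.

That is,

\Pr(\mathbf{x}^{(1)}\mid \mathbf{x}^{(2)}, x_0, \alpha_0, \boldsymbol{\alpha})=\frac{\mathrm{B}(x_\bullet,\alpha_\bullet)}{\mathrm{B}(x_0',\alpha_0')}\prod_{i=1}^q\frac{\Gamma(x_i^{(1)}+\alpha_i^{(1)})}{x_i^{(1)}!\,\Gamma(\alpha_i^{(1)})}

where x_\bullet = x_0+x_1+\cdots+x_m and \alpha_\bullet = \alpha_0+\alpha_1+\cdots+\alpha_m are sums over all components.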
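The conditional formula can be verified exactly, with no truncation, by dividing the joint pmf by the marginal pmf of \mathbf{x}^{(2)}, which by the marginalization rule is \mathrm{DNM}(x_0,\alpha_0,\boldsymbol\alpha^{(2)}). In the sketch below (hypothetical helper names), \mathbf{x}^{(1)} is taken to be the first q components and \mathbf{x}^{(2)} the rest.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM pmf: B(x_dot, a_dot) / B(x0, alpha0) * prod Gamma(x_i+a_i) / (x_i! Gamma(a_i))."""
    def log_beta2(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_p = log_beta2(x0 + sum(x), alpha0 + sum(alpha)) - log_beta2(x0, alpha0)
    for xi, ai in zip(x, alpha):
        log_p += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return math.exp(log_p)


def conditional_pmf(x, q, x0, alpha0, alpha):
    """Pr(x^(1) | x^(2)) as DNM(x0', alpha0', alpha^(1)), with x^(1) = x[:q], x^(2) = x[q:]."""
    x1, x2 = x[:q], x[q:]
    a1, a2 = alpha[:q], alpha[q:]
    x0_prime = x0 + sum(x2)          # x0' = x0 + sum of the conditioning counts
    alpha0_prime = alpha0 + sum(a2)  # alpha0' = alpha0 + sum of their alphas
    return dnm_pmf(x1, x0_prime, alpha0_prime, a1)


def conditional_by_ratio(x, q, x0, alpha0, alpha):
    """Joint pmf divided by the marginal pmf of x^(2)."""
    return dnm_pmf(x, x0, alpha0, alpha) / dnm_pmf(x[q:], x0, alpha0, alpha[q:])


alpha = [0.7, 1.2, 2.0, 3.5]
print(conditional_pmf([2, 0, 5, 1], 2, 1.5, 3.0, alpha),
      conditional_by_ratio([2, 0, 5, 1], 2, 1.5, 3.0, alpha))
```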

Conditional on the sum

Conditioned on \sum_{i=1}^m x_i = n, the Dirichlet negative multinomial distribution reduces to the Dirichlet-multinomial distribution with parameters n and \boldsymbol\alpha. That is,

\Pr\left(\mathbf{x}\mid \sum_{i=1}^m x_i = n, x_0, \alpha_0, \boldsymbol{\alpha}\right)=\frac{n!\,\Gamma\left(\sum_{i=1}^m \alpha_i\right)}{\Gamma\left(n+\sum_{i=1}^m \alpha_i\right)}\prod_{i=1}^m\frac{\Gamma(x_i+\alpha_i)}{x_i!\,\Gamma(\alpha_i)}.

Notice that the equation does not depend on x_0 or \alpha_0.
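Because the support with a fixed total n is finite, this reduction can be checked by brute force. The sketch below (hypothetical names) enumerates every vector of m non-negative integers summing to n, renormalizes the DNM pmf over that set, and compares the result with the Dirichlet-multinomial pmf. Running it for two different (x_0, \alpha_0) pairs, one of them in the infinite-mean regime \alpha_0 < 1, illustrates that neither parameter matters.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM pmf: B(x_dot, a_dot) / B(x0, alpha0) * prod Gamma(x_i+a_i) / (x_i! Gamma(a_i))."""
    def log_beta2(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_p = log_beta2(x0 + sum(x), alpha0 + sum(alpha)) - log_beta2(x0, alpha0)
    for xi, ai in zip(x, alpha):
        log_p += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return math.exp(log_p)


def dirichlet_multinomial_pmf(x, alpha):
    """n! Gamma(sum a) / Gamma(n + sum a) * prod Gamma(x_i + a_i) / (x_i! Gamma(a_i))."""
    n, a_sum = sum(x), sum(alpha)
    log_p = math.lgamma(n + 1) + math.lgamma(a_sum) - math.lgamma(n + a_sum)
    for xi, ai in zip(x, alpha):
        log_p += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return math.exp(log_p)


def compositions(n, m):
    """Yield all lists of m non-negative integers that sum to n."""
    if m == 1:
        yield [n]
        return
    for first in range(n + 1):
        for rest in compositions(n - first, m - 1):
            yield [first] + rest


def conditional_on_sum(n, x0, alpha0, alpha):
    """Renormalise the DNM pmf over {x : sum(x) == n}."""
    support = list(compositions(n, len(alpha)))
    weights = [dnm_pmf(x, x0, alpha0, alpha) for x in support]
    total = sum(weights)
    return {tuple(x): w / total for x, w in zip(support, weights)}


alpha = [0.5, 1.5, 2.0]
cond = conditional_on_sum(6, 1.0, 2.0, alpha)
print(cond[(2, 2, 2)], dirichlet_multinomial_pmf([2, 2, 2], alpha))
```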

Aggregation

If

X = (X_1, \ldots, X_m)\sim\operatorname{DNM}(x_0, \alpha_0, \alpha_1,\ldots,\alpha_m)

then, if the random variables with positive subscripts i and j are dropped from the vector and replaced by their sum,

X' = (X_1, \ldots, X_i + X_j, \ldots, X_m)\sim\operatorname{DNM}\left(x_0, \alpha_0, \alpha_1,\ldots,\alpha_i+\alpha_j,\ldots,\alpha_m \right).
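The aggregation property can also be confirmed exactly, since \Pr(X_i + X_j = s, \ldots) is a finite sum over the ways of splitting s. The sketch below (hypothetical names) places the two merged components first and the merged count first; this ordering is only a convenience, because the pmf is unchanged when the pairs (x_i, \alpha_i) are permuted together.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM pmf: B(x_dot, a_dot) / B(x0, alpha0) * prod Gamma(x_i+a_i) / (x_i! Gamma(a_i))."""
    def log_beta2(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_p = log_beta2(x0 + sum(x), alpha0 + sum(alpha)) - log_beta2(x0, alpha0)
    for xi, ai in zip(x, alpha):
        log_p += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return math.exp(log_p)


def aggregated_by_summation(s, rest, x0, alpha0, alpha_i, alpha_j, alpha_rest):
    """Pr(X_i + X_j = s, remaining counts = rest), summing the joint pmf over every split of s."""
    return sum(dnm_pmf([k, s - k] + rest, x0, alpha0, [alpha_i, alpha_j] + alpha_rest)
               for k in range(s + 1))


def aggregated_closed_form(s, rest, x0, alpha0, alpha_i, alpha_j, alpha_rest):
    """DNM with alpha_i and alpha_j replaced by their sum."""
    return dnm_pmf([s] + rest, x0, alpha0, [alpha_i + alpha_j] + alpha_rest)


print(aggregated_by_summation(5, [2], 2.0, 3.0, 0.8, 1.7, [2.2]),
      aggregated_closed_form(5, [2], 2.0, 3.0, 0.8, 1.7, [2.2]))
```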

Correlation matrix

For \alpha_0>2 the entries of the correlation matrix are

\rho(X_i,X_i) = 1,

\rho(X_i,X_j) = \frac{\operatorname{cov}(X_i,X_j)}{\sqrt{\operatorname{var}(X_i)\operatorname{var}(X_j)}} = \sqrt{\frac{\alpha_i\,\alpha_j}{(\alpha_0+\alpha_i-1)(\alpha_0+\alpha_j-1)}}, \quad i \neq j.
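The closed-form correlations can be compared with moments computed directly from the pmf. The sketch below (hypothetical function names) evaluates the formula and, for m = 2, estimates the same correlation from a truncated grid of pmf values. The truncation is only reasonable because \alpha_0 = 12 makes the tails decay quickly; near \alpha_0 = 2 this brute-force estimate would converge very slowly. Note that x_0 does not appear in the closed form, even though it does affect the variances and covariances individually.

```python
import math


def dnm_correlation(alpha0, alpha):
    """Closed-form correlation matrix of DNM(x0, alpha0, alpha); defined only for alpha0 > 2."""
    if alpha0 <= 2:
        raise ValueError("the covariance matrix is infinite for alpha0 <= 2")
    m = len(alpha)
    rho = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                rho[i][j] = math.sqrt(alpha[i] * alpha[j]
                                      / ((alpha0 + alpha[i] - 1) * (alpha0 + alpha[j] - 1)))
    return rho


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM pmf: B(x_dot, a_dot) / B(x0, alpha0) * prod Gamma(x_i+a_i) / (x_i! Gamma(a_i))."""
    def log_beta2(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_p = log_beta2(x0 + sum(x), alpha0 + sum(alpha)) - log_beta2(x0, alpha0)
    for xi, ai in zip(x, alpha):
        log_p += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return math.exp(log_p)


def brute_force_correlation(x0, alpha0, a1, a2, cutoff=150):
    """Correlation of (X1, X2) from pmf moments on the grid [0, cutoff)^2 (an approximation)."""
    s0 = s1 = s2 = s11 = s22 = s12 = 0.0
    for x1 in range(cutoff):
        for x2 in range(cutoff):
            p = dnm_pmf([x1, x2], x0, alpha0, [a1, a2])
            s0 += p
            s1 += x1 * p
            s2 += x2 * p
            s11 += x1 * x1 * p
            s22 += x2 * x2 * p
            s12 += x1 * x2 * p
    m1, m2 = s1 / s0, s2 / s0
    var1 = s11 / s0 - m1 * m1
    var2 = s22 / s0 - m2 * m2
    cov = s12 / s0 - m1 * m2
    return cov / math.sqrt(var1 * var2)


print(dnm_correlation(12.0, [1.0, 2.0])[0][1], brute_force_correlation(3.0, 12.0, 1.0, 2.0))
```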

Heavy tailed

The Dirichlet negative multinomial is a heavy-tailed distribution. It does not have a finite mean for \alpha_0 \leq 1, and its covariance matrix is infinite for \alpha_0 \leq 2. More generally, moments of order \alpha_0 and higher are infinite, so the moment generating function does not exist.
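The heavy tail can be seen concretely in the m = 1 case (the beta negative binomial) by accumulating partial sums of x \Pr(X = x). In the sketch below (hypothetical names, illustrative parameters), with \alpha_0 = 1 the partial sums keep growing roughly logarithmically, consistent with an infinite mean; with \alpha_0 = 3 they settle near x_0\alpha_1/(\alpha_0-1), which is the mean whenever \alpha_0 > 1.

```python
import math


def dnm_pmf_m1(x1, x0, alpha0, alpha1):
    """m = 1 DNM (beta negative binomial) pmf via the beta-function form."""
    def log_beta2(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(log_beta2(x0 + x1, alpha0 + alpha1) - log_beta2(x0, alpha0)
                    + math.lgamma(x1 + alpha1) - math.lgamma(x1 + 1) - math.lgamma(alpha1))


def partial_means(checkpoints, x0, alpha0, alpha1):
    """Partial sums of x * Pr(X = x) over x < N, reported for each N in checkpoints."""
    out, total, x = [], 0.0, 0
    for n in sorted(checkpoints):
        while x < n:
            total += x * dnm_pmf_m1(x, x0, alpha0, alpha1)
            x += 1
        out.append(total)
    return out


for alpha0 in (1.0, 3.0):
    print(alpha0, partial_means([10**2, 10**3, 10**4], 2.0, alpha0, 1.0))
```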

Applications

Dirichlet negative multinomial as a Pólya urn model

In the case when the m+2 parameters x_0, \alpha_0 and \boldsymbol\alpha are positive integers, the Dirichlet negative multinomial can also be motivated by an urn model, or more specifically a basic Pólya urn model. Consider an urn initially containing \sum_{i=0}^m \alpha_i balls of m+1 different colors, including \alpha_0 red balls (the stopping color). The vector \boldsymbol\alpha gives the respective counts of the balls of the other m non-red colors. At each step of the model, a ball is drawn at random from the urn and replaced, along with one additional ball of the same color. The process is repeated until x_0 red balls have been drawn. The random vector \mathbf{x} of observed draws of the m non-red colors is distributed according to \mathrm{DNM}(x_0, \alpha_0, \boldsymbol\alpha). Note that at the end of the experiment, the urn always contains the fixed number x_0+\alpha_0 of red balls, while containing the random numbers \mathbf{x}+\boldsymbol\alpha of balls of the other m colors.
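A direct simulation of the urn is a useful cross-check. The sketch below assumes positive integer parameters, as the urn description requires, and uses hypothetical names. For x_0 = 2, \alpha_0 = 6 and \boldsymbol\alpha = (1, 2), the beta-function form of the pmf gives \Pr(\mathbf{X} = (0, 0)) = \mathrm{B}(2, 9)/\mathrm{B}(2, 6) = 42/90, which matches the urn directly: the first two draws must both be red, with probability (6/9)(7/10).

```python
import random


def polya_urn_draw(rng, x0, alpha0, alpha):
    """Run the urn until x0 red balls are drawn; assumes positive integer parameters.

    Index 0 of `balls` is the red (stopping) colour, indices 1..m the other colours.
    Returns the non-red draw counts and the final urn contents.
    """
    balls = [alpha0] + list(alpha)
    draws = [0] * len(alpha)
    reds = 0
    colours = range(len(balls))
    while reds < x0:
        k = rng.choices(colours, weights=balls)[0]
        balls[k] += 1  # return the drawn ball together with one more of the same colour
        if k == 0:
            reds += 1
        else:
            draws[k - 1] += 1
    return draws, balls


rng = random.Random(0)
samples = [polya_urn_draw(rng, 2, 6, [1, 2]) for _ in range(20000)]
freq_zero = sum(d == [0, 0] for d, _ in samples) / len(samples)
print(freq_zero)  # compare with the DNM value 42/90 for these parameters
```

The final urn contents returned alongside the counts make the closing remark of the paragraph easy to verify: the red count is always x_0 + \alpha_0, and each other colour ends at \alpha_i + x_i.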
