In probability theory and statistics, the beta-binomial distribution is a family of discrete probability distributions on a finite support of non-negative integers arising when the probability of success in each of a fixed or known number of Bernoulli trials is either unknown or random. The beta-binomial distribution is the binomial distribution in which the probability of success at each of ''n'' trials is not fixed but randomly drawn from a beta distribution. It is frequently used in Bayesian statistics, empirical Bayes methods and classical statistics to capture overdispersion in binomial-type distributed data. The beta-binomial is a one-dimensional version of the Dirichlet-multinomial distribution, as the binomial and beta distributions are univariate versions of the multinomial and Dirichlet distributions respectively. The special case where ''α'' and ''β'' are integers is also known as the negative hypergeometric distribution.

Motivation and derivation

As a compound distribution

The beta distribution is a conjugate distribution of the binomial distribution. This fact leads to an analytically tractable compound distribution where one can think of the ''p'' parameter in the binomial distribution as being randomly drawn from a beta distribution. Suppose we were interested in predicting the number of heads, ''x'', in ''n'' future trials. This is given by:

\begin{align}
f(x\mid n,\alpha,\beta) & = \int_0^1 \mathrm{Bin}(x\mid n,p)\,\mathrm{Beta}(p\mid \alpha,\beta)\,dp \\
& = {n\choose x}\frac{1}{\mathrm{B}(\alpha,\beta)} \int_0^1 p^{x+\alpha-1}(1-p)^{n-x+\beta-1}\,dp \\
& = {n\choose x}\frac{\mathrm{B}(x+\alpha,n-x+\beta)}{\mathrm{B}(\alpha,\beta)}.
\end{align}

Using the properties of the beta function, this can alternatively be written:

f(x\mid n,\alpha,\beta) = \frac{\Gamma(n+1)\Gamma(x+\alpha)\Gamma(n-x+\beta)}{\Gamma(n+\alpha+\beta)\Gamma(x+1)\Gamma(n-x+1)} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}
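As a rough illustration, the mass function can be evaluated on the log scale with the log-gamma function, which avoids overflow for large arguments. The names `log_beta` and `betabinom_pmf` and the parameter values are assumptions made for this sketch; they do not come from any particular library.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(x: int, n: int, alpha: float, beta: float) -> float:
    """C(n, x) * B(x + alpha, n - x + beta) / B(alpha, beta), or 0 outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    log_choose = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_choose + log_beta(x + alpha, n - x + beta) - log_beta(alpha, beta))


# Illustrative parameters: n = 10 trials, alpha = 2, beta = 3.
probs = [betabinom_pmf(x, 10, 2.0, 3.0) for x in range(11)]
total = sum(probs)  # a valid mass function sums to 1 up to rounding error
```

Working with logarithms until the final `exp` is a design choice: the individual gamma values grow factorially, while their combined logarithm stays moderate.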

As an urn model

The beta-binomial distribution can also be motivated via an urn model for positive integer values of ''α'' and ''β'', known as the Pólya urn model. Specifically, imagine an urn containing ''α'' red balls and ''β'' black balls, from which random draws are made. If a red ball is observed, then two red balls are returned to the urn. Likewise, if a black ball is drawn, then two black balls are returned to the urn. If this is repeated ''n'' times, then the probability of observing ''x'' red balls follows a beta-binomial distribution with parameters ''n'', ''α'' and ''β''. By contrast, if the random draws are with simple replacement (no balls over and above the observed ball are added to the urn), then the distribution follows a binomial distribution, and if the random draws are made without replacement, the distribution follows a hypergeometric distribution.
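The urn scheme can be checked by simulation. The sketch below, under the assumption of an urn starting with 2 red and 3 black balls and 4 draws (values chosen only for illustration), compares simulated frequencies of the number of red draws with the beta-binomial mass function. The helper names are hypothetical.

```python
import random
from math import comb, gamma


def polya_urn_reds(n, red, black, rng):
    """Count red draws in n draws; each drawn ball goes back with one extra of its colour."""
    reds_seen = 0
    for _ in range(n):
        if rng.random() < red / (red + black):
            red += 1
            reds_seen += 1
        else:
            black += 1
    return reds_seen


def beta_fn(a, b):
    """Beta function via gamma functions (fine for small arguments)."""
    return gamma(a) * gamma(b) / gamma(a + b)


n, alpha, beta, runs = 4, 2, 3, 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable
counts = [0] * (n + 1)
for _ in range(runs):
    counts[polya_urn_reds(n, alpha, beta, rng)] += 1

simulated = [c / runs for c in counts]
exact = [comb(n, x) * beta_fn(x + alpha, n - x + beta) / beta_fn(alpha, beta)
         for x in range(n + 1)]
```

With this many runs the two lists should agree to roughly two decimal places; the agreement is statistical, not exact.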

Moments and properties

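The first three raw moments are

\begin{align}
\mu_1 & =\frac{n\alpha}{\alpha+\beta} \\
\mu_2 & =\frac{n\alpha[n(1+\alpha)+\beta]}{(\alpha+\beta)(1+\alpha+\beta)} \\
\mu_3 & =\frac{n\alpha[n^{2}(1+\alpha)(2+\alpha)+3n(1+\alpha)\beta+\beta(\beta-\alpha)]}{(\alpha+\beta)(1+\alpha+\beta)(2+\alpha+\beta)}
\end{align}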

and the kurtosis is

\beta_2 = \frac{(\alpha + \beta)^2 (1+\alpha+\beta)}{n \alpha \beta( \alpha + \beta + 2)(\alpha + \beta + 3)(\alpha + \beta + n) } \left[ (\alpha + \beta)(\alpha + \beta - 1 + 6n) + 3 \alpha\beta(n - 2) + 6n^2 -\frac{3\alpha\beta n(6-n)}{\alpha + \beta} - \frac{18\alpha\beta n^{2}}{(\alpha+\beta)^2} \right].

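Letting

p=\frac{\alpha}{\alpha+\beta}

we note, suggestively, that the mean can be written as

\mu = \frac{n\alpha}{\alpha+\beta}=np

and the variance as

\sigma^2 = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)} = np(1-p) \frac{\alpha + \beta + n}{\alpha + \beta + 1} = np(1-p)[1+(n-1)\rho]

where

\rho= \tfrac{1}{\alpha+\beta+1}.

The parameter ''ρ'' is known as the "intra class" or "intra cluster" correlation. It is this positive correlation which gives rise to overdispersion: for ''n'' > 1 the factor 1 + (''n'' − 1)''ρ'' exceeds 1, so the variance exceeds the binomial variance ''np''(1 − ''p''). Note that when ''n'' = 1, no information is available to distinguish between the beta and binomial variation, and the two models have equal variances.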
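The mean and variance expressions can be checked numerically against the mass function. The following sketch assumes illustrative values ''n'' = 12, ''α'' = 2, ''β'' = 5 and a hypothetical helper `pmf`; it also shows the variance inflation relative to a binomial with the same ''p''.

```python
from math import exp, lgamma


def pmf(x, n, a, b):
    """Beta-binomial mass function evaluated through log-gamma terms."""
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + lgamma(x + a) + lgamma(n - x + b) - lgamma(n + a + b)
               + lgamma(a + b) - lgamma(a) - lgamma(b))


n, a, b = 12, 2.0, 5.0  # illustrative parameter values
p = a / (a + b)
rho = 1.0 / (a + b + 1.0)  # intra-class correlation

mean_formula = n * p
var_formula = n * p * (1 - p) * (1 + (n - 1) * rho)

# The same quantities computed directly from the mass function.
mean_direct = sum(x * pmf(x, n, a, b) for x in range(n + 1))
var_direct = sum((x - mean_direct) ** 2 * pmf(x, n, a, b) for x in range(n + 1))

binomial_var = n * p * (1 - p)  # variance if p were fixed instead of random
```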

Factorial moments

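The ''r''-th factorial moment of a beta-binomial random variable ''X'' is

\operatorname{E}\bigl[(X)_r\bigr] = \frac{n!}{(n-r)!}\frac{\mathrm{B}(\alpha+r,\beta)}{\mathrm{B}(\alpha,\beta)} = (n)_r \frac{\mathrm{B}(\alpha+r,\beta)}{\mathrm{B}(\alpha,\beta)}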

Point estimates

Method of moments

The method of moments estimates can be obtained by noting the first and second moments of the beta-binomial and setting those equal to the sample moments m_1 and m_2. We find

\begin{align}
\widehat{\alpha} & =\frac{nm_1-m_2}{n(\frac{m_2}{m_1}-m_1-1)+m_1} \\
\widehat{\beta} & =\frac{(n-m_1)(n-\frac{m_2}{m_1})}{n(\frac{m_2}{m_1}-m_1-1)+m_1}.
\end{align}

These estimates can be nonsensically negative, which is evidence that the data are either undispersed or underdispersed relative to the binomial distribution. In this case, the binomial distribution and the hypergeometric distribution are alternative candidates respectively.
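A minimal sketch of the method of moments estimator follows. The function name `method_of_moments` is an assumption, and the check feeds in the exact raw moments of a beta-binomial with illustrative parameters (''n'' = 12, ''α'' = 2, ''β'' = 5) to confirm that the estimator recovers them.

```python
def method_of_moments(m1: float, m2: float, n: int):
    """Return (alpha_hat, beta_hat) from the first two raw sample moments."""
    denom = n * (m2 / m1 - m1 - 1) + m1
    alpha_hat = (n * m1 - m2) / denom
    beta_hat = (n - m1) * (n - m2 / m1) / denom
    return alpha_hat, beta_hat


# Round-trip check with exact raw moments of a beta-binomial distribution.
n, alpha, beta = 12, 2.0, 5.0
s = alpha + beta
m1 = n * alpha / s
m2 = n * alpha * (n * (1 + alpha) + beta) / (s * (1 + s))

alpha_hat, beta_hat = method_of_moments(m1, m2, n)
```

With real data the inputs would be sample averages of the observations and of their squares, and a negative result signals the under- or undispersed case described above.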

Maximum likelihood estimation

While closed-form maximum likelihood estimates are impractical, the probability mass function consists only of common functions (gamma and/or beta functions), so the estimates can be easily found via direct numerical optimization. Maximum likelihood estimates from empirical data can be computed using general methods for fitting multinomial Pólya distributions, which are described in (Minka 2003). The R package VGAM, through the function vglm, fits GLM-type models by maximum likelihood with responses distributed according to the beta-binomial distribution. There is no requirement that ''n'' is fixed throughout the observations.
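Direct numerical optimization can be sketched with the standard library alone. The example below is not the method of Minka (2003) nor what VGAM does; it is a simple compass (pattern) search, chosen here only because it needs no external optimizer. The search runs over logit(''α''/(''α'' + ''β'')) and log(''α'' + ''β''), the function names are hypothetical, and the "data" are synthetic weights equal to the exact mass function of a beta-binomial with ''n'' = 5, ''α'' = 2, ''β'' = 3. The sketch assumes overdispersed data, since otherwise ''α'' + ''β'' drifts towards infinity.

```python
from math import exp, lgamma


def log_pmf(x, n, a, b):
    """Log of the beta-binomial mass function."""
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + lgamma(x + a) + lgamma(n - x + b) - lgamma(n + a + b)
            + lgamma(a + b) - lgamma(a) - lgamma(b))


def fit_mle(counts, n, tol=1e-6, max_iter=100_000):
    """Maximise the log-likelihood; counts[x] is the weight of observations equal to x."""
    def to_ab(u, v):
        mean, size = 1.0 / (1.0 + exp(-u)), exp(v)
        return mean * size, (1.0 - mean) * size

    def loglik(u, v):
        a, b = to_ab(u, v)
        return sum(w * log_pmf(x, n, a, b) for x, w in enumerate(counts))

    u, v, step = 0.0, 0.0, 1.0
    best = loglik(u, v)
    for _ in range(max_iter):
        if step <= tol:
            break
        moved = False
        for du, dv in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = loglik(u + du, v + dv)
            if cand > best:  # accept any strict improvement
                u, v, best, moved = u + du, v + dv, cand, True
        if not moved:
            step /= 2  # refine the mesh when no neighbour improves
    a, b = to_ab(u, v)
    return a, b, best


# Synthetic "data": weights equal to the exact mass function for alpha=2, beta=3.
n = 5
weights = [exp(log_pmf(x, n, 2.0, 3.0)) for x in range(n + 1)]
alpha_mle, beta_mle, max_loglik = fit_mle(weights, n)
```

Because the weights match the model exactly, the maximiser should sit close to the generating parameters. For observed data, `counts[x]` would hold the number of observations with ''x'' successes.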

Example: Sex ratio heterogeneity

This example uses data on the number of male children among the first 12 children in 6115 families of size 13, taken from hospital records in 19th century Saxony (Sokal and Rohlf, p. 59 from Lindsey). The 13th child is ignored to blunt the effect of families non-randomly stopping when a desired gender is reached. The first two sample moments are

\begin{align}
   m_1 & = 6.23\\
   m_2 & = 42.31 \\
     n & = 12
\end{align}

and therefore the method of moments estimates are

\begin{align}
   \widehat{\alpha} & = 34.1350\\
   \widehat{\beta} & = 31.6085.
\end{align}

The maximum likelihood estimates can be found numerically

\begin{align}
   \widehat\alpha_\mathrm{mle} & = 34.09558\\
   \widehat\beta_\mathrm{mle} & = 31.5715
\end{align}

and the maximized log-likelihood is

\log \mathcal{L} = -12492.9

from which we find the AIC

\mathit{AIC}=24989.74.

The AIC for the competing binomial model is AIC = 25070.34, and thus we see that the beta-binomial model provides a superior fit to the data, i.e. there is evidence for overdispersion. Trivers and Willard postulate a theoretical justification for heterogeneity in gender-proneness among mammalian offspring. The superior fit is especially evident in the tails.

Role in Bayesian statistics

The beta-binomial distribution plays a prominent role in the Bayesian estimation of a Bernoulli success probability ''p'' which we wish to estimate based on data. Let

\mathbf{X}=\{X_1, X_2, \dots, X_{n_1}\}

be a sample of independent and identically distributed Bernoulli random variables

X_i \sim \text{Bernoulli}(p).

Suppose our knowledge of ''p'', in Bayesian fashion, is uncertain and is modeled by the prior distribution

p \sim \text{Beta}(\alpha,\beta).

If

Y_1=\sum_{i=1}^{n_1} X_i

then, through compounding, the prior predictive distribution of Y_1 is

Y_1 \sim \text{BetaBin}(n_1, \alpha,\beta).

After observing Y_1 = y_1 we note that the posterior distribution for ''p'' is

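\begin{align}
f(p \mid \mathbf{X},\alpha,\beta) & \propto \left(\prod_{i=1}^{n_1} p^{x_i}(1-p)^{1-x_i} \right)p^{\alpha-1}(1-p)^{\beta-1}\\
 & = Cp^{\sum x_i+\alpha-1}(1-p)^{n_1-\sum x_i+\beta-1} \\
& = Cp^{y_1+\alpha-1}(1-p)^{n_1-y_1+\beta-1}
\end{align}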

where ''C'' is a normalizing constant. We recognize the posterior distribution as a

\mathrm{Beta}(y_1+\alpha,n_1-y_1+\beta)

distribution. Thus, again through compounding, we find that the posterior predictive distribution of a sum of a future sample of size n_2 of Bernoulli(''p'') random variables is

Y_2 \sim \mathrm{BetaBin}(n_2, y_1+\alpha, n_1-y_1+\beta).
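The update from prior to posterior predictive amounts to adding the observed successes and failures to the beta parameters. The sketch below assumes an illustrative Beta(2, 2) prior, 7 successes in 10 observed trials and 5 future trials; the helper name `betabinom_pmf` is hypothetical.

```python
from math import exp, lgamma


def betabinom_pmf(x, n, a, b):
    """Beta-binomial mass function evaluated through log-gamma terms."""
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + lgamma(x + a) + lgamma(n - x + b) - lgamma(n + a + b)
               + lgamma(a + b) - lgamma(a) - lgamma(b))


alpha, beta = 2.0, 2.0  # prior Beta(alpha, beta); illustrative values
n1, y1 = 10, 7          # observed: y1 successes in n1 Bernoulli trials

# Conjugate update: the posterior is Beta(y1 + alpha, n1 - y1 + beta).
post_alpha, post_beta = y1 + alpha, n1 - y1 + beta

# Posterior predictive for the number of successes in n2 future trials.
n2 = 5
predictive = [betabinom_pmf(y2, n2, post_alpha, post_beta) for y2 in range(n2 + 1)]
expected_y2 = sum(y2 * q for y2, q in enumerate(predictive))
```

The predictive mean equals n_2 times the posterior mean of ''p'', here 5 × 9/14.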

Generating random variates

To draw a beta-binomial random variate

X \sim \mathrm{BetaBin}(n, \alpha,\beta)

simply draw

p \sim \mathrm{Beta}(\alpha,\beta)

and then draw

X \sim \mathrm{B}(n,p).
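The two-stage recipe translates directly into code. This sketch uses Python's standard `random` module; the function name `sample_betabinom` and the parameter values are assumptions, and the binomial draw is written as a sum of Bernoulli trials to stay within functions available in older Python versions.

```python
import random


def sample_betabinom(n: int, alpha: float, beta: float, rng: random.Random) -> int:
    """Draw p ~ Beta(alpha, beta), then count successes in n Bernoulli(p) trials."""
    p = rng.betavariate(alpha, beta)
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [sample_betabinom(12, 2.0, 5.0, rng) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)  # should be near n * alpha / (alpha + beta)
```

Note that a fresh ''p'' is drawn for every variate; reusing one ''p'' across draws would produce plain binomial samples instead.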

Related distributions

* \mathrm{BetaBin}(1, \alpha, \beta) \sim \mathrm{Bernoulli}(p) where p=\frac{\alpha}{\alpha+\beta}.
* \mathrm{BetaBin}(n, 1, 1) \sim U(0,n) where U(a,b) is the discrete uniform distribution.
* If X \sim \mathrm{BetaBin}(n, \alpha, \beta) then (n-X) \sim \mathrm{BetaBin}(n,\beta, \alpha).
* \lim_{s \to \infty} \mathrm{BetaBin}(n, ps, (1-p)s) \sim \mathrm{B}(n,p) where p=\frac{\alpha}{\alpha+\beta} and s=\alpha+\beta and \mathrm{B}(n,p) is the binomial distribution.
* \lim_{n \to \infty} \mathrm{BetaBin}(n, n \lambda , n^2) \sim \mathrm{Pois}(\lambda) where \mathrm{Pois}(\lambda) is the Poisson distribution.
* \lim_{n \to \infty} \mathrm{BetaBin}(n, 1, \frac{np}{(1-p)}) \sim \mathrm{Geom}(p) where \mathrm{Geom}(p) is the geometric distribution.
* \lim_{n \to \infty} \mathrm{BetaBin}(n, r, \frac{np}{(1-p)}) \sim \mathrm{NB}(r,p) where \mathrm{NB}(r,p) is the negative binomial distribution.
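Some of the relationships listed above can be verified numerically. The sketch below checks the discrete uniform case, the reflection property and the binomial limit for a large but finite ''s''; the helper `pmf` and all parameter values are assumptions made for illustration.

```python
from math import comb, exp, lgamma


def pmf(x, n, a, b):
    """Beta-binomial mass function evaluated through log-gamma terms."""
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + lgamma(x + a) + lgamma(n - x + b) - lgamma(n + a + b)
               + lgamma(a + b) - lgamma(a) - lgamma(b))


n = 6

# BetaBin(n, 1, 1) puts mass 1 / (n + 1) on each of 0, 1, ..., n.
uniform = [pmf(x, n, 1.0, 1.0) for x in range(n + 1)]

# Reflection: P(X = x) under (alpha, beta) equals P(n - X = x) under (beta, alpha).
lhs = pmf(2, n, 2.0, 5.0)
rhs = pmf(n - 2, n, 5.0, 2.0)

# Binomial limit: with s = alpha + beta large, the mass function approaches B(n, p).
p, s = 0.3, 1e6
binomial_gap = max(
    abs(pmf(x, n, p * s, (1 - p) * s) - comb(n, x) * p**x * (1 - p) ** (n - x))
    for x in range(n + 1)
)
```

The gap in the last check shrinks as ''s'' grows but is never exactly zero for finite ''s''.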

References

* Minka, Thomas P. (2003). Estimating a Dirichlet distribution. Microsoft Technical Report.

External links

* Fastfit (http://research.microsoft.com/~minka/software/fastfit/) contains Matlab code for fitting beta-binomial distributions (in the form of two-dimensional Pólya distributions) to data.
* Interactive graphic: Univariate Distribution Relationships
