
In probability theory and statistics, the binomial distribution with parameters ''n'' and ''p'' is the discrete probability distribution of the number of successes in a sequence of ''n'' independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: ''success'' (with probability ''p'') or ''failure'' (with probability <math>q = 1 - p</math>). A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., ''n'' = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.
The binomial distribution is frequently used to model the number of successes in a sample of size ''n'' drawn with replacement from a population of size ''N''. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for ''N'' much larger than ''n'', the binomial distribution remains a good approximation, and is widely used.
Definitions
Probability mass function
In general, if the random variable ''X'' follows the binomial distribution with parameters ''n'' ∈ <math>\mathbb{N}</math> and ''p'' ∈ [0, 1], we write ''X'' ~ B(''n'', ''p''). The probability of getting exactly ''k'' successes in ''n'' independent Bernoulli trials is given by the probability mass function:
:<math>f(k, n, p) = \Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}</math>
for ''k'' = 0, 1, 2, ..., ''n'', where
:<math>\binom{n}{k} = \frac{n!}{k!(n-k)!}</math>
is the binomial coefficient
, hence the name of the distribution. The formula can be understood as follows: ''k'' successes occur with probability <math>p^k</math> and ''n'' − ''k'' failures occur with probability <math>(1-p)^{n-k}</math>. However, the ''k'' successes can occur anywhere among the ''n'' trials, and there are <math>\tbinom{n}{k}</math> different ways of distributing ''k'' successes in a sequence of ''n'' trials.
In creating reference tables for binomial distribution probability, usually the table is filled in up to ''n''/2 values. This is because for ''k'' > ''n''/2, the probability can be calculated by its complement as
:<math>f(k, n, p) = f(n-k, n, 1-p)</math>
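As a minimal sketch in Python (using only the standard library; the helper name `binomial_pmf` is illustrative, not from the text), the probability mass function and its complement symmetry look like this:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The PMF sums to 1 over k = 0..n.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# Complement symmetry: f(k, n, p) = f(n - k, n, 1 - p).
lhs = binomial_pmf(7, 10, 0.3)
rhs = binomial_pmf(3, 10, 0.7)
```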
Looking at the expression ''f''(''k'', ''n'', ''p'') as a function of ''k'', there is a ''k'' value that maximizes it. This ''k'' value can be found by calculating
:<math>\frac{f(k+1, n, p)}{f(k, n, p)} = \frac{(n-k)p}{(k+1)(1-p)}</math>
and comparing it to 1. There is always an integer ''M'' that satisfies
:<math>(n+1)p - 1 \leq M < (n+1)p</math>
''f''(''k'', ''n'', ''p'') is monotone increasing for ''k'' < ''M'' and monotone decreasing for ''k'' > ''M'', with the exception of the case where (''n'' + 1)''p'' is an integer. In this case, there are two values for which ''f'' is maximal: (''n'' + 1)''p'' and (''n'' + 1)''p'' − 1. ''M'' is the ''most probable'' outcome (that is, the most likely, although this can still be unlikely overall) of the Bernoulli trials and is called the
mode.
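As a sketch (in Python; the helper name is illustrative, not from the text), when (''n'' + 1)''p'' is not an integer the mode is simply ⌊(''n'' + 1)''p''⌋, which can be cross-checked by maximizing the PMF directly:

```python
from math import comb, floor

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
# (n + 1) * p = 3.3 is not an integer, so the mode is floor((n + 1) * p).
mode = floor((n + 1) * p)

# Cross-check: the mode maximizes the PMF over k = 0..n.
argmax = max(range(n + 1), key=lambda k: binomial_pmf(k, n, p))
```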
Example
Suppose a
biased coin comes up heads with probability 0.3 when tossed. The probability of seeing exactly 4 heads in 6 tosses is
:<math>f(4, 6, 0.3) = \binom{6}{4} 0.3^4 (1-0.3)^{6-4} = 15 \times 0.0081 \times 0.49 = 0.059535</math>
Cumulative distribution function
The cumulative distribution function can be expressed as:
:<math>F(k; n, p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i} p^i (1-p)^{n-i}</math>
where <math>\lfloor k \rfloor</math> is the "floor" under ''k'', i.e. the greatest integer less than or equal to ''k''.
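A direct Python sketch of the cumulative distribution function, summing the PMF up to the floor of ''k'' (helper names are illustrative, not from the text):

```python
from math import comb, floor

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k: float, n: int, p: float) -> float:
    """Pr(X <= k): sum the PMF from i = 0 up to floor(k)."""
    return sum(binomial_pmf(i, n, p) for i in range(floor(k) + 1))

# k need not be an integer; only its floor matters.
value = binomial_cdf(4.7, 6, 0.3)
```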
It can also be represented in terms of the regularized incomplete beta function, as follows:
:<math>F(k; n, p) = \Pr(X \le k) = I_{1-p}(n-k, k+1) = (n-k) \binom{n}{k} \int_0^{1-p} t^{n-k-1} (1-t)^k \, dt</math>
which is equivalent to the cumulative distribution function of the ''F''-distribution:
:<math>F(k; n, p) = F_{F\text{-distribution}}\left(x = \frac{1-p}{p} \frac{k+1}{n-k};\ d_1 = 2(n-k),\ d_2 = 2(k+1)\right)</math>
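The incomplete-beta representation of the CDF can be probed numerically; the sketch below approximates the integral <math>(n-k)\tbinom{n}{k}\int_0^{1-p} t^{n-k-1}(1-t)^k \, dt</math> with a simple midpoint rule (the step count and helper names are my own choices):

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k) by direct summation of the PMF."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def beta_integral_cdf(k: int, n: int, p: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of (n-k) * C(n,k) * int_0^{1-p} t^(n-k-1) (1-t)^k dt."""
    upper = 1 - p
    h = upper / steps
    total = sum(((i + 0.5) * h) ** (n - k - 1) * (1 - (i + 0.5) * h) ** k
                for i in range(steps))
    return (n - k) * comb(n, k) * h * total

direct = binomial_cdf(3, 10, 0.4)
via_beta = beta_integral_cdf(3, 10, 0.4)
```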
Some closed-form bounds for the cumulative distribution function are given below.
Properties
Expected value and variance
If ''X'' ~ ''B''(''n'', ''p''), that is, ''X'' is a binomially distributed random variable, ''n'' being the total number of experiments and ''p'' the probability of each experiment yielding a successful result, then the expected value of ''X'' is:
:<math>\operatorname{E}[X] = np</math>
This follows from the linearity of the expected value along with the fact that ''X'' is the sum of ''n'' identical Bernoulli random variables, each with expected value ''p''. In other words, if <math>X_1, \ldots, X_n</math> are identical (and independent) Bernoulli random variables with parameter ''p'', then <math>X = X_1 + \cdots + X_n</math> and
:<math>\operatorname{E}[X] = \operatorname{E}[X_1] + \cdots + \operatorname{E}[X_n] = np</math>
The variance is:
:<math>\operatorname{Var}(X) = np(1-p)</math>
This similarly follows from the fact that the variance of a sum of independent random variables is the sum of the variances.
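The expectation and variance formulas can be verified exactly from the PMF (a Python sketch; helper names are illustrative, not from the text):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 12, 0.25
# Mean and variance computed directly from the distribution.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
# These should equal n*p and n*p*(1-p) respectively.
```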
Higher moments
The first 6 central moments, defined as <math>\mu_c = \operatorname{E}\left[(X - \operatorname{E}[X])^c\right]</math>, are given by
:<math>\begin{align}
\mu_1 &= 0, \\
\mu_2 &= np(1-p), \\
\mu_3 &= np(1-p)(1-2p), \\
\mu_4 &= np(1-p)\left(1 + (3n-6)p(1-p)\right), \\
\mu_5 &= np(1-p)(1-2p)\left(1 + (10n-12)p(1-p)\right), \\
\mu_6 &= np(1-p)\left(1 - 30p(1-p)(1 - 4p(1-p)) + 5np(1-p)(5 - 26p(1-p)) + 15n^2 p^2 (1-p)^2\right).
\end{align}</math>
The non-central moments satisfy
:<math>\begin{align}
\operatorname{E}[X] &= np, \\
\operatorname{E}[X^2] &= np(1-p) + n^2 p^2,
\end{align}</math>
and in general
:<math>\operatorname{E}[X^c] = \sum_{k=0}^{c} \left\{ {c \atop k} \right\} n^{\underline{k}} p^k,</math>
where <math>\textstyle \left\{ {c \atop k} \right\}</math> are the Stirling numbers of the second kind, and <math>n^{\underline{k}} = n(n-1) \cdots (n-k+1)</math> is the <math>k</math>th falling power of <math>n</math>.
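The Stirling-number formula for the raw moments can be checked against a direct sum over the PMF (a Python sketch using the standard recurrence <math>S(c,k) = k\,S(c-1,k) + S(c-1,k-1)</math>; helper names are mine):

```python
from math import comb

def stirling2(c: int, k: int) -> int:
    """Stirling numbers of the second kind via S(c,k) = k*S(c-1,k) + S(c-1,k-1)."""
    if c == k:
        return 1
    if k == 0 or k > c:
        return 0
    return k * stirling2(c - 1, k) + stirling2(c - 1, k - 1)

def falling_power(n: int, k: int) -> int:
    """n * (n-1) * ... * (n-k+1)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out

def moment_via_stirling(c: int, n: int, p: float) -> float:
    return sum(stirling2(c, k) * falling_power(n, k) * p**k for k in range(c + 1))

def moment_direct(c: int, n: int, p: float) -> float:
    return sum(k**c * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
```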
A simple bound follows by bounding the binomial moments via the higher Poisson moments:
:<math>\operatorname{E}[X^c] \le \left(\frac{c}{\ln(c/(np) + 1)}\right)^c \le (np)^c \exp\left(\frac{c^2}{2np}\right).</math>
This shows that if <math>c = O(\sqrt{np})</math>, then <math>\operatorname{E}[X^c]</math> is at most a constant factor away from <math>\operatorname{E}[X]^c</math>.
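The moment bound can be probed numerically for particular parameters (a spot check under my own choice of ''n'', ''p'', ''c'', not a proof; helper names are mine):

```python
from math import comb, exp, log

def raw_moment(c: int, n: int, p: float) -> float:
    """E[X^c] for X ~ B(n, p), computed directly from the PMF."""
    return sum(k**c * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

n, p, c = 30, 0.2, 3          # np = 6, so c is on the order of sqrt(np)
lam = n * p
poisson_style_bound = (c / log(c / lam + 1)) ** c
loose_bound = lam**c * exp(c**2 / (2 * lam))
moment = raw_moment(c, n, p)
```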