Binomial Sum Variance Inequality

The binomial sum variance inequality states that the variance of the sum of binomially distributed random variables will always be less than or equal to the variance of a binomial variable with the same ''n'' and ''p'' parameters. In probability theory and statistics, the sum of independent binomial random variables is itself a binomial random variable if all the component variables share the same success probability. If the success probabilities differ, the probability distribution of the sum is not binomial. The lack of uniformity in success probabilities across independent trials leads to a smaller variance; this is a special case of a more general theorem involving the expected value of convex functions. In some statistical applications, the standard binomial variance estimator can be used even if the component probabilities differ, though with a variance estimate that has an upward bias.
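The distinction between the equal- and unequal-probability cases can be checked directly. The following sketch (illustrative parameters only, not from the source) convolves two binomial probability mass functions to obtain the distribution of their sum, then compares it against a single binomial PMF: the two agree exactly when the success probabilities match and differ when they do not.

```python
from math import comb

def binom_pmf(m, p):
    """PMF of B(m, p) as a list indexed by the number of successes."""
    return [comb(m, k) * p**k * (1 - p)**(m - k) for k in range(m + 1)]

def convolve(a, b):
    """PMF of the sum of two independent variables with PMFs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Equal probabilities: B(3, 0.4) + B(5, 0.4) is exactly B(8, 0.4)
same = convolve(binom_pmf(3, 0.4), binom_pmf(5, 0.4))
print(all(abs(a - b) < 1e-12 for a, b in zip(same, binom_pmf(8, 0.4))))  # True

# Different probabilities: B(3, 0.2) + B(5, 0.6) is not binomial, even
# compared against the mean-matched candidate B(8, 0.45)
diff = convolve(binom_pmf(3, 0.2), binom_pmf(5, 0.6))
print(max(abs(a - b) for a, b in zip(diff, binom_pmf(8, 0.45))))  # ~0.025, nonzero
```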


Inequality statement

Consider the sum, ''Z'', of two independent binomial random variables, ''X'' ~ B(''m''0, ''p''0) and ''Y'' ~ B(''m''1, ''p''1), where ''Z'' = ''X'' + ''Y''. Then, the variance of ''Z'' is less than or equal to its variance under the assumption that ''p''0 = ''p''1, that is, if ''Z'' had a binomial distribution. Symbolically,

:\operatorname{Var}(Z) \leqslant E[Z] \left(1 - \tfrac{E[Z]}{m_0 + m_1}\right)

We wish to prove that

:\operatorname{Var}(Z) \leqslant E[Z] \left(1 - \frac{E[Z]}{m_0 + m_1}\right)

We will prove this inequality by finding an expression for Var(''Z'') and substituting it on the left-hand side, then showing that the inequality always holds.

If ''Z'' has a binomial distribution with parameters ''n'' and ''p'', then the expected value of ''Z'' is given by E[''Z''] = ''np'' and the variance of ''Z'' is given by Var(''Z'') = ''np''(1 − ''p''). Letting ''n'' = ''m''0 + ''m''1 and substituting E[''Z''] for ''np'' gives

:\operatorname{Var}(Z) = E[Z] \left(1 - \frac{E[Z]}{m_0 + m_1}\right)

The random variables ''X'' and ''Y'' are independent, so the variance of the sum is equal to the sum of the variances, that is

:\operatorname{Var}(Z) = E[X] \left(1 - \frac{E[X]}{m_0}\right) + E[Y] \left(1 - \frac{E[Y]}{m_1}\right)

In order to prove the theorem, it is therefore sufficient to prove that

:E[X] \left(1 - \frac{E[X]}{m_0}\right) + E[Y] \left(1 - \frac{E[Y]}{m_1}\right) \leqslant E[Z] \left(1 - \frac{E[Z]}{m_0 + m_1}\right)

Substituting E[''X''] + E[''Y''] for E[''Z''] gives

:E[X] \left(1 - \frac{E[X]}{m_0}\right) + E[Y] \left(1 - \frac{E[Y]}{m_1}\right) \leqslant (E[X] + E[Y]) \left(1 - \frac{E[X] + E[Y]}{m_0 + m_1}\right)

Multiplying out the brackets yields

:E[X] - \frac{E[X]^2}{m_0} + E[Y] - \frac{E[Y]^2}{m_1} \leqslant E[X] + E[Y] - \frac{(E[X] + E[Y])^2}{m_0 + m_1}

Subtracting E[''X''] and E[''Y''] from both sides and reversing the inequality gives

:\frac{E[X]^2}{m_0} + \frac{E[Y]^2}{m_1} \geqslant \frac{(E[X] + E[Y])^2}{m_0 + m_1}

Expanding the right-hand side gives

:\frac{E[X]^2}{m_0} + \frac{E[Y]^2}{m_1} \geqslant \frac{E[X]^2 + 2E[X]E[Y] + E[Y]^2}{m_0 + m_1}

Multiplying by m_0 m_1 (m_0 + m_1) yields

:(m_0 m_1 + m_1^2) E[X]^2 + (m_0^2 + m_0 m_1) E[Y]^2 \geqslant m_0 m_1 \left(E[X]^2 + 2E[X]E[Y] + E[Y]^2\right)

Deducting the right-hand side gives the relation

:m_1^2 E[X]^2 - 2 m_0 m_1 E[X] E[Y] + m_0^2 E[Y]^2 \geqslant 0

or equivalently

:\left(m_1 E[X] - m_0 E[Y]\right)^2 \geqslant 0

The square of a real number is always greater than or equal to zero, so this is true for all independent binomial distributions that ''X'' and ''Y'' could take. This is sufficient to prove the theorem.

Although this proof was developed for the sum of two variables, it is easily generalized to more than two. Additionally, if the individual success probabilities are known, then the variance is known to take the form

:\operatorname{Var}(Z) = n \bar{p}(1 - \bar{p}) - ns^2, \quad \text{where} \quad s^2 = \frac{1}{n} \sum_{i=1}^n (p_i - \bar{p})^2.

This expression also implies that the variance is always less than that of the binomial distribution with p = \bar{p}, because the standard expression for the variance is decreased by ''ns''², a positive number.
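Both the inequality and the closed-form expression can be verified numerically. The following sketch (parameters are illustrative, not from the source) computes the exact variance of a sum of independent Bernoulli trials with unequal probabilities and compares it with the closed form n\bar{p}(1 - \bar{p}) - ns^2 and with the binomial bound.

```python
# Illustrative check: 25 independent Bernoulli trials, 10 with p = 0.2 and
# 15 with p = 0.7, i.e. Z = X + Y with X ~ B(10, 0.2) and Y ~ B(15, 0.7).
probs = [0.2] * 10 + [0.7] * 15
n = len(probs)

# Exact variance of a sum of independent trials: sum of p_i(1 - p_i)
var_z = sum(p * (1 - p) for p in probs)

# Binomial variance with the same n and the same mean (p = pbar)
pbar = sum(probs) / n
binom_var = n * pbar * (1 - pbar)

# Closed form from the text: n*pbar*(1 - pbar) - n*s^2
s2 = sum((p - pbar) ** 2 for p in probs) / n
closed_form = binom_var - n * s2

print(var_z)               # 4.75  (= 10*0.2*0.8 + 15*0.7*0.3)
print(closed_form)         # 4.75  -- agrees with the exact variance
print(var_z <= binom_var)  # True  (4.75 <= 6.25)
```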


Applications

The inequality can be useful in the context of multiple testing, where many statistical hypothesis tests are conducted within a particular study. Each test can be treated as a Bernoulli variable with a success probability ''p''. Consider the total number of positive tests as a random variable denoted by ''S''. This quantity is important in the estimation of false discovery rates (FDR), which quantify uncertainty in the test results. If the null hypothesis is true for some tests and the alternative hypothesis is true for other tests, then success probabilities are likely to differ between these two groups. However, the variance inequality theorem states that if the tests are independent, the variance of ''S'' will be no greater than it would be under a binomial distribution.
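A small simulation makes this concrete. The sketch below assumes a hypothetical study with 80 true nulls (each positive with probability 0.05) and 20 true alternatives (each positive with probability 0.8); these numbers are illustrative, not from the source. The empirical variance of ''S'' stays well below the binomial bound n\bar{p}(1 - \bar{p}).

```python
import random

random.seed(0)

# Hypothetical mixture of tests: 80 true nulls that come up positive with
# probability 0.05, and 20 true alternatives positive with probability 0.8.
probs = [0.05] * 80 + [0.8] * 20
n = len(probs)
pbar = sum(probs) / n  # 0.20

# Simulate S, the total number of positive tests, over many replications
reps = 20_000
samples = [sum(random.random() < p for p in probs) for _ in range(reps)]
mean_s = sum(samples) / reps
var_s = sum((s - mean_s) ** 2 for s in samples) / (reps - 1)

print(var_s)                  # close to the exact value 7.0
print(n * pbar * (1 - pbar))  # binomial bound: 16.0
```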

