In probability theory, Boole's inequality, also known as the union bound, says that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events. This inequality provides an upper bound on the probability of occurrence of at least one of a countable number of events in terms of the individual probabilities of the events. Boole's inequality is named for its discoverer, George Boole. Formally, for a countable set of events ''A''₁, ''A''₂, ''A''₃, ..., we have:

\mathbb P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} \mathbb P(A_i).

In measure-theoretic terms, Boole's inequality follows from the fact that a measure (and certainly any probability measure) is ''σ''-sub-additive.
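The statement can be checked numerically on a small finite example. The sketch below is only an illustration: the sample space (one roll of a fair die), the helper names `prob` and `union_sides`, and the three events are all assumptions chosen for the example, not part of the result itself.

```python
from fractions import Fraction

# Assumed toy sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def union_sides(events):
    """Return (P(union of events), sum of P(A_i)) for a finite list of events."""
    union = frozenset().union(*events)
    return prob(union), sum(prob(a) for a in events)

# Hypothetical overlapping events.
events = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]
lhs, rhs = union_sides(events)
print(lhs, rhs)  # 5/6 7/6
```

Because the events overlap, the sum of the individual probabilities (7/6) overshoots the probability of the union (5/6), which is exactly the slack the union bound allows.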

Proof

Proof using induction

Boole's inequality may be proved for finite collections of ''n'' events using the method of induction. For the ''n'' = 1 case, it follows that:

\mathbb P(A_1) \le \mathbb P(A_1).

Assume the inequality holds for the case ''n'', that is:

\mathbb P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} \mathbb P(A_i).

Since

\mathbb P(A \cup B) = \mathbb P(A) + \mathbb P(B) - \mathbb P(A \cap B),

and because the union operation is associative, we have:

\mathbb P\left(\bigcup_{i=1}^{n+1} A_i\right) = \mathbb P\left(\bigcup_{i=1}^n A_i\right) + \mathbb P(A_{n+1}) - \mathbb P\left(\bigcup_{i=1}^n A_i \cap A_{n+1}\right).

Since

\mathbb P\left(\bigcup_{i=1}^n A_i \cap A_{n+1}\right) \ge 0,

by the first axiom of probability, we have:

\mathbb P\left(\bigcup_{i=1}^{n+1} A_i\right) \le \mathbb P\left(\bigcup_{i=1}^n A_i\right) + \mathbb P(A_{n+1}),

and therefore:

\mathbb P\left(\bigcup_{i=1}^{n+1} A_i\right) \le \sum_{i=1}^{n} \mathbb P(A_i) + \mathbb P(A_{n+1}) = \sum_{i=1}^{n+1} \mathbb P(A_i).
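A single induction step can be traced on concrete sets. The following is a minimal sketch under assumed inputs: a uniform sample space of eight outcomes, two events playing the role of the first ''n'' events, and one further event playing the role of the (''n''+1)-th. The function name `induction_step` is hypothetical.

```python
from fractions import Fraction

# Assumed uniform sample space with 8 outcomes.
OMEGA = frozenset(range(1, 9))

def prob(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def induction_step(first_n, a_next):
    """Return P(union with a_next), the inclusion-exclusion right-hand side,
    and the bound obtained by dropping the nonnegative intersection term."""
    union_n = frozenset().union(*first_n)
    lhs = prob(union_n | a_next)
    identity_rhs = prob(union_n) + prob(a_next) - prob(union_n & a_next)
    bound = prob(union_n) + prob(a_next)
    return lhs, identity_rhs, bound

first_n = [frozenset({1, 2, 3}), frozenset({3, 4})]   # hypothetical A_1, A_2
a_next = frozenset({4, 5, 6})                         # hypothetical A_3
lhs, identity_rhs, bound = induction_step(first_n, a_next)
print(lhs, identity_rhs, bound)  # 3/4 3/4 7/8
```

The first two values agree because of the two-event identity; the third is larger by exactly the probability of the intersection that the proof discards.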

Proof without using induction

For any events ''A''₁, ''A''₂, ''A''₃, ... in our probability space we have:

\mathbb P\left(\bigcup_i A_i\right) \leq \sum_i \mathbb P(A_i).

One of the axioms of a probability space is that if ''B''₁, ''B''₂, ''B''₃, ... are ''disjoint'' subsets of the probability space then:

\mathbb P\left(\bigcup_i B_i\right) = \sum_i \mathbb P(B_i);

this is called ''countable additivity.'' If ''B'' ⊂ ''A'', then:

\mathbb P(B) \leq \mathbb P(A).

Indeed, from the axioms of a probability distribution:

\mathbb P(A) = \mathbb P(B) + \mathbb P(A-B).

Note that both terms on the right are nonnegative. Now we have to modify the sets ''A''ᵢ so that they become disjoint:

B_i = A_i - \bigcup_{j=1}^{i-1} A_j.

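Since each ''B''ᵢ ⊂ ''A''ᵢ, the union of the ''B''ᵢ is contained in the union of the ''A''ᵢ. Conversely, any outcome in the union of the ''A''ᵢ belongs to ''B''ₖ, where ''k'' is the smallest index with the outcome in ''A''ₖ. Hence:

\bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i.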

Therefore, using countable additivity for the disjoint sets ''B''ᵢ and then monotonicity, we can deduce:

\mathbb P\left(\bigcup_iA_i\right) = \mathbb P\left(\bigcup_iB_i\right) = \sum_i \mathbb P (B_i) \leq \sum_i \mathbb P(A_i).
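The disjointification step can be sketched for a finite list of events. This is an illustration only: the die-roll sample space, the helper name `disjointify`, and the sample events are assumptions made for the example.

```python
from fractions import Fraction

# Assumed toy sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def disjointify(events):
    """Build B_i = A_i minus the union of A_1, ..., A_{i-1}."""
    seen = frozenset()
    pieces = []
    for a in events:
        pieces.append(a - seen)  # keep only outcomes not covered earlier
        seen |= a
    return pieces

events = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]
pieces = disjointify(events)

p_union = prob(frozenset().union(*events))
sum_pieces = sum(prob(b) for b in pieces)   # equals P(union) by additivity
sum_events = sum(prob(a) for a in events)   # upper bound from monotonicity
print(pieces, p_union, sum_pieces, sum_events)
```

Here the pieces are {1, 2}, {3} and {4, 5}: they are pairwise disjoint, have the same union as the original events, and their probabilities add up to the probability of the union.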

Bonferroni inequalities

Boole's inequality may be generalized to find upper and lower bounds on the probability of finite unions of events. These bounds are known as Bonferroni inequalities, after Carlo Emilio Bonferroni. Define:

S_1 := \sum_{i=1}^n \mathbb P(A_i),

and:

S_2 := \sum_{1 \le i < j \le n} \mathbb P(A_i \cap A_j),

as well as:

S_k := \sum_{1 \le i_1 < \cdots < i_k \le n} \mathbb P(A_{i_1} \cap \cdots \cap A_{i_k})

for all integers ''k'' in {3, ..., ''n''}. Then, for odd ''k'' in {1, ..., ''n''}:

\mathbb P\left(\bigcup_{i=1}^n A_i\right) \le \sum_{j=1}^k (-1)^{j-1} S_j,

and for even ''k'' in {2, ..., ''n''}:

\mathbb P\left(\bigcup_{i=1}^n A_i\right) \ge \sum_{j=1}^k (-1)^{j-1} S_j.

Boole's inequality is the initial case, ''k'' = 1. When ''k'' = ''n'', then equality holds and the resulting identity is the inclusion–exclusion principle.
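The alternating bounds can be observed on a small example. The sketch below assumes a uniform die-roll sample space and three hypothetical events; the helper names `s_k` and `bonferroni_partial_sum` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Assumed toy sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def s_k(events, k):
    """S_k: sum of P(A_{i_1} & ... & A_{i_k}) over all k-element subfamilies."""
    return sum(
        prob(frozenset.intersection(*combo))
        for combo in combinations(events, k)
    )

def bonferroni_partial_sum(events, k):
    """Partial sum of (-1)^(j-1) * S_j for j = 1, ..., k."""
    return sum((-1) ** (j - 1) * s_k(events, j) for j in range(1, k + 1))

events = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})]
p_union = prob(frozenset().union(*events))

for k in range(1, len(events) + 1):
    side = "upper" if k % 2 == 1 else "lower"
    print(k, side, bonferroni_partial_sum(events, k), p_union)
```

For these events the partial sums are 3/2, 2/3 and 5/6, while the probability of the union is 5/6: an upper bound at ''k'' = 1, a lower bound at ''k'' = 2, and equality at ''k'' = ''n'' = 3.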

Example

Suppose that you are estimating 5 parameters based on a random sample, and you can control each parameter separately. If you want your estimates of all five parameters to be good with a chance of 95%, what should you require of each parameter? Controlling each parameter to be good with a chance of 95% is not enough, because the event "all are good" is a subset of each event "Estimate ''i'' is good". We can use Boole's inequality to solve this problem. By taking the complement of the event "all five are good", we can change this question into another condition: ''P(at least one estimate is bad) ≤ P(A₁ is bad) + P(A₂ is bad) + P(A₃ is bad) + P(A₄ is bad) + P(A₅ is bad)'', so it suffices to make the right-hand side equal to 0.05. One way is to make each term equal to 0.05/5 = 0.01, that is 1%. In other words, you have to guarantee that each estimate is good with a chance of 99% (for example, by constructing a 99% confidence interval) to make sure that all the estimates are good with a chance of at least 95%. This is called the Bonferroni method of simultaneous inference.
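The arithmetic of this example can be written out as a short sketch. The function name `per_estimate_level` is hypothetical, and exact fractions are used only to avoid floating-point rounding in the illustration.

```python
from fractions import Fraction

def per_estimate_level(family_error, m):
    """Split a total error budget equally across m estimates (Bonferroni method)."""
    return family_error / m

family_error = Fraction(5, 100)   # allow a 5% chance that at least one estimate is bad
m = 5                             # number of parameters being estimated

error_each = per_estimate_level(family_error, m)
confidence_each = 1 - error_each

# Boole's inequality: P(at least one bad) <= sum of the individual error chances.
bound_on_any_bad = m * error_each

print(error_each, confidence_each, bound_on_any_bad)  # 1/100 99/100 1/20
```

Each estimate gets a 1% error budget (a 99% confidence level), and the union bound then guarantees that the chance of any estimate being bad is at most 5%.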
