Bonferroni Correction

In statistics, the Bonferroni correction is a method to counteract the multiple comparisons problem.


Background

The method is named for its use of the Bonferroni inequalities. Application of the method to confidence intervals was described by Olive Jean Dunn.
Statistical hypothesis testing is based on rejecting the null hypothesis when the likelihood of the observed data would be low if the null hypothesis were true. If multiple hypotheses are tested, the probability of observing a rare event increases, and therefore the likelihood of incorrectly rejecting a null hypothesis (i.e., making a Type I error) increases. The Bonferroni correction compensates for that increase by testing each individual hypothesis at a significance level of \alpha/m, where \alpha is the desired overall alpha level and m is the number of hypotheses. For example, if a trial is testing m = 20 hypotheses with a desired overall \alpha = 0.05, then the Bonferroni correction would test each individual hypothesis at \alpha = 0.05/20 = 0.0025. The Bonferroni correction can also be applied as a p-value adjustment: instead of lowering the alpha level, each p-value is multiplied by the number of tests (with adjusted p-values that exceed 1 then being capped at 1), and the alpha level is left unchanged. The significance decisions under the two approaches are identical.


Definition

Let H_1,\ldots,H_m be a family of null hypotheses and let p_1,\ldots,p_m be their corresponding p-values. Let m be the total number of null hypotheses, and let m_0 be the number of true null hypotheses (which is presumably unknown to the researcher). The family-wise error rate (FWER) is the probability of rejecting at least one true H_i, that is, of making at least one type I error. The Bonferroni correction rejects the null hypothesis for each p_i \leq \frac{\alpha}{m}, thereby controlling the FWER at \leq \alpha. Proof of this control follows from Boole's inequality, as follows:

: \text{FWER} = P\left\{ \bigcup_{i=1}^{m_0} \left( p_i \leq \frac{\alpha}{m} \right) \right\} \leq \sum_{i=1}^{m_0} P\left( p_i \leq \frac{\alpha}{m} \right) \leq m_0 \frac{\alpha}{m} \leq \alpha.

This control does not require any assumptions about dependence among the p-values or about how many of the null hypotheses are true.


Extensions


Generalization

Rather than testing each hypothesis at the \alpha/m level, the hypotheses may be tested at any other combination of levels that add up to \alpha, provided that the level of each test is decided before looking at the data. For example, for two hypothesis tests, an overall \alpha of 0.05 could be maintained by conducting one test at 0.04 and the other at 0.01.
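A minimal sketch of this weighted variant (the function name `weighted_bonferroni_reject` is hypothetical, introduced here for illustration):

```python
def weighted_bonferroni_reject(p_values, levels, alpha=0.05):
    """Weighted Bonferroni: per-test significance levels, fixed before
    seeing the data, that sum to the overall alpha; reject H_i when
    p_i <= levels[i]."""
    assert abs(sum(levels) - alpha) < 1e-12, "levels must sum to alpha"
    return [p <= a for p, a in zip(p_values, levels)]

# Two tests at 0.04 and 0.01, as in the example above.
print(weighted_bonferroni_reject([0.03, 0.02], levels=[0.04, 0.01]))  # [True, False]
```

The standard correction is the special case levels = [alpha/m] * m.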


Confidence intervals

The procedure proposed by Dunn can be used to adjust confidence intervals. If one establishes m confidence intervals and wishes to have an overall confidence level of 1-\alpha, each individual confidence interval can be adjusted to the level of 1-\frac{\alpha}{m}.
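As an illustration, here is a sketch of simultaneous z-intervals for several means, assuming (for simplicity, and only for this example) known standard deviations; the function name `bonferroni_ci` is invented here:

```python
from math import sqrt
from statistics import NormalDist

def bonferroni_ci(means, sds, ns, alpha=0.05):
    """Simultaneous two-sided z-intervals: each of the m intervals is
    built at confidence level 1 - alpha/m, so that the joint coverage
    of all m intervals is at least 1 - alpha."""
    m = len(means)
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))  # wider critical value
    return [(x - z * s / sqrt(n), x + z * s / sqrt(n))
            for x, s, n in zip(means, sds, ns)]

for lo, hi in bonferroni_ci([10.0, 12.5], [2.0, 3.0], [50, 50]):
    print(round(lo, 2), round(hi, 2))
```

With m = 2 and alpha = 0.05 the critical value is the 0.9875 normal quantile (about 2.24) instead of the usual 1.96, so each interval is somewhat wider than its unadjusted counterpart.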


Continuous problems

When searching for a signal in a continuous parameter space there can also be a problem of multiple comparisons, or look-elsewhere effect. For example, a physicist might be looking to discover a particle of unknown mass by considering a large range of masses; this was the case during the Nobel Prize winning detection of the Higgs boson. In such cases, one can apply a continuous generalization of the Bonferroni correction by employing Bayesian logic to relate the effective number of trials, m, to the prior-to-posterior volume ratio.


Alternatives

There are alternative ways to control the family-wise error rate. For example, the Holm–Bonferroni method and the Šidák correction are universally more powerful procedures than the Bonferroni correction, meaning that they are always at least as powerful. But unlike the Bonferroni procedure, these methods do not control the expected number of Type I errors per family (the per-family Type I error rate).
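The Holm–Bonferroni step-down procedure is simple enough to sketch directly (an illustrative implementation, with the function name invented here):

```python
def holm_bonferroni_reject(p_values, alpha=0.05):
    """Holm step-down: visit p-values in ascending order, comparing the
    k-th smallest (k = 0, 1, ...) against alpha / (m - k); stop at the
    first failure and reject everything accepted so far. This is
    uniformly at least as powerful as the plain Bonferroni rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject

# Here Holm rejects all four hypotheses, while plain Bonferroni
# (reject when p <= 0.05/4 = 0.0125) would reject only two of them.
print(holm_bonferroni_reject([0.01, 0.015, 0.03, 0.005], alpha=0.05))
```

The first comparison uses the same threshold \alpha/m as Bonferroni, and every later threshold is larger, which is why Holm can only reject more.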


Criticism

With respect to family-wise error rate (FWER) control, the Bonferroni correction can be conservative if there are a large number of tests and/or the test statistics are positively correlated. Multiple-testing corrections, including the Bonferroni procedure, increase the probability of Type II errors when null hypotheses are false, i.e., they reduce statistical power.


See also

* Benjamini–Yekutieli correction

