Lindley's paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution. The problem of the disagreement between the two approaches was discussed in Harold Jeffreys' 1939 textbook; it became known as Lindley's paradox after Dennis Lindley called the disagreement a paradox in a 1957 paper.

Although referred to as a ''paradox'', the differing results from the Bayesian and frequentist approaches can be explained as answers to fundamentally different questions, rather than an actual disagreement between the two methods. Nevertheless, for a large class of priors the differences between the frequentist and Bayesian approaches are caused by keeping the significance level fixed: as even Lindley recognized, "the theory does not justify the practice of keeping the significance level fixed", and even "some computations by Prof. Pearson in the discussion to that paper emphasized how the significance level would have to change with the sample size, if the losses and prior probabilities were kept fixed". In fact, if the critical value increases with the sample size suitably fast, then the disagreement between the frequentist and Bayesian approaches becomes negligible as the sample size increases. The paradox continues to be a source of active discussion.


Description of the paradox

The result x of some experiment has two possible explanations, hypotheses H_0 and H_1, and some prior distribution \pi representing uncertainty as to which hypothesis is more accurate before taking x into account. Lindley's paradox occurs when

1. The result x is "significant" by a frequentist test of H_0, indicating sufficient evidence to reject H_0 at, say, the 5% level, and
2. The posterior probability of H_0 given x is high, indicating strong evidence that H_0 is in better agreement with x than H_1.

These results can occur at the same time when H_0 is very specific, H_1 more diffuse, and the prior distribution does not strongly favor one or the other, as seen below.


Numerical example

The following numerical example illustrates Lindley's paradox. In a certain city 49,581 boys and 48,870 girls have been born over a certain time period. The observed proportion x of male births is thus 49,581/98,451 ≈ 0.5036. We assume the number of male births is a binomial variable with parameter \theta. We are interested in testing whether \theta is 0.5 or some other value. That is, our null hypothesis is H_0: \theta = 0.5, and the alternative is H_1: \theta \neq 0.5.


Frequentist approach

The frequentist approach to testing H_0 is to compute a p-value, the probability of observing a fraction of boys at least as large as x assuming H_0 is true. Because the number of births is very large, we can use a normal approximation for the number of male births X \sim N(\mu, \sigma^2), with \mu = n\theta = 98\,451 \times 0.5 = 49\,225.5 and \sigma^2 = n\theta (1 - \theta) = 98\,451 \times 0.5 \times 0.5 = 24\,612.75, to compute

: \begin{align} P(X \geq 49\,581 \mid \mu = 49\,225.5) &= \int_{49\,581}^{98\,451} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(u - \mu)^2 / (2\sigma^2)} \,du \\ &= \int_{2.28}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \,du \approx 0.0117. \end{align}

We would have been equally surprised if we had seen 49,581 female births, i.e. x \approx 0.4964, so a frequentist would usually perform a two-sided test, for which the p-value would be p \approx 2 \times 0.0117 = 0.0235. In both cases, the p-value is lower than the significance level \alpha = 5\%, so the frequentist approach rejects H_0, as it disagrees with the observed data.
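For concreteness, this calculation can be reproduced with a few lines of Python; the use of SciPy here is an illustrative assumption, since the article itself prescribes no particular software.

```python
from scipy.stats import norm

# Observed counts from the example above
boys, n = 49_581, 98_451

# Normal approximation to Binomial(n, 0.5) under H0
mu = 0.5 * n                      # 49,225.5
sigma = (n * 0.5 * 0.5) ** 0.5    # sqrt(24,612.75) ≈ 156.9

# One-sided p-value: P(X >= 49,581 | H0), via the normal survival function
p_one = norm.sf(boys, loc=mu, scale=sigma)   # ≈ 0.0117

# Two-sided test: an equally extreme deficit of boys counts as well
p_two = 2 * p_one                            # ≈ 0.0235
print(p_one, p_two)
```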


Bayesian approach

Assuming no reason to favor one hypothesis over the other, the Bayesian approach would be to assign prior probabilities \pi(H_0) = \pi(H_1) = 0.5 and a uniform distribution to \theta under H_1, and then to compute the posterior probability of H_0 using Bayes' theorem:

: P(H_0 \mid k) = \frac{P(k \mid H_0) \pi(H_0)}{P(k \mid H_0) \pi(H_0) + P(k \mid H_1) \pi(H_1)}.

After observing k = 49\,581 boys out of n = 98\,451 births, we can compute the posterior probability of each hypothesis using the probability mass function for a binomial variable:

: \begin{align} P(k \mid H_0) & = \binom{n}{k} (0.5)^k (1 - 0.5)^{n - k} \approx 1.95 \times 10^{-4}, \\ P(k \mid H_1) & = \int_0^1 \binom{n}{k} \theta^k (1 - \theta)^{n - k} \,d\theta = \binom{n}{k} \Beta(k + 1, n - k + 1) = \frac{1}{n + 1} \approx 1.02 \times 10^{-5}, \end{align}

where \Beta(a, b) is the Beta function. From these values, we find the posterior probability P(H_0 \mid k) \approx 0.95, which strongly favors H_0 over H_1. The two approaches, Bayesian and frequentist, appear to be in conflict, and this is the "paradox".
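The same numbers can be checked with a short sketch, again assuming SciPy; the closed form 1/(n + 1) for P(k ∣ H_1) follows from the Beta-function identity above.

```python
from scipy.stats import binom

boys, n = 49_581, 98_451

# Likelihood of the data under H0: theta fixed at 0.5
p_k_h0 = binom.pmf(boys, n, 0.5)       # ≈ 1.95e-4

# Marginal likelihood under H1: theta ~ Uniform(0, 1);
# integrating the binomial PMF over theta gives exactly 1 / (n + 1)
p_k_h1 = 1.0 / (n + 1)                 # ≈ 1.02e-5

# Posterior for H0 with equal prior weights pi(H0) = pi(H1) = 0.5
# (the common factor 0.5 cancels)
post_h0 = p_k_h0 / (p_k_h0 + p_k_h1)   # ≈ 0.95
print(post_h0)
```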


Reconciling the Bayesian and frequentist approaches


Almost sure hypothesis testing

Naaman proposed an adaptation of the significance level to the sample size in order to control false positives: \alpha_n = n^{-r}, with r > 1/2. At least in the numerical example above, taking r = 1/2 results in a significance level of about 0.00318; the two-sided p-value of 0.0235 exceeds this threshold, so the frequentist would not reject the null hypothesis, which is in agreement with the Bayesian approach.
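A minimal sketch of this adjustment in the running example, under the reconstruction of Naaman's rule given above (\alpha_n = n^{-r}), might look as follows.

```python
n = 98_451
p_two = 0.0235            # two-sided p-value from the frequentist section

# Sample-size-dependent significance level with r = 1/2
alpha_n = n ** -0.5       # ≈ 0.0032

# p_two exceeds alpha_n, so H0 is not rejected at the adjusted level
print(p_two <= alpha_n)   # False
```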


Uninformative priors

If we use an uninformative prior and test a hypothesis more similar to the one in the frequentist approach, the paradox disappears. For example, if we calculate the posterior distribution P(\theta \mid x, n) using a uniform prior distribution on \theta (i.e. \pi(\theta) = 1 for \theta \in [0, 1]), we find

: P(\theta \mid k, n) = \Beta(k + 1, n - k + 1).

If we use this to check the probability that a newborn is more likely to be a boy than a girl, i.e. P(\theta > 0.5 \mid k, n), we find

: \int_{0.5}^1 \Beta(49\,582,\, 48\,871) \,d\theta \approx 0.983.

In other words, it is very likely that the proportion of male births is above 0.5. Neither analysis gives an estimate of the effect size directly, but both could be used to determine, for instance, whether the fraction of boy births is likely to be above some particular threshold.
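The posterior tail probability is a one-liner with the Beta distribution; the specific SciPy call is again an illustrative assumption.

```python
from scipy.stats import beta

boys, n = 49_581, 98_451

# Posterior of theta under a uniform prior: Beta(k + 1, n - k + 1)
posterior = beta(boys + 1, n - boys + 1)

# P(theta > 0.5 | k, n): survival function of the posterior at 0.5
print(posterior.sf(0.5))   # ≈ 0.98
```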


The lack of an actual paradox

The apparent disagreement between the two approaches is caused by a combination of factors. First, the frequentist approach above tests H_0 without reference to H_1. The Bayesian approach evaluates H_0 as an alternative to H_1 and finds the first to be in better agreement with the observations. This is because the latter hypothesis is much more diffuse, as \theta can be anywhere in [0, 1], which results in it having a very low posterior probability. To understand why, it is helpful to consider the two hypotheses as generators of the observations:

* Under H_0, we choose \theta \approx 0.500 and ask how likely it is to see 49,581 boys in 98,451 births.
* Under H_1, we choose \theta randomly from anywhere within 0 to 1 and ask the same question.

Most of the possible values for \theta under H_1 are very poorly supported by the observations. In essence, the apparent disagreement between the methods is not a disagreement at all, but rather two different statements about how the hypotheses relate to the data:

* The frequentist finds that H_0 is a poor explanation for the observation.
* The Bayesian finds that H_0 is a far better explanation for the observation than H_1.

According to the frequentist test, it is improbable that the sex ratio of newborns is exactly 50/50 male/female. Yet 50/50 is a better approximation than most, but not ''all'', other ratios. The hypothesis \theta \approx 0.504 would have fit the observation much better than almost all other ratios, including \theta \approx 0.500.

For example, this choice of hypotheses and prior probabilities implies the statement "if \theta > 0.49 and \theta < 0.51, then the prior probability of \theta being exactly 0.5 is 0.50/0.51 ≈ 98%". Given such a strong preference for \theta = 0.5, it is easy to see why the Bayesian approach favors H_0 in the face of x \approx 0.5036, even though the observed value of x lies 2.28\sigma away from 0.5. The deviation of over 2\sigma from H_0 is considered significant in the frequentist approach, but its significance is overruled by the prior in the Bayesian approach.

Looking at it another way, we can see that the prior distribution is essentially flat with a delta function at \theta = 0.5. Clearly, this is dubious. In fact, picturing real numbers as being continuous, it would be more logical to assume that it is impossible for any given number to be exactly the parameter value, i.e., we should assume P(\theta = 0.5) = 0.

A more realistic distribution for \theta in the alternative hypothesis produces a less surprising result for the posterior of H_0. For example, if we replace H_1 with H_2: \theta = x, i.e., the maximum likelihood estimate for \theta, the posterior probability of H_0 would be only 0.07, compared with 0.93 for H_2 (of course, one cannot actually use the MLE as part of a prior distribution).
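This final comparison against H_2 can be checked the same way; a minimal sketch, assuming SciPy as before.

```python
from scipy.stats import binom

boys, n = 49_581, 98_451

p_k_h0 = binom.pmf(boys, n, 0.5)        # likelihood under H0, ≈ 1.95e-4
p_k_h2 = binom.pmf(boys, n, boys / n)   # likelihood at the MLE, ≈ 2.5e-3

# Posterior for H0 versus H2 with equal prior weights
post_h0 = p_k_h0 / (p_k_h0 + p_k_h2)    # ≈ 0.07
print(post_h0, 1 - post_h0)             # ≈ 0.07  0.93
```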


See also

* Bayes factor




Further reading

* Shafer, Glenn (1982). "Lindley's paradox". ''Journal of the American Statistical Association''. 77 (378): 325–334. doi:10.2307/2287244. JSTOR 2287244. MR 664677.