In the design of experiments in statistics, the lady tasting tea is a randomized experiment devised by Ronald Fisher and reported in his book ''The Design of Experiments'' (1935). The experiment is the original exposition of Fisher's notion of a null hypothesis, which is "never proved or established, but is possibly disproved, in the course of experimentation" (OED quote: 1935 R. A. Fisher, ''The Design of Experiments'' ii. 19, "We may speak of this hypothesis as the 'null hypothesis' ..the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation.").

The example is loosely based on an event in Fisher's life. The woman in question, phycologist Muriel Bristol, claimed to be able to tell whether the tea or the milk was added first to a cup. Her future husband, William Roach, suggested that Fisher give her eight cups, four of each variety, in random order. One could then ask what the probability was of her correctly identifying, just by chance, the number of cups she got right (in fact all eight).

Fisher's description is less than 10 pages in length and is notable for its simplicity and completeness regarding terminology, calculations and design of the experiment. The test used was Fisher's exact test.

The experiment

The experiment provides a subject with eight randomly ordered cups of tea: four prepared by pouring milk and then tea, and four by pouring tea and then milk. The subject attempts to select the four cups prepared by one method or the other, and may compare cups directly against each other as desired. The method employed in the experiment is fully disclosed to the subject.

The null hypothesis is that the subject has no ability to distinguish the teas. In Fisher's approach, there was no alternative hypothesis, unlike in the Neyman–Pearson approach. The test statistic is a simple count of the number of successful attempts to select the four cups prepared by a given method. The distribution of possible numbers of successes, assuming the null hypothesis is true, can be computed using the number of combinations. Using the combination formula, with ''n'' = 8 total cups and ''k'' = 4 cups chosen, there are

\binom{8}{4} = \frac{8!}{4!\,(8-4)!} = 70

possible combinations.

The frequencies of the possible numbers of successes, given in the final column of this table, are derived as follows. For 0 successes, there is clearly only one set of four choices (namely, choosing all four incorrect cups) giving this result. For one success and three failures, there are four correct cups of which one is selected, which by the combination formula can occur in

\binom{4}{1} = 4

different ways (as shown in column 2, with ''x'' denoting a correct cup that is chosen and ''o'' denoting a correct cup that is not chosen); and independently of that, there are four incorrect cups of which three are selected, which can occur in

\binom{4}{3} = 4

ways (as shown in the second column, this time with ''x'' interpreted as an incorrect cup which is not chosen, and ''o'' indicating an incorrect cup which is chosen). Thus a selection of any one correct cup and any three incorrect cups can occur in any of 4×4 = 16 ways. The frequencies of the other possible numbers of successes are calculated correspondingly: 6×6 = 36 ways for two successes, 4×4 = 16 for three, and 1 for four, for a total of 1 + 16 + 36 + 16 + 1 = 70. Thus the number of successes is distributed according to the hypergeometric distribution. Specifically, for a random variable

''X'' equal to the number of successes, we may write

X \sim \operatorname{Hypergeometric}(N=8, K=4, n=4),

where ''N'' is the population size or total number of cups of tea, ''K'' is the number of success states in the population (the four cups of one type), and ''n'' is the number of draws, or four cups. The distribution of combinations for making ''k'' selections out of the ''2k'' available selections corresponds to the ''k''th row of Pascal's triangle (counting the top row as row 0), such that each integer in the row is squared. In this case, ''k'' = 4

because 4 teacups are selected from the 8 available teacups, so the row 1, 4, 6, 4, 1 squares to the frequencies 1, 16, 36, 16, 1.

The critical region for rejection of the null of no ability to distinguish was the single case of 4 successes of 4 possible, based on the conventional probability criterion of less than 5%. This is the critical region because, under the null of no ability to distinguish, 4 successes has 1 chance out of 70 (≈ 1.4% < 5%) of occurring, whereas at least 3 of 4 successes has a probability of (16+1)/70 (≈ 24.3% > 5%). Thus, if and only if the lady properly categorized all 8 cups was Fisher willing to reject the null hypothesis, effectively acknowledging the lady's ability at a 1.4% significance level (but without quantifying her ability). Fisher later discussed the benefits of more trials and repeated tests.

David Salsburg reports that a colleague of Fisher, H. Fairfield Smith, revealed that in the actual experiment the lady succeeded in identifying all eight cups correctly. The chance of someone who merely guesses getting all eight correct, assuming she guesses that any four had the tea put in first and the other four the milk, would be only 1 in 70 (the combinations of 8 taken 4 at a time).
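The counting argument in this section can be checked with a short script. The following is a minimal sketch, not Fisher's own calculation: it assumes Python 3.8 or later (for `math.comb`), and the function names `success_counts` and `upper_tail_probability` are hypothetical, chosen only for illustration. It enumerates the number of selections giving each count of successes and the probability, under the null hypothesis of pure guessing, of doing at least as well as a given count.

```python
from fractions import Fraction
from math import comb


def success_counts(cups_per_type=4):
    """Number of ways to pick `cups_per_type` cups with exactly s correct ones.

    Choosing s of the correct cups and (cups_per_type - s) of the incorrect
    cups are independent choices, so the two counts multiply.
    """
    k = cups_per_type
    return {s: comb(k, s) * comb(k, k - s) for s in range(k + 1)}


def upper_tail_probability(counts, threshold):
    """Probability of at least `threshold` successes when every selection is equally likely."""
    total = sum(counts.values())
    favourable = sum(c for s, c in counts.items() if s >= threshold)
    return Fraction(favourable, total)


counts = success_counts()          # {0: 1, 1: 16, 2: 36, 3: 16, 4: 1}
total = sum(counts.values())       # 70, the same as comb(8, 4)

for successes, ways in counts.items():
    print(f"{successes} successes: {ways:2d} of {total} selections")

p_all_four = upper_tail_probability(counts, 4)     # 1/70
p_three_plus = upper_tail_probability(counts, 3)   # 17/70
print(f"P(4 successes)   = {p_all_four} = {float(p_all_four):.3f}")
print(f"P(>= 3 successes) = {p_three_plus} = {float(p_three_plus):.3f}")
```

Exact fractions are used so that the results can be compared directly with the 1/70 and (16+1)/70 figures in the text; only the first falls below the 5% criterion.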

''The Lady Tasting Tea'' book

David Salsburg published a popular science book entitled ''The Lady Tasting Tea'' (Salsburg 2002), which describes Fisher's experiment and ideas on randomization. Deb Basu wrote that "the famous case of the 'lady tasting tea'" was "one of the two supporting pillars ... of the randomization analysis of experimental data" (Basu 1980a, p. 575; 1980b).

References

* Basu, D. (1980b). "The Fisher Randomization Test", reprinted with a new preface in ''Statistical Information and Likelihood: A Collection of Critical Essays by Dr. D. Basu''; J. K. Ghosh, editor. Springer 1988.
* Salsburg, D. (2002). ''The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century'', W.H. Freeman / Owl Book.
