History
The test is named for Frank Wilcoxon (1892–1965), who, in a single paper, proposed both it and the rank-sum test for two independent samples. The test was popularized by Sidney Siegel (1956) in his influential textbook on non-parametric statistics. Siegel used the symbol ''T'' for the test statistic, and consequently, the test is sometimes referred to as the Wilcoxon ''T''-test.

Test procedure
There are two variants of the signed-rank test. From a theoretical point of view, the one-sample test is more fundamental because the paired sample test is performed by converting the data to the situation of the one-sample test. However, most practical applications of the signed-rank test arise from paired data.

For a paired sample test, the data consists of samples (X_1, Y_1), ..., (X_n, Y_n). Each sample is a pair of measurements. In the simplest case, the measurements are on an interval scale. Then each pair may be replaced by its difference D_i = X_i − Y_i, and the one-sample test applied to the differences.

Null and alternative hypotheses
One-sample test
The one-sample Wilcoxon signed-rank test can be used to test whether data comes from a symmetric population with a specified median. If the population median is known, then it can be used to test whether data is symmetric about its center (pp. 32, 50). To explain the null and alternative hypotheses formally, assume that the data consists of independent and identically distributed samples X_1, ..., X_n from a distribution F.

Paired data test
Because the paired data test arises from taking paired differences, its null and alternative hypotheses can be derived from those of the one-sample test. In each case, they become assertions about the behavior of the differences D_i = X_i − Y_i. Let F(x, y) be the joint cumulative distribution of the pairs (X_i, Y_i). If F is continuous, then the most general null and alternative hypotheses are expressed in terms of the distribution of the differences and are identical to those of the one-sample case.

Like the one-sample case, under some restrictions the test can be interpreted as a test for whether the pseudomedian of the differences is located at zero. A common restriction is to symmetric distributions of differences. In this case, the null and alternative hypotheses are (pp. 39–41):

; Null hypothesis ''H''0 : The observations D_i are symmetric about μ = 0.
; One-sided alternative hypothesis ''H''1 : The observations D_i are symmetric about some μ < 0.
; One-sided alternative hypothesis ''H''2 : The observations D_i are symmetric about some μ > 0.
; Two-sided alternative hypothesis ''H''3 : The observations D_i are symmetric about some μ ≠ 0.

These can also be expressed more directly in terms of the original pairs:

; Null hypothesis ''H''0 : The observations (X_i, Y_i) are ''exchangeable'', meaning that (X_i, Y_i) and (Y_i, X_i) have the same distribution. Equivalently, F(x, y) = F(y, x).
; One-sided alternative hypothesis ''H''1 : For some μ < 0, the pairs (X_i, Y_i) and (Y_i + μ, X_i − μ) have the same distribution.
; One-sided alternative hypothesis ''H''2 : For some μ > 0, the pairs (X_i, Y_i) and (Y_i + μ, X_i − μ) have the same distribution.
; Two-sided alternative hypothesis ''H''3 : For some μ ≠ 0, the pairs (X_i, Y_i) and (Y_i + μ, X_i − μ) have the same distribution.

The null hypothesis of exchangeability can arise from a matched pair experiment with a treatment group and a control group. Randomizing the treatment and control within each pair makes the observations exchangeable.
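The claim that randomization within pairs produces exchangeable observations can be checked by brute-force enumeration. In the sketch below (the function name, variable names, and sample values are mine, chosen for illustration), all 2^n equally likely within-pair assignments are enumerated for a fixed set of pair values, and the exact distribution of the difference vector is verified to equal the distribution of its negation:

```python
# Enumeration sketch: under randomized assignment within each matched pair,
# the vector of differences D_i = X_i - Y_i has the same exact distribution
# as its negation, i.e. the null distribution is symmetric about zero.
from fractions import Fraction
from itertools import product
from collections import Counter

def diff_distribution(pairs):
    """Exact distribution of the difference vector over all 2^n swaps."""
    n = len(pairs)
    dist = Counter()
    for swaps in product([False, True], repeat=n):
        # swapping a pair (a, b) replaces its difference a - b with b - a
        d = tuple((b - a) if s else (a - b) for (a, b), s in zip(pairs, swaps))
        dist[d] += Fraction(1, 2 ** n)
    return dist

# hypothetical pair values for illustration
dist = diff_distribution([(3, 1), (2, 5), (4, 4)])
# every difference vector is exactly as likely as its negation
assert all(dist[tuple(-v for v in d)] == p for d, p in dist.items())
```

This is only a finite-sample check of the symmetry argument, not part of the test procedure itself; the same symmetry is what justifies flipping signs at random when computing the null distribution of the test statistic.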
For an exchangeable distribution, D_i has the same distribution as −D_i, and therefore, under the null hypothesis, the distribution of the differences is symmetric about zero. Because the one-sample test can be used as a one-sided test for stochastic dominance, the paired difference Wilcoxon test can be used to compare the following hypotheses:

; Null hypothesis ''H''0 : The observations (X_i, Y_i) are exchangeable.
; One-sided alternative hypothesis ''H''1 : The differences D_i are stochastically smaller than a distribution symmetric about zero, that is, for every x, Pr(D_i ≤ x) ≥ Pr(D_i ≥ −x).
; One-sided alternative hypothesis ''H''2 : The differences D_i are stochastically larger than a distribution symmetric about zero, that is, for every x, Pr(D_i ≤ x) ≤ Pr(D_i ≥ −x).

Zeros and ties
In real data, it sometimes happens that there is a sample X_i which equals zero or a pair (X_i, Y_i) with X_i = Y_i. It can also happen that there are tied samples. This means that for some i ≠ j, we have |X_i| = |X_j| (in the one-sample case) or |X_i − Y_i| = |X_j − Y_j| (in the paired sample case). This is particularly common for discrete data. When this happens, the test procedure defined above is usually undefined because there is no way to uniquely rank the data. (The sole exception is if there is a single sample which is zero and no other zeros or ties.) Because of this, the test statistic needs to be modified.

Zeros
Wilcoxon's original paper did not address the question of observations (or, in the paired sample case, differences) that equal zero. However, in later surveys, he recommended removing zeros from the sample. Then the standard signed-rank test could be applied to the resulting data, as long as there were no ties. This is now called the ''reduced sample procedure.''

Pratt observed that the reduced sample procedure can lead to paradoxical behavior. He gives the following example. Suppose that we are in the one-sample situation and have the following thirteen observations:

:0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, −18.

The reduced sample procedure removes the zero. To the remaining data, it assigns the signed ranks:

:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, −12.

This has a one-sided ''p''-value of 70/2^12 ≈ 0.0171, and therefore the sample is not significantly positive at any significance level α < 70/2^12. Pratt argues that one would expect that decreasing the observations should certainly not make the data appear more positive. However, if the zero observation is decreased by an amount less than 2, or if all observations are decreased by an amount less than 1, then the signed ranks become:

:−1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, −13.

This has a one-sided ''p''-value of 109/2^13 ≈ 0.0133. Therefore the sample would be judged significantly positive at any significance level α ≥ 109/2^13. The paradox is that, if α is between 109/2^13 and 70/2^12, then ''decreasing'' an insignificant sample causes it to appear significantly ''positive''.

Pratt therefore proposed the ''signed-rank zero procedure.'' This procedure includes the zeros when ranking the samples. However, it excludes them from the test statistic, or equivalently it defines sgn(0) = 0. Pratt proved that the signed-rank zero procedure has several desirable behaviors not shared by the reduced sample procedure:

# Increasing the observed values does not make a significantly positive sample insignificant, and it does not make an insignificant sample significantly negative.
# If the distribution of the observations is symmetric, then the values of μ which the test does not reject form an interval.
# A sample is significantly positive, not significant, or significantly negative, if and only if it is so when the zeros are assigned arbitrary non-zero signs, if and only if it is so when the zeros are replaced with non-zero values which are smaller in absolute value than any non-zero observation.
# For a fixed significance threshold α, and for a test which is randomized to have level exactly α, the probability of calling a set of observations significantly positive (respectively, significantly negative) is a non-decreasing (respectively, non-increasing) function of the observations.

Pratt remarks that, when the signed-rank zero procedure is combined with the average rank procedure for resolving ties, the resulting test is a consistent test against the alternative hypothesis that, for all i and j, Pr(X_i + X_j > 0) and Pr(X_i + X_j < 0) differ by at least a fixed constant that is independent of i and j.

The signed-rank zero procedure has the disadvantage that, when zeros occur, the null distribution of the test statistic changes, so tables of ''p''-values can no longer be used.

Ties
When the data does not have ties, the ranks 1, ..., n are used to calculate the test statistic. In the presence of ties, the ranks are not defined. There are two main approaches to resolving this.

The most common procedure for handling ties, and the one originally recommended by Wilcoxon, is called the ''average rank'' or ''midrank procedure.'' This procedure assigns numbers between 1 and ''n'' to the observations, with two observations getting the same number if and only if they have the same absolute value. These numbers are conventionally called ranks even though the set of these numbers is not equal to {1, ..., n} (except when there are no ties). The rank assigned to an observation is the average of the possible ranks it would have if the ties were broken in all possible ways. Once the ranks are assigned, the test statistic is computed in the same way as usual.

For example, suppose that the observations satisfy

:|X_1| < |X_2| = |X_3| < |X_4| < |X_5| = |X_6| = |X_7|.

In this case, X_1 is assigned rank 1, X_2 and X_3 are assigned rank (2 + 3)/2 = 2.5, X_4 is assigned rank 4, and X_5, X_6, and X_7 are assigned rank (5 + 6 + 7)/3 = 6.

Formally, suppose that there is a set of k observations all having the same absolute value v, that m observations have absolute value less than v, and that m + k observations have absolute value less than or equal to v. If the ties among the observations with absolute value v were broken, then these observations would occupy ranks m + 1 through m + k. The average rank procedure therefore assigns them the rank ((m + 1) + ... + (m + k))/k = m + (k + 1)/2.

Under the average rank procedure, the null distribution is different in the presence of ties. The average rank procedure also has some disadvantages that are similar to those of the reduced sample procedure for zeros. It is possible that a sample can be judged significantly positive by the average rank procedure; but increasing some of the values so as to break the ties, or breaking the ties in any way whatsoever, results in a sample that the test judges to be not significant.
However, increasing all the observed values by the same amount cannot turn a significantly positive result into an insignificant one, nor an insignificant one into a significantly negative one. Furthermore, if the observations are distributed symmetrically, then the values of μ which the test does not reject form an interval.

The other common option for handling ties is a tiebreaking procedure. In a tiebreaking procedure, the observations are assigned distinct ranks in the set {1, ..., n}. The rank assigned to an observation depends on its absolute value and the tiebreaking rule. Observations with smaller absolute values are always given smaller ranks, just as in the standard rank-sum test. The tiebreaking rule is used to assign ranks to observations with the same absolute value. One advantage of tiebreaking rules is that they allow the use of standard tables for computing ''p''-values.

''Random tiebreaking'' breaks the ties at random. Under random tiebreaking, the null distribution is the same as when there are no ties, but the result of the test depends not only on the data but on additional random choices. Averaging the ranks over the possible random choices results in the average rank procedure. One could also report the probability of rejection over all random choices. Random tiebreaking has the advantage that the probability that a sample is judged significantly positive does not decrease when some observations are increased.

''Conservative tiebreaking'' breaks the ties in favor of the null hypothesis. When performing a one-sided test in which negative values of ''T'' tend to be more significant, ties are broken by assigning lower ranks to negative observations and higher ranks to positive ones. When the test makes positive values of ''T'' significant, ties are broken the other way, and when large absolute values of ''T'' are significant, ties are broken so as to make |''T''| as small as possible.
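The average-rank assignment described above can be sketched in a few lines of pure Python (the function name and the sample values are mine, chosen for illustration; the sketch assumes no zero observations):

```python
# Midrank (average rank) sketch: tied absolute values share the average of
# the ranks they would occupy if the ties were broken, i.e. m + (k + 1)/2
# for a group of k observations preceded by m smaller absolute values.
def signed_midranks(xs):
    """Return signed ranks, averaging ranks over tied absolute values."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    ranks = [0.0] * len(xs)
    pos = 0
    while pos < len(order):
        # find the group of observations tied in absolute value
        end = pos
        while end < len(order) and abs(xs[order[end]]) == abs(xs[order[pos]]):
            end += 1
        # the group would occupy ranks pos+1 .. end; assign their average
        avg = (pos + 1 + end) / 2
        for i in order[pos:end]:
            ranks[i] = avg if xs[i] > 0 else -avg
        pos = end
    return ranks

# |X_1| < |X_2| = |X_3| < |X_4| < |X_5| = |X_6| = |X_7| gives
# ranks 1, 2.5, 2.5, 4, 6, 6, 6 (with the observation's sign attached)
assert signed_midranks([1, -2, 2, 3, 5, -5, 5]) == [1.0, -2.5, 2.5, 4.0, 6.0, -6.0, 6.0]
```

The test statistic is then computed from these signed midranks exactly as in the untied case; only the null distribution differs.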
Pratt observes that when ties are likely, the conservative tiebreaking procedure "presumably has low power, since it amounts to breaking all ties in favor of the null hypothesis."

The average rank procedure can disagree with tiebreaking procedures. Pratt gives the following example. Suppose that the observations are:

:1, 1, 1, 1, 2, 3, −4.

The average rank procedure assigns these the signed ranks:

:2.5, 2.5, 2.5, 2.5, 5, 6, −7.

This sample is significantly positive at the one-sided level α = 14/2^7. On the other hand, any tiebreaking rule will assign the ranks:

:1, 2, 3, 4, 5, 6, −7.

At the same one-sided level α, this is not significant.

Two other options for handling ties are based around averaging the results of tiebreaking. In the ''average statistic'' method, the test statistic is computed for every possible way of breaking ties, and the final statistic is the mean of the tie-broken statistics. In the ''average probability'' method, the ''p''-value is computed for every possible way of breaking ties, and the final ''p''-value is the mean of the tie-broken ''p''-values.

Computing the null distribution
Computing ''p''-values requires knowing the distribution of the test statistic under the null hypothesis. There is no closed formula for this distribution. However, for small values of ''n'', the distribution may be computed exactly. Under the null hypothesis that the data is symmetric about zero, each X_i is exactly as likely to be positive as it is negative. Therefore the probability that T⁺ = t under the null hypothesis is equal to the number of sign combinations that yield T⁺ = t divided by the number of possible sign combinations, 2^n. This can be used to compute the exact distribution of T⁺ under the null hypothesis.

Computing the distribution of T⁺ by considering all possibilities requires computing 2^n sums, which is intractable for all but the smallest ''n''. However, there is an efficient recursion for the distribution of T⁺. Define u_n(t) to be the number of sign combinations for which T⁺ = t. This is equal to the number of subsets of {1, ..., n} which sum to t. The base cases of the recursion are u_0(0) = 1, u_0(t) = 0 for all t ≠ 0, and u_n(t) = 0 for all t < 0 or t > n(n + 1)/2. The recursive formula is

:u_n(t) = u_{n−1}(t) + u_{n−1}(t − n).

The formula is true because every subset of {1, ..., n} which sums to t either does not contain n, in which case it is also a subset of {1, ..., n − 1}, or it does contain n, in which case removing n from the subset produces a subset of {1, ..., n − 1} which sums to t − n. Under the null hypothesis, the probability mass function of T⁺ satisfies Pr(T⁺ = t) = u_n(t)/2^n. The function u_n is closely related to the integer partition function.

If p_n(t) is the probability that T⁺ = t under the null hypothesis when there are n samples, then p_n satisfies a similar recursion:

:p_n(t) = p_{n−1}(t)/2 + p_{n−1}(t − n)/2,

with similar boundary conditions. There is also a recursive formula for the cumulative distribution function Pr(T⁺ ≤ t).

For very large ''n'', even the above recursion is too slow. In this case, the null distribution can be approximated. The null distributions of T, T⁺, and T⁻ are asymptotically normal with means and variances:

:E[T⁺] = E[T⁻] = n(n + 1)/4, Var(T⁺) = Var(T⁻) = n(n + 1)(2n + 1)/24,
:E[T] = 0, Var(T) = n(n + 1)(2n + 1)/6.

Better approximations can be produced using Edgeworth expansions.
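The recursion for u_n translates directly into a short dynamic program. The sketch below (function names are mine) builds the exact null distribution of T⁺ as subset-sum counts and checks its moments against the formulas for the mean and variance:

```python
# Exact null distribution of T+ via u_n(t) = u_{n-1}(t) + u_{n-1}(t - n):
# u[t] counts the subsets of {1, ..., n} that sum to t.
from fractions import Fraction

def null_counts(n):
    """Return the list u with u[t] = number of subsets of {1..n} summing to t."""
    max_t = n * (n + 1) // 2
    u = [0] * (max_t + 1)
    u[0] = 1  # base case: the empty subset
    for j in range(1, n + 1):
        # descending order lets us update one row in place
        for t in range(max_t, j - 1, -1):
            u[t] += u[t - j]
    return u

def p_value_upper(n, t_plus):
    """Exact Pr(T+ >= t_plus) under the null hypothesis."""
    u = null_counts(n)
    return Fraction(sum(u[t_plus:]), 2 ** n)

u = null_counts(10)
assert sum(u) == 2 ** 10  # every sign combination is counted exactly once
mean = sum(t * c for t, c in enumerate(u)) / 2 ** 10
var = sum(t * t * c for t, c in enumerate(u)) / 2 ** 10 - mean ** 2
assert mean == 10 * 11 / 4            # n(n + 1)/4
assert var == 10 * 11 * 21 / 24       # n(n + 1)(2n + 1)/24
```

The same table also reproduces exact tail probabilities, e.g. `p_value_upper(3, 6)` is 1/8, the chance that all three signs are positive.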
The technical underpinnings of these expansions are rather involved, because conventional Edgeworth expansions apply to sums of IID continuous random variables, while T⁺ is a sum of non-identically distributed discrete random variables. The final result, however, is that a fourth-order expansion has the same order of error as a conventional fourth-order Edgeworth expansion.

The moment generating function of T⁺ has the exact formula:

:M(s) = (1/2^n) ∏_{j=1}^{n} (1 + e^{js}).

When zeros are present and the signed-rank zero procedure is used, or when ties are present and the average rank procedure is used, the null distribution of T changes. Cureton derived a normal approximation for this situation. Suppose that the original number of observations was n and the number of zeros was z. The tie correction is

:c = Σ (t³ − t),

where the sum is over the sizes t of each group of tied non-zero observations. The expectation of T is still zero, while the expectation of T² is

:E[T²] = (n(n + 1)(2n + 1) − z(z + 1)(2z + 1))/6 − c/12.

If σ² = E[T²], then T/σ is approximately standard normal under the null hypothesis.

Alternative statistics
Wilcoxon originally defined the Wilcoxon rank-sum statistic to be min(T⁺, T⁻). Early authors such as Siegel followed Wilcoxon. This is appropriate for two-sided hypothesis tests, but it cannot be used for one-sided tests.

Instead of assigning ranks between 1 and ''n'', it is also possible to assign ranks between 0 and n − 1. These are called ''modified ranks''. The modified signed-rank sum T₀, the modified positive-rank sum T₀⁺, and the modified negative-rank sum T₀⁻ are defined analogously to T, T⁺, and T⁻ but with the modified ranks in place of the ordinary ranks. The probability that the sum of two independent F-distributed random variables is positive can be estimated as 2T₀⁺/(n(n − 1)). When consideration is restricted to continuous distributions, this is a minimum variance unbiased estimator of that probability.

Example
is theEffect size
To compute anSoftware implementations
* R includes an implementation of the test as wilcox.test(x, y, paired=TRUE), where x and y are vectors of equal length.
* An implementation is also available as a wilcoxon_test function.
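For small samples without zeros or ties, the exact paired test behind such library calls can also be written in a few lines of pure Python. This is a sketch, not a reference implementation: the function name is mine, and the two-sided convention (doubling the smaller tail probability, capped at 1) is one common choice among several.

```python
# Exact two-sided paired signed-rank test by full enumeration of the 2^n
# equally likely sign patterns; assumes no zero differences and no ties.
from itertools import product

def exact_wilcoxon_paired(x, y):
    """Return (T+, exact two-sided p-value) for paired samples x and y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    # T+ = sum of the ranks (1..n by absolute value) of positive differences
    t_plus = sum(r + 1 for r, i in enumerate(order) if d[i] > 0)
    lo = hi = 0
    for signs in product([0, 1], repeat=n):  # all equally likely sign patterns
        t = sum((r + 1) * s for r, s in enumerate(signs))
        lo += t <= t_plus
        hi += t >= t_plus
    # two-sided p-value: double the smaller tail, capped at 1
    p = min(1.0, 2 * min(lo, hi) / 2 ** n)
    return t_plus, p

# differences 1, 2, 3 give T+ = 6 and exact two-sided p = 2 * (1/8) = 0.25
assert exact_wilcoxon_paired([5, 6, 7], [4, 4, 4]) == (6, 0.25)
```

Enumeration costs 2^n sums, so this sketch is only practical for small n; the recursion described under "Computing the null distribution" scales much further.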
See also
* Mann–Whitney–Wilcoxon test
* Sign test

References
External links