In statistics, the Jonckheere trend test (sometimes called the Jonckheere–Terpstra test) is a test for an ordered alternative hypothesis within an independent samples (between-participants) design. It is similar to the Kruskal–Wallis test in that the null hypothesis is that several independent samples are from the same population. However, with the Kruskal–Wallis test there is no ''a priori'' ordering of the populations from which the samples are drawn. When there is an ''a priori'' ordering, the Jonckheere test has more statistical power than the Kruskal–Wallis test. The test was developed by Aimable Robert Jonckheere, who was a psychologist and statistician at University College London.
The null and alternative hypotheses can be conveniently expressed in terms of population medians for ''k'' populations (where ''k'' > 2). Letting ''θ<sub>i</sub>'' be the population median for the ''i''th population, the null hypothesis is:
:<math>H_0: \theta_1 = \theta_2 = \cdots = \theta_k</math>
The alternative hypothesis is that the population medians have an ''a priori'' ordering, e.g.:
:<math>H_A: \theta_1 \le \theta_2 \le \cdots \le \theta_k</math>
with at least one strict inequality.
Procedure
The test can be seen as a special case of Maurice Kendall’s more general method of rank correlation and makes use of Kendall's ''S'' statistic. This can be computed in one of two ways:
The ‘direct counting’ method
#Arrange the samples in the predicted order
#For each score in turn, count how many scores in the samples to the right are larger than the score in question. The sum of these counts is ''P''.
#For each score in turn, count how many scores in the samples to the right are smaller than the score in question. The sum of these counts is ''Q''.
#''S'' = ''P'' – ''Q''
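The counting steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `kendall_s_direct` and the small data set are assumptions made for this sketch, and the samples are assumed to be supplied already in the predicted order.

```python
from typing import Sequence


def kendall_s_direct(samples: Sequence[Sequence[float]]) -> int:
    """Return S = P - Q for samples listed in the predicted order."""
    p = q = 0
    for i, left in enumerate(samples):
        for right in samples[i + 1:]:      # every sample to the right
            for x in left:
                for y in right:
                    if y > x:
                        p += 1             # pair agrees with the predicted order
                    elif y < x:
                        q += 1             # pair contradicts the predicted order
                    # tied pairs contribute to neither P nor Q
    return p - q


# Hypothetical data: three small samples in the predicted order.
groups = [[1, 3], [2, 5], [4, 6]]
print(kendall_s_direct(groups))
```

Pairs within the same sample are never compared, which is why ties inside a single sample leave ''S'' unchanged.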
The ‘nautical’ method
#Cast the data into an ordered contingency table, with the levels of the independent variable increasing from left to right, and values of the dependent variable increasing from top to bottom.
#For each entry in the table, multiply it by the total of all entries that lie to the ‘South East’ of it. The sum of these products is ''P''.
#For each entry in the table, multiply it by the total of all entries that lie to the ‘South West’ of it. The sum of these products is ''Q''.
#''S'' = ''P'' – ''Q''
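A rough Python sketch of the table-based calculation is given below. The function name `kendall_s_table` and the example table are hypothetical; the table is assumed to hold counts, with rows as values of the dependent variable (increasing downwards) and columns as levels of the independent variable (increasing to the right).

```python
from typing import Sequence


def kendall_s_table(table: Sequence[Sequence[int]]) -> int:
    """Return S = P - Q from an ordered contingency table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for r in range(n_rows):
        below = table[r + 1:]                            # rows further 'South'
        for c in range(n_cols):
            south_east = sum(sum(row[c + 1:]) for row in below)
            south_west = sum(sum(row[:c]) for row in below)
            p += table[r][c] * south_east
            q += table[r][c] * south_west
    return p - q


# Hypothetical 3 x 2 table: rows are scores 1, 2, 3; columns are groups A, B.
table = [
    [2, 0],   # two scores of 1 in group A
    [1, 1],   # one score of 2 in each group
    [0, 2],   # two scores of 3 in group B
]
print(kendall_s_table(table))
```

Entries in the same row or the same column as a given entry are skipped, so tied pairs add to neither ''P'' nor ''Q''.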
Note that there will always be ties in the independent variable (individuals are ‘tied’ in the sense that they are in the same group) but there may or may not be ties in the dependent variable. If there are no ties – or the ties occur within a particular sample (which does not affect the value of the test statistic) – exact tables of ''S'' are available; for example, Jonckheere provided selected tables for values of ''k'' from 3 to 6 and equal sample sizes (''m'') from 2 to 5. Leach presented critical values of ''S'' for ''k'' = 3 with sample sizes ranging from 2,2,1 to 5,5,5.
Normal approximation to ''S''
The standard normal distribution can be used to approximate the distribution of ''S'' under the null hypothesis for cases in which exact tables are not available. The mean of the distribution of ''S'' will always be zero, and assuming that there are no tied scores between the values in two (or more) different samples the variance is given by
:<math>\operatorname{VAR}(S) = \frac{2\left(n^3 - \sum t_i^3\right) + 3\left(n^2 - \sum t_i^2\right)}{18}</math>
where ''n'' is the total number of scores, and ''t<sub>i</sub>'' is the number of scores in the ''i''th sample. The approximation to the standard normal distribution can be improved by the use of a continuity correction: ''S<sub>c</sub>'' = |''S''| – 1. Thus 1 is subtracted from a positive ''S'' value and 1 is added to a negative ''S'' value. The ''z''-score equivalent is then given by
:<math>z = \frac{S_c}{\sqrt{\operatorname{VAR}(S)}}</math>
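The untied normal approximation might be computed along the following lines. This is a sketch under stated assumptions: the function name `normal_approx_untied` and the inputs in the usage line are hypothetical, the variance is taken to be [2(''n''³ − Σ''t<sub>i</sub>''³) + 3(''n''² − Σ''t<sub>i</sub>''²)] / 18, and the continuity correction is applied so that the sign of ''S'' is preserved, which is a choice made for this sketch.

```python
import math
from typing import Sequence, Tuple


def normal_approx_untied(s: int, sample_sizes: Sequence[int]) -> Tuple[float, float]:
    """Return (VAR(S), z) for the no-ties case with a continuity correction."""
    n = sum(sample_sizes)
    sum_t2 = sum(t ** 2 for t in sample_sizes)
    sum_t3 = sum(t ** 3 for t in sample_sizes)
    var_s = (2 * (n ** 3 - sum_t3) + 3 * (n ** 2 - sum_t2)) / 18
    # Shrink |S| by 1 towards zero, keeping the sign of S (assumption of this sketch).
    s_c = math.copysign(max(abs(s) - 1, 0), s)
    return var_s, s_c / math.sqrt(var_s)


# Hypothetical inputs: S = 40 from three samples of four scores each.
var_s, z = normal_approx_untied(40, [4, 4, 4])
print(round(var_s, 3), round(z, 3))
```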
Ties
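If scores are tied between the values in two (or more) different samples there are no exact tables for the ''S'' distribution and the normal approximation has to be used. In this case no continuity correction is applied to the value of ''S'' and the variance is given by
:<math>\operatorname{VAR}(S) = \frac{2\left(n^3 - \sum t_i^3 - \sum u_i^3\right) + 3\left(n^2 - \sum t_i^2 - \sum u_i^2\right) + 5n}{18} + \frac{\sum\left(t_i^3 - 3t_i^2 + 2t_i\right)\sum\left(u_i^3 - 3u_i^2 + 2u_i\right)}{9n(n-1)(n-2)} + \frac{\sum\left(t_i^2 - t_i\right)\sum\left(u_i^2 - u_i\right)}{2n(n-1)}</math>
where ''t<sub>i</sub>'' is a row marginal total and ''u<sub>i</sub>'' a column marginal total in the contingency table. The ''z''-score equivalent is then given by
:<math>z = \frac{S}{\sqrt{\operatorname{VAR}(S)}}</math>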
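A possible Python sketch of the tie-corrected variance follows. It assumes the variance takes the standard form used for Kendall's ''S'' when both variables contain ties; the function name `var_s_tied` and the marginal totals in the usage lines are hypothetical.

```python
from typing import Sequence


def var_s_tied(row_totals: Sequence[int], col_totals: Sequence[int]) -> float:
    """Variance of S with ties, from the contingency-table marginal totals."""
    n = sum(row_totals)
    if n != sum(col_totals) or n < 3:
        raise ValueError("marginals must share the same total, which must be at least 3")
    t2 = sum(t ** 2 for t in row_totals)
    t3 = sum(t ** 3 for t in row_totals)
    u2 = sum(u ** 2 for u in col_totals)
    u3 = sum(u ** 3 for u in col_totals)
    first = (2 * (n ** 3 - t3 - u3) + 3 * (n ** 2 - t2 - u2) + 5 * n) / 18
    # sum(t^3 - 3t^2 + 2t) equals t3 - 3*t2 + 2*n because the totals sum to n.
    second = (t3 - 3 * t2 + 2 * n) * (u3 - 3 * u2 + 2 * n) / (9 * n * (n - 1) * (n - 2))
    # sum(t^2 - t) equals t2 - n for the same reason.
    third = (t2 - n) * (u2 - n) / (2 * n * (n - 1))
    return first + second + third


# Hypothetical marginals: six scores, one value shared by two scores, two groups of three.
print(var_s_tied([1, 1, 2, 1, 1], [3, 3]))
# With every row total equal to 1 (no tied scores) the untied variance is recovered.
print(var_s_tied([1] * 12, [4, 4, 4]))
```

The expression is symmetric in the two sets of marginal totals, so it does not matter which set is labelled ''t'' and which ''u''.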
A numerical example
In a partial replication of a study by Loftus and Palmer, participants were assigned at random to one of three groups, and then shown a film of two cars crashing into each other. After viewing the film, the participants in one group were asked the following question: “About how fast were the cars going when they contacted each other?” Participants in a second group were asked, “About how fast were the cars going when they bumped into each other?” Participants in the third group were asked, “About how fast were the cars going when they smashed into each other?” Loftus and Palmer predicted that the action verb used (contacted, bumped, smashed) would influence the speed estimates in miles per hour (mph) such that action verbs implying greater energy would lead to higher estimated speeds. The following results were obtained (simulated data):
The ‘direct counting’ method
*The samples are already in the predicted order
*For each score in turn, count how many scores in the samples to the right are larger than the score in question to obtain ''P'':
:: ''P'' = 8 + 7 + 7 + 7 + 4 + 4 + 3 + 3 = 43
*For each score in turn, count how many scores in the samples to the right are smaller than the score in question to obtain ''Q'':
:: ''Q'' = 0 + 0 + 1 + 1 + 0 + 0 + 0 + 1 = 3
*''S'' = ''P'' - ''Q'' = 43 - 3
*''S'' = 40
The 'nautical' method
*Cast the data into an ordered contingency table
*For each entry in the table, count all other entries that lie to the 'South East' of the particular entry. This is ''P'':
:''P'' = (1 × 8) + (1 × 7) + (1 × 7) + (1 × 7) + (1 × 4) + (1 × 4) + (1 × 3) + (1 × 3) = 43
*For each entry in the table, count all other entries that lie to the 'South West' of the particular entry. This is ''Q'':
:''Q'' = (1 × 2) + (1 × 1) = 3
*''S'' = ''P'' − ''Q'' = 43 − 3
*''S'' = 40
Using exact tables
When the ties between samples are few (as in this example) Leach suggested that ignoring the ties and using exact tables would provide a reasonably accurate result.
Jonckheere suggested breaking the ties against the alternative hypothesis and then using exact tables.
In the current example, where tied scores only appear in adjacent groups, the value of ''S'' is unchanged if the ties are broken against the alternative hypothesis. This may be verified by substituting 11 mph in place of 12 mph in the Bumped sample, and 19 mph in place of 20 mph in the Smashed sample, and re-computing the test statistic. From tables with ''k'' = 3 and ''m'' = 4, the critical ''S'' value for ''α'' = 0.05 is 36 and thus the result would be declared statistically significant at this level.
Computing a standard normal approximation
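With ''n'' = 12, column marginal totals of 4, 4 and 4 (the three sample sizes), and row marginal totals of 2 for each of the two tied scores (12 mph and 20 mph) and 1 for every other score:
:<math>\sum t_i^2 = 16</math>
:<math>\sum t_i^3 = 24</math>
:<math>\sum u_i^2 = 48</math>
:<math>\sum u_i^3 = 192</math>
the variance of ''S'' is then
:<math>\operatorname{VAR}(S) = \frac{2(1728 - 24 - 192) + 3(144 - 16 - 48) + 60}{18} + \frac{0 \times 72}{9 \times 12 \times 11 \times 10} + \frac{4 \times 36}{2 \times 12 \times 11} \approx 185.21</math>
And ''z'' is given by
:<math>z = \frac{40}{\sqrt{185.21}} \approx 2.94</math>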
For ''α'' = 0.05 (one-sided) the critical ''z'' value is 1.645, so again the result would be declared significant at this level.
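The comparison with the critical value can also be expressed as a one-sided ''p''-value. The short sketch below assumes a variance of about 185.21, which is what the tie-corrected variance yields if the two tied scores each occur twice and each of the three samples holds four scores; that figure and the variable names are assumptions of this sketch.

```python
import math

s = 40            # test statistic from the example
var_s = 185.21    # assumed tie-corrected variance (see the note above)

z = s / math.sqrt(var_s)                          # no continuity correction when ties are present
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail standard normal probability

print(f"z = {z:.2f}, one-sided p = {p_one_sided:.4f}")
print("significant at 0.05" if z > 1.645 else "not significant at 0.05")
```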
A similar test for trend within the context of repeated measures (within-participants) designs and based on Spearman's rank correlation coefficient was developed by Page.