
In probability and statistics, the studentized range distribution is the continuous probability distribution of the studentized range of an i.i.d. sample from a normally distributed population. Suppose that we take a sample of size ''n'' from each of ''k'' populations with the same normal distribution ''N''(''μ'', ''σ''²), that \bar{y}_{\min} is the smallest of these sample means and \bar{y}_{\max} is the largest, and that ''s''² is the pooled sample variance from these samples. Then the following statistic has a studentized range distribution:
:q = \frac{\bar{y}_{\max} - \bar{y}_{\min}}{s/\sqrt{n}}
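The statistic ''q'' defined above can be computed directly from grouped data. The following is a minimal sketch, assuming equal group sizes; the function name `studentized_range_statistic` and the data are hypothetical and chosen only for illustration.

```python
import math
import statistics

def studentized_range_statistic(groups):
    """Return q = (largest mean - smallest mean) / (s / sqrt(n)) for equal-sized groups."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes every group has the same size n")
    means = [statistics.fmean(g) for g in groups]
    # With equal group sizes, the pooled variance is the plain average of the
    # per-group sample variances (each has n - 1 degrees of freedom).
    pooled_var = statistics.fmean([statistics.variance(g) for g in groups])
    return (max(means) - min(means)) / math.sqrt(pooled_var / n)

# Hypothetical data: k = 3 groups of n = 4 observations each.
groups = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [4.0, 5.0, 6.0, 7.0],
]
q = studentized_range_statistic(groups)
```

For these groups the pooled variance has ''k''(''n'' − 1) = 9 degrees of freedom, which is the value of ''ν'' to use when comparing ''q'' with the distribution.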


Definition


Probability density function

Differentiating the cumulative distribution function with respect to ''q'' gives the probability density function.
::f_R(q;k,\nu) = \frac{\sqrt{2\pi}\,k\,(k-1)\,\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}} \int_0^\infty s^\nu \, \varphi(\sqrt{\nu}\,s)\, \left[\int_{-\infty}^\infty \varphi(z+q\,s)\,\varphi(z)\, \left[\Phi(z+q\,s)-\Phi(z)\right]^{k-2} \, \mathrm{d}z\right] \, \mathrm{d}s
Note that in the outer part of the integral, the equation
::\varphi(\sqrt{\nu}\,s) \, \sqrt{2\pi} = e^{-\nu s^2/2}
was used to replace an exponential factor.
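The density can be evaluated numerically by nested quadrature. The sketch below uses a composite Simpson rule from the standard library only; the helper names, the truncation of the inner integral to [−8, 8], and the outer cut-off `s_max` are assumptions of this sketch, not part of the definition. For ''k'' = 2 the statistic equals √2·|''t''|, where ''t'' follows Student's ''t''-distribution with ''ν'' degrees of freedom, which gives an independent check.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def studentized_range_pdf(q, k, nu, n_outer=400, n_inner=200):
    """Evaluate f_R(q; k, nu) by nested quadrature over truncated ranges."""
    # log of sqrt(2 pi) k (k - 1) nu^(nu/2) / (Gamma(nu/2) 2^(nu/2 - 1))
    log_c = (0.5 * math.log(2.0 * math.pi) + math.log(k * (k - 1))
             + 0.5 * nu * math.log(nu) - math.lgamma(0.5 * nu)
             - (0.5 * nu - 1.0) * math.log(2.0))

    def inner(s):
        w = q * s
        return simpson(
            lambda z: phi(z + w) * phi(z) * (Phi(z + w) - Phi(z)) ** (k - 2),
            -8.0, 8.0, n_inner)

    def outer(s):
        return s ** nu * phi(math.sqrt(nu) * s) * inner(s)

    s_max = 1.0 + 10.0 / math.sqrt(nu)  # assumed cut-off; the weight is negligible beyond it
    return math.exp(log_c) * simpson(outer, 0.0, s_max, n_outer)

def student_t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom (used only as a check)."""
    log_norm = (math.lgamma(0.5 * (nu + 1)) - math.lgamma(0.5 * nu)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm) * (1.0 + t * t / nu) ** (-0.5 * (nu + 1))
```

With these definitions, `studentized_range_pdf(q, 2, nu)` should agree with `math.sqrt(2) * student_t_pdf(q / math.sqrt(2), nu)` up to quadrature error.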


Cumulative distribution function

The cumulative distribution function is given by
::F_R(q;k,\nu) = \frac{\sqrt{2\pi}\,k\,\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}} \int_0^\infty s^{\nu-1} \varphi(\sqrt{\nu}\,s) \left[\int_{-\infty}^\infty \varphi(z) \left[\Phi(z+q\,s)-\Phi(z)\right]^{k-1} \, \mathrm{d}z \right] \, \mathrm{d}s
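A direct numerical evaluation of this double integral is sketched below, again with a composite Simpson rule and only the standard library. The helper names and the truncation limits are assumptions of the sketch; a production implementation would use adaptive quadrature.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def studentized_range_cdf(q, k, nu, n_outer=400, n_inner=200):
    """Evaluate F_R(q; k, nu) by nested quadrature over truncated ranges."""
    # log of sqrt(2 pi) k nu^(nu/2) / (Gamma(nu/2) 2^(nu/2 - 1))
    log_c = (0.5 * math.log(2.0 * math.pi) + math.log(k)
             + 0.5 * nu * math.log(nu) - math.lgamma(0.5 * nu)
             - (0.5 * nu - 1.0) * math.log(2.0))

    def inner(s):
        w = q * s
        return simpson(
            lambda z: phi(z) * (Phi(z + w) - Phi(z)) ** (k - 1),
            -8.0, 8.0, n_inner)

    def outer(s):
        return s ** (nu - 1) * phi(math.sqrt(nu) * s) * inner(s)

    s_max = 1.0 + 10.0 / math.sqrt(nu)  # assumed cut-off; the weight is negligible beyond it
    return math.exp(log_c) * simpson(outer, 0.0, s_max, n_outer)
```

As a check, for ''k'' = 2 the statistic equals √2·|''t''|, so `studentized_range_cdf(math.sqrt(2) * 2.228, 2, 10)` should be close to 0.95, because 2.228 is the two-sided 5% point of Student's ''t'' with 10 degrees of freedom.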


Special cases

If ''k'' is 2 or 3, the studentized range probability distribution function can be directly evaluated, where \varphi(z) is the standard normal probability density function and \Phi(z) is the standard normal cumulative distribution function.
::f_R(q;k=2) = \sqrt{2}\,\varphi\left(q/\sqrt{2}\right)
::f_R(q;k=3) = 6\sqrt{2}\,\varphi\left(q/\sqrt{2}\right)\left[\Phi\left(q/\sqrt{6}\right)-\tfrac{1}{2}\right]
When the number of degrees of freedom approaches infinity, the studentized range cumulative distribution can be calculated for any ''k'' using the standard normal distribution.
::F_R(q;k) = k \int_{-\infty}^\infty \varphi(z)\,\Bigl[\Phi(z+q)-\Phi(z)\Bigr]^{k-1} \, \mathrm{d}z = k \int_{-\infty}^\infty \Bigl[\Phi(z+q)-\Phi(z)\Bigr]^{k-1} \, \mathrm{d}\Phi(z)
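These closed forms contain no ''ν'', so the sketch below treats them as densities of the range of ''k'' standard normal draws and compares them with the infinite-degrees-of-freedom distribution function. It checks that each density integrates to 1 and that it matches a finite-difference derivative of ''F''. All function names are hypothetical, and the integration limits are assumptions chosen so that the neglected tails are negligible.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def range_pdf_k2(q):
    """Closed-form density for k = 2."""
    return math.sqrt(2.0) * phi(q / math.sqrt(2.0))

def range_pdf_k3(q):
    """Closed-form density for k = 3."""
    return (6.0 * math.sqrt(2.0) * phi(q / math.sqrt(2.0))
            * (Phi(q / math.sqrt(6.0)) - 0.5))

def range_cdf_inf(q, k, n=400):
    """F_R(q; k) in the limit of infinite degrees of freedom."""
    return k * simpson(
        lambda z: phi(z) * (Phi(z + q) - Phi(z)) ** (k - 1), -8.0, 8.0, n)

def central_difference(f, x, h=1e-4):
    """Symmetric finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

total_k2 = simpson(range_pdf_k2, 0.0, 12.0, 600)
total_k3 = simpson(range_pdf_k3, 0.0, 12.0, 600)
slope_k3 = central_difference(lambda q: range_cdf_inf(q, 3), 1.5)
```

Here `total_k2` and `total_k3` should both be close to 1, and `slope_k3` should be close to `range_pdf_k3(1.5)`.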


Applications

Critical values of the studentized range distribution are used in Tukey's range test. The studentized range is used to calculate significance levels for results obtained by data mining, where one selectively seeks extreme differences in sample data, rather than only sampling randomly. The studentized range distribution has applications to hypothesis testing and multiple comparisons procedures. For example, Tukey's range test and Duncan's new multiple range test (MRT), in which the sample ''x''₁, ..., ''x''ₙ is a sample of means and ''q'' is the basic test statistic, can be used as post-hoc analysis to test which pairs of group means differ significantly (pairwise comparisons) after the standard analysis of variance has rejected the null hypothesis that all groups are from the same population (i.e. all means are equal) (Pearson & Hartley 1970, Section 14.2).
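The pairwise step of such a post-hoc procedure can be sketched as follows. This is an illustration under stated assumptions, not a full implementation of Tukey's or Duncan's test: the groups have equal size, the data are hypothetical, and the critical value `q_crit` is supplied by the caller from a table. The value 3.77 used here is taken as the tabulated 5% point for ''k'' = 3 and ''ν'' = 12 and should be checked against a table before any real use.

```python
import math
import statistics
from itertools import combinations

def pairwise_q(groups):
    """Return {(i, j): q_ij} for every pair of equal-sized groups."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes every group has the same size n")
    means = [statistics.fmean(g) for g in groups]
    # Pooled variance for equal group sizes: average of the sample variances.
    pooled_var = statistics.fmean([statistics.variance(g) for g in groups])
    standard_error = math.sqrt(pooled_var / n)
    return {(i, j): abs(means[i] - means[j]) / standard_error
            for i, j in combinations(range(len(groups)), 2)}

def significant_pairs(groups, q_crit):
    """List the index pairs whose studentized difference exceeds q_crit."""
    return [pair for pair, q in pairwise_q(groups).items() if q > q_crit]

# Hypothetical data: k = 3 groups of n = 5, so nu = 3 * (5 - 1) = 12.
groups = [
    [10.0, 11.0, 12.0, 13.0, 14.0],
    [11.0, 12.0, 13.0, 14.0, 15.0],
    [16.0, 17.0, 18.0, 19.0, 20.0],
]
pairs = significant_pairs(groups, q_crit=3.77)  # assumed tabulated value
```

In this example the third group is well separated from the other two, so the decision does not hinge on the exact critical value.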


Related distributions

When only the equality of two group means is in question (i.e. whether ''μ''₁ = ''μ''₂), the studentized range distribution is similar to Student's ''t''-distribution, differing only in that the former takes into account the number of means under consideration and adjusts the critical value accordingly. The more means under consideration, the larger the critical value. This makes sense, since the more means there are, the greater the probability that at least some differences between pairs of means will be large due to chance alone.
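The growth of the critical value with the number of means can be illustrated numerically. To keep the sketch short it assumes infinite degrees of freedom, where the distribution function reduces to a single integral over the standard normal distribution, and it solves ''F''(''q''; ''k'') = 0.95 by bisection. The helper names, the bracket [0, 20], and the tolerance are assumptions of the sketch.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def range_cdf_inf(q, k, n=200):
    """Distribution function of the range of k standard normal draws."""
    return k * simpson(
        lambda z: phi(z) * (Phi(z + q) - Phi(z)) ** (k - 1), -8.0, 8.0, n)

def critical_value(k, alpha=0.05, lo=0.0, hi=20.0, tol=1e-8):
    """Solve range_cdf_inf(q, k) = 1 - alpha for q by bisection."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if range_cdf_inf(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

crit = {k: critical_value(k) for k in (2, 3, 5, 10)}
```

For ''k'' = 2 the result should match √2 times the two-sided 5% point of the standard normal distribution (about 1.96), and the values in `crit` should increase with ''k''.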


Derivation

The studentized range distribution function arises from re-scaling the sample range ''R'' by the sample standard deviation ''s'', since the studentized range is customarily tabulated in units of standard deviations, with the variable ''q'' = ''R''/''s''. The derivation begins with a perfectly general form of the distribution function of the sample range, which applies to any sample data distribution. In order to obtain the distribution in terms of the "studentized" range ''q'', we will change variable from ''R'' to ''s'' and ''q''. Assuming the sample data is normally distributed, the standard deviation ''s'' will be chi distributed. By further integrating over ''s'' we can remove ''s'' as a parameter and obtain the re-scaled distribution in terms of ''q'' alone.


General form

For any probability density function f_X, the range probability density f_R is:
::f_R(r;k) = k\,(k-1)\int_{-\infty}^\infty f_X\left(t+\tfrac{1}{2}r\right) f_X\left(t-\tfrac{1}{2}r\right) \left[\int_{t-\frac{1}{2}r}^{t+\frac{1}{2}r} f_X(x) \, \mathrm{d}x\right]^{k-2} \, \mathrm{d}t
What this means is that we are adding up the probabilities that, given ''k'' draws from a distribution, two of them differ by ''r'', and the remaining ''k'' − 2 draws all fall between the two extreme values. If we change variables to ''u'', where u = t - \tfrac{1}{2}r is the low end of the range, and define F_X as the cumulative distribution function of f_X, then the equation can be simplified:
::f_R(r;k) = k\,(k-1)\int_{-\infty}^\infty f_X(u+r)\, f_X(u)\, \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-2} \, \mathrm{d}u
We introduce a similar integral, and notice that differentiating under the integral sign gives
: \begin{align} \frac{\partial}{\partial r} & \left[ k\,\int_{-\infty}^\infty f_X(u)\, \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-1} \, \mathrm{d}u \right] \\[5pt] = {} & k\,(k-1)\int_{-\infty}^\infty f_X(u+r)\, f_X(u)\, \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-2} \, \mathrm{d}u \end{align}
which recovers the integral above, so that last relation confirms
:: \begin{align} F_R(r;k) & = k \int_{-\infty}^\infty f_X(u) \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-1} \, \mathrm{d}u \\ & = k \int_{-\infty}^\infty \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-1} \, \mathrm{d}F_X(u) \end{align}
because for any continuous cdf
::\frac{\partial F_R(r;k)}{\partial r} = f_R(r;k)
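Because the general form holds for any continuous distribution, it can be checked on a case with a known answer. The sketch below applies the formula for F_R(r;k) to the uniform distribution on [0, 1], whose range has the closed-form distribution function k r^{k-1} − (k−1) r^k. The uniform example and all function names are choices made for this illustration; they do not appear in the derivation itself.

```python
def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def range_cdf(r, k, pdf, cdf, lo, hi, n=2000):
    """F_R(r; k) = k * integral over [lo, hi] of f_X(u) [F_X(u + r) - F_X(u)]^(k - 1) du."""
    return k * simpson(
        lambda u: pdf(u) * (cdf(u + r) - cdf(u)) ** (k - 1), lo, hi, n)

def uniform_pdf(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def uniform_cdf(x):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

def uniform_range_cdf_exact(r, k):
    """Closed form for the range of k uniform(0, 1) draws, valid for 0 <= r <= 1."""
    return k * r ** (k - 1) - (k - 1) * r ** k

numeric = range_cdf(0.3, 4, uniform_pdf, uniform_cdf, 0.0, 1.0)
exact = uniform_range_cdf_exact(0.3, 4)
```

The two values should agree up to quadrature error; passing a different `pdf` and `cdf` pair (and suitable integration limits) applies the same formula to another distribution.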


Special form for normal data

The range distribution is most often used for confidence intervals around sample averages, which are asymptotically normally distributed by the central limit theorem. In order to create the studentized range distribution for normal data, we first switch from the generic f_X and F_X to the distribution functions ''φ'' and Φ for the standard normal distribution, and change the variable ''r'' to ''s·q'', where ''q'' is a fixed factor that re-scales ''r'' by scaling factor ''s'':
: f_R(q;k) = s\,k\,(k-1)\int_{-\infty}^\infty \varphi(u+sq)\, \varphi(u)\, \Bigl[\, \Phi(u+sq) - \Phi(u) \Bigr]^{k-2} \, \mathrm{d}u
Choose the scaling factor ''s'' to be the sample standard deviation, so that ''q'' becomes the number of standard deviations wide that the range is. For normal data ''s'' is chi distributed and the distribution function f_S of the chi distribution is given by:
: f_S(s;\nu)\,\mathrm{d}s = \begin{cases} \dfrac{\nu^{\nu/2}\, s^{\nu-1}\, e^{-\nu s^2/2}}{2^{\nu/2-1}\,\Gamma(\nu/2)} \, \mathrm{d}s & \text{for } 0 < s < \infty, \\[4pt] 0 & \text{otherwise}. \end{cases}
Multiplying the distributions f_R and f_S and integrating to remove the dependence on the standard deviation ''s'' gives the studentized range distribution function for normal data:
::f_R(q;k,\nu) = \frac{\nu^{\nu/2}\,k\,(k-1)}{\Gamma(\nu/2)\,2^{\nu/2-1}} \int_0^\infty s^\nu e^{-\nu s^2/2} \int_{-\infty}^\infty \varphi(u+sq)\, \varphi(u)\, \Bigl[\, \Phi(u+sq) - \Phi(u) \Bigr]^{k-2} \, \mathrm{d}u \,\mathrm{d}s
where
:''q'' is the width of the data range measured in standard deviations,
:''ν'' is the number of degrees of freedom for determining the sample standard deviation, and
:''k'' is the number of separate averages that form the points within the range.
The equation for the pdf shown in the sections above comes from using
::e^{-\nu s^2/2} = \sqrt{2\pi}\,\varphi(\sqrt{\nu}\,s)
to replace the exponential expression in the outer integral.
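Two ingredients of this step can be sanity-checked numerically: the chi density f_S should integrate to 1 with second moment 1 (since ''ν''·''s''² is chi-squared with ''ν'' degrees of freedom), and the exponential identity should hold pointwise. The following sketch uses only the standard library; the function names and the integration cut-off are assumptions of the sketch.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def chi_scaled_pdf(s, nu):
    """f_S(s; nu): density of s when nu * s^2 is chi-squared with nu degrees of freedom."""
    if s <= 0.0:
        return 0.0
    log_f = (0.5 * nu * math.log(nu) + (nu - 1.0) * math.log(s)
             - 0.5 * nu * s * s
             - (0.5 * nu - 1.0) * math.log(2.0) - math.lgamma(0.5 * nu))
    return math.exp(log_f)

nu = 5
s_max = 1.0 + 10.0 / math.sqrt(nu)  # assumed cut-off; the density is negligible beyond it
mass = simpson(lambda s: chi_scaled_pdf(s, nu), 0.0, s_max, 1000)
second_moment = simpson(lambda s: s * s * chi_scaled_pdf(s, nu), 0.0, s_max, 1000)

# Pointwise check of exp(-nu s^2 / 2) = sqrt(2 pi) * phi(sqrt(nu) * s) at s = 0.8.
s = 0.8
identity_gap = abs(math.exp(-0.5 * nu * s * s)
                   - math.sqrt(2.0 * math.pi) * phi(math.sqrt(nu) * s))
```

Both `mass` and `second_moment` should be close to 1, and `identity_gap` should be at the level of floating-point rounding.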


Notes


References


Further reading

* Dunlap, W.P.; Powell, R.S.; Konnerth, T.K. (1977). "A FORTRAN IV function for calculating probabilities associated with the studentized range statistic". ''Behavior Research Methods & Instrumentation''. 9 (4): 373–375. doi:10.3758/BF03202264.


External links


Table of critical values for the Studentized range distribution