Studentized Range Distribution

In probability and statistics, the studentized range distribution is the continuous probability distribution of the studentized range of an i.i.d. sample from a normally distributed population.

Suppose that we take a sample of size ''n'' from each of ''k'' populations with the same normal distribution ''N''(''μ'', ''σ''²), that \bar{y}_{\min} is the smallest of these sample means and \bar{y}_{\max} is the largest, and that ''s''² is the pooled sample variance from these samples. Then the following statistic has a studentized range distribution:
:q = \frac{\bar{y}_{\max} - \bar{y}_{\min}}{s/\sqrt{n}}
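The statistic can be computed directly from grouped data. The following is a minimal sketch, assuming equal group sizes and NumPy; the function name `studentized_range_statistic` and the toy data are illustrative and not part of the source.

```python
import numpy as np


def studentized_range_statistic(groups):
    """Return (q, k, df) with q = (max mean - min mean) / (s / sqrt(n)).

    Assumes every group has the same size n (hypothetical helper).
    """
    data = np.asarray(groups, dtype=float)        # shape (k, n)
    k, n = data.shape
    means = data.mean(axis=1)
    # Pooled variance for equal n: the average of the k within-group
    # sample variances, carrying k * (n - 1) degrees of freedom.
    pooled_var = data.var(axis=1, ddof=1).mean()
    q = (means.max() - means.min()) / np.sqrt(pooled_var / n)
    return q, k, k * (n - 1)


# Toy data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
q, k, df = studentized_range_statistic(groups)
print(q, k, df)
```

In this toy example the group means are 2, 3 and 5 and each within-group variance is 1, so ''q'' = 3 / √(1/3) = 3√3.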


Definition


Probability density function

Differentiating the cumulative distribution function with respect to ''q'' gives the probability density function.
::f_R(q;k,\nu) = \frac{\sqrt{2\pi}\,k\,(k-1)\,\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}} \int_0^\infty s^\nu \, \varphi(\sqrt{\nu}\,s)\,\left[ \int_{-\infty}^\infty \varphi(z+q\,s)\,\varphi(z)\, \left[\Phi(z+q\,s)-\Phi(z)\right]^{k-2} \, \mathrm{d}z\right] \, \mathrm{d}s
Note that in the outer part of the integral, the equation
::\varphi(\sqrt{\nu}\,s) \, \sqrt{2\pi} = e^{-\nu s^2/2}
was used to replace an exponential factor.


Cumulative distribution function

The cumulative distribution function is given by
::F_R(q;k,\nu) = \frac{\sqrt{2\pi}\,k\,\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}} \int_0^\infty s^{\nu-1} \varphi(\sqrt{\nu}\,s) \left[ \int_{-\infty}^\infty \varphi(z) \left[\Phi(z+q\,s)-\Phi(z)\right]^{k-1} \, \mathrm{d}z \right] \, \mathrm{d}s
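The double integral can be evaluated by nested numerical quadrature. The sketch below is illustrative rather than a production routine: it assumes SciPy is available, the helper names `phi` and `studentized_range_cdf` are made up for this example, and the outer weight is written as exp(−ν s²/2), which equals √(2π) φ(√ν s), so the √(2π) in the constant is absorbed. The result is compared with `scipy.stats.studentized_range.cdf`.

```python
import math
from scipy import integrate, stats
from scipy.special import gammaln, ndtr


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def studentized_range_cdf(q, k, nu):
    """Evaluate F_R(q; k, nu) by nested quadrature (slow, for illustration)."""

    def inner(s):
        # Integral over z of phi(z) * [Phi(z + q s) - Phi(z)]^(k - 1)
        integrand = lambda z: phi(z) * (ndtr(z + q * s) - ndtr(z)) ** (k - 1)
        return integrate.quad(integrand, -math.inf, math.inf)[0]

    def outer(s):
        if s <= 0.0:
            return 0.0
        # s^(nu - 1) * exp(-nu s^2 / 2), formed in log space to avoid overflow.
        weight = math.exp((nu - 1) * math.log(s) - 0.5 * nu * s * s)
        return weight * inner(s) if weight > 0.0 else 0.0

    # log of k * nu^(nu/2) / (Gamma(nu/2) * 2^(nu/2 - 1))
    log_c = (math.log(k) + 0.5 * nu * math.log(nu)
             - gammaln(0.5 * nu) - (0.5 * nu - 1.0) * math.log(2.0))
    return math.exp(log_c) * integrate.quad(outer, 0.0, math.inf)[0]


ours = studentized_range_cdf(3.5, 3, 10)
reference = stats.studentized_range.cdf(3.5, 3, 10)
print(ours, reference)
```

The two values should agree to within the quadrature tolerance. In practice the library routine is the better choice, since it is faster and handles extreme parameters more carefully.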


Special cases

If ''k'' is 2 or 3, the studentized range probability density function can be evaluated directly in the limit of infinite degrees of freedom (the expressions below carry no ''ν''). Here \varphi(z) is the standard normal probability density function and \Phi(z) is the standard normal cumulative distribution function.
::f_R(q;k=2) = \sqrt{2}\,\varphi\left(\,q/\sqrt{2}\right)
::f_R(q;k=3) = 6 \sqrt{2}\, \varphi\left(\,q/\sqrt{2}\right)\left[\Phi\left( q / \sqrt{6} \right)-\tfrac{1}{2} \right]
When the degrees of freedom approach infinity, the studentized range cumulative distribution can be calculated for any ''k'' using the standard normal distribution.
::F_R(q;k) = k\, \int_{-\infty}^\infty \varphi(z)\,\Bigl[\Phi(z+q)-\Phi(z)\Bigr]^{k-1} \, \mathrm{d}z = k\, \int_{-\infty}^\infty \,\Bigl[\Phi(z+q)-\Phi(z)\Bigr]^{k-1} \, \mathrm{d}\Phi(z)
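The closed forms for ''k'' = 2 and ''k'' = 3 can be checked against the general infinite-degrees-of-freedom integral. This is a small numerical sketch, assuming SciPy; the function names are illustrative. Each density is integrated from 0 to ''q'' and compared with the integral formula for the cumulative distribution.

```python
import math
from scipy import integrate
from scipy.special import ndtr


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def range_cdf_inf_df(q, k):
    """F_R(q; k) = k * integral of phi(z) [Phi(z + q) - Phi(z)]^(k - 1) dz."""
    integrand = lambda z: phi(z) * (ndtr(z + q) - ndtr(z)) ** (k - 1)
    return k * integrate.quad(integrand, -math.inf, math.inf)[0]


def range_pdf_k2(q):
    return math.sqrt(2.0) * phi(q / math.sqrt(2.0))


def range_pdf_k3(q):
    return (6.0 * math.sqrt(2.0) * phi(q / math.sqrt(2.0))
            * (ndtr(q / math.sqrt(6.0)) - 0.5))


q = 2.5
cdf2_from_pdf = integrate.quad(range_pdf_k2, 0.0, q)[0]
cdf3_from_pdf = integrate.quad(range_pdf_k3, 0.0, q)[0]
print(cdf2_from_pdf, range_cdf_inf_df(q, 2))
print(cdf3_from_pdf, range_cdf_inf_df(q, 3))
```

For ''k'' = 2 the integrated density also has the closed form 2Φ(''q''/√2) − 1, which gives a third independent check.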


Applications

Critical values of the studentized range distribution are used in Tukey's range test. The studentized range is used to calculate significance levels for results obtained by data mining, where one selectively seeks extreme differences in sample data, rather than only sampling randomly.

The studentized range distribution has applications to hypothesis testing and multiple comparisons procedures. For example, Tukey's range test and Duncan's new multiple range test (MRT), in which the sample ''x''₁, ..., ''x''ₙ is a sample of means and ''q'' is the basic test statistic, can be used as post-hoc analysis to test which pairs of group means differ significantly (pairwise comparisons) after the standard analysis of variance has rejected the null hypothesis that all groups are from the same population (i.e. all means are equal) (Pearson & Hartley 1970, Section 14.2).
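As an illustration of how a critical value enters a Tukey-style comparison, the sketch below uses `scipy.stats.studentized_range.ppf`. The design (3 groups of 5), the pooled variance and the observed difference are made-up numbers chosen only for the example, and equal group sizes are assumed.

```python
import math
from scipy import stats

k, n = 3, 5                 # hypothetical design: 3 groups of 5 observations
df = k * (n - 1)            # degrees of freedom of the pooled variance
pooled_var = 4.0            # hypothetical pooled sample variance s^2
alpha = 0.05

# Upper-alpha critical value of the studentized range distribution.
q_crit = stats.studentized_range.ppf(1.0 - alpha, k, df)

# With equal group sizes, two means are declared different when their
# absolute difference exceeds q_crit * s / sqrt(n).
threshold = q_crit * math.sqrt(pooled_var / n)

observed_diff = 3.9         # hypothetical |mean_i - mean_j|
significant = observed_diff > threshold
print(q_crit, threshold, significant)
```

The same critical value is used for every pair of means, which is what controls the family-wise error rate across all pairwise comparisons.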


Related distributions

When only the equality of two group means is in question (i.e. whether ''μ''₁ = ''μ''₂), the studentized range distribution is similar to Student's ''t'' distribution, differing only in that the former takes into account the number of means under consideration, and the critical value is adjusted accordingly. The more means under consideration, the larger the critical value. This makes sense, since the more means there are, the greater the probability that at least some differences between pairs of means will be significantly large due to chance alone.
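For exactly two groups the link to Student's ''t'' can be made precise: from the definitions, ''q'' = √2 |''t''|, so the cumulative distribution of ''q'' equals 2 F_t(''q''/√2) − 1. The sketch below, assuming SciPy, checks this identity numerically and also shows the critical value growing with the number of means; the chosen values of ''q'' and the degrees of freedom are arbitrary.

```python
import math
from scipy import stats

df = 15
q = 3.0

# k = 2: studentized range versus the two-sided t distribution.
cdf_q = stats.studentized_range.cdf(q, 2, df)
cdf_t = 2.0 * stats.t.cdf(q / math.sqrt(2.0), df) - 1.0
print(cdf_q, cdf_t)

# 95% critical values for an increasing number of means.
crit = [stats.studentized_range.ppf(0.95, k, df) for k in (2, 3, 4, 5)]
print(crit)
```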


Derivation

The studentized range distribution function arises from re-scaling the sample range ''R'' by the sample standard deviation ''s'', since the studentized range is customarily tabulated in units of standard deviations, with the variable ''q'' = ''R''/''s''. The derivation begins with a perfectly general form of the distribution function of the sample range, which applies to any sample data distribution. In order to obtain the distribution in terms of the "studentized" range ''q'', we will change variable from ''R'' to ''s'' and ''q''. Assuming the sample data is normally distributed, the standard deviation ''s'' will be chi distributed. By further integrating over ''s'' we can remove ''s'' as a parameter and obtain the re-scaled distribution in terms of ''q'' alone.


General form

For any probability density function f_X, the range probability density f_R is:
::f_R(r;k) = k\,(k-1)\int_{-\infty}^\infty f_X\left(t+\tfrac{1}{2} r\right)f_X \left(t - \tfrac{1}{2} r\right) \left[ \int_{t-\tfrac{1}{2}r}^{t+\tfrac{1}{2}r} f_X(x) \, \mathrm{d}x\right]^{k-2} \, \mathrm{d}\,t
What this means is that we are adding up the probabilities that, given ''k'' draws from a distribution, two of them differ by ''r'', and the remaining ''k'' − 2 draws all fall between the two extreme values. If we change variables to ''u'', where u=t-\tfrac{1}{2} r is the low end of the range, and define F_X as the cumulative distribution function of f_X, then the equation can be simplified:
::f_R(r;k) = k\,(k-1)\int_{-\infty}^\infty f_X(u+r)\, f_X(u)\, \left[\, F_X(u+r)-F_X(u)\, \right]^{k-2} \, \mathrm{d}\,u
We introduce a similar integral, and notice that differentiating under the integral sign gives
: \begin{align} \frac{\partial}{\partial r} & \left[ k\,\int_{-\infty}^\infty f_X(u)\, \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-1} \, \mathrm{d}\,u \right] \\[5pt] = & k\,(k-1)\int_{-\infty}^\infty f_X(u+r)\, f_X(u)\, \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-2} \, \mathrm{d}\,u \end{align}
which recovers the integral above, so that last relation confirms
:: \begin{align} F_R(r;k) & = k \int_{-\infty}^\infty f_X(u) \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-1} \, \mathrm{d}\,u \\ & = k \int_{-\infty}^\infty \Bigl[\, F_X(u+r)-F_X(u)\, \Bigr]^{k-1} \, \mathrm{d}\,F_X(u) \end{align}
because for any continuous cdf
::\frac{\partial F_R(r;k)}{\partial r} = f_R(r;k)
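As a worked check of the general form, consider ''k'' draws from the uniform distribution on [0, 1]. This choice of distribution is only an illustration and does not appear in the source. Here f_X(x) = 1 and F_X(x) = x on [0, 1], and f_X(u + r) is nonzero only for 0 ≤ u ≤ 1 − r, so both integrals reduce to elementary ones:

```latex
\begin{align}
f_R(r;k) &= k\,(k-1)\int_0^{1-r} r^{\,k-2}\,\mathrm{d}u
          = k\,(k-1)\,r^{\,k-2}\,(1-r), \qquad 0 \le r \le 1, \\[4pt]
F_R(r;k) &= k\int_0^{1-r} r^{\,k-1}\,\mathrm{d}u + k\int_{1-r}^{1} (1-u)^{k-1}\,\mathrm{d}u
          = k\,r^{\,k-1}\,(1-r) + r^{\,k}, \\[4pt]
\frac{\partial F_R(r;k)}{\partial r}
         &= k\,(k-1)\,r^{\,k-2}\,(1-r) - k\,r^{\,k-1} + k\,r^{\,k-1}
          = f_R(r;k).
\end{align}
```

The density is that of a Beta(''k'' − 1, 2) distribution, the known distribution of the range of ''k'' uniform draws, and differentiating the cumulative form returns the density as the relation requires.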


Special form for normal data

The range distribution is most often used for confidence intervals around sample averages, which are asymptotically normally distributed by the central limit theorem.

In order to create the studentized range distribution for normal data, we first switch from the generic f_X and F_X to the distribution functions ''φ'' and Φ for the standard normal distribution, and change the variable ''r'' to ''s''·''q'', where ''q'' is a fixed factor that re-scales ''r'' by scaling factor ''s'':
: f_R(q;k) = s\,k\,(k-1)\int_{-\infty}^\infty \varphi(u+sq) \varphi(u)\, \left[\, \Phi(u+sq) - \Phi(u) \right]^{k-2} \, \mathrm{d}u
Choose the scaling factor ''s'' to be the sample standard deviation, so that ''q'' becomes the number of standard deviations wide that the range is. For normal data ''s'' is chi distributed, and the distribution function f_S of the chi distribution is given by:
: f_S(s;\nu)\,\mathrm{d}s = \begin{cases} \dfrac{\nu^{\nu/2}\,s^{\nu-1}\,e^{-\nu s^2/2}}{2^{\nu/2-1}\,\Gamma(\nu/2)} \, \mathrm{d}s & \text{for}\, 0 < s < \infty, \\[4pt] 0 & \text{otherwise}. \end{cases}
Multiplying the distributions f_R and f_S and integrating to remove the dependence on the standard deviation ''s'' gives the studentized range distribution function for normal data:
::f_R(q;k,\nu) = \frac{\nu^{\nu/2}\,k\,(k-1)}{2^{\nu/2-1}\,\Gamma(\nu/2)} \int_0^\infty s^\nu e^{-\nu s^2/2} \int_{-\infty}^\infty \varphi(u+sq)\, \varphi(u)\, \left[\, \Phi(u+sq) - \Phi(u) \right]^{k-2} \, \mathrm{d}u \,\mathrm{d}s
where
:''q'' is the width of the data range measured in standard deviations,
:''ν'' is the number of degrees of freedom for determining the sample standard deviation, and
:''k'' is the number of separate averages that form the points within the range.
The equation for the pdf shown in the sections above comes from using
::e^{-\nu s^2/2} = \sqrt{2\pi}\,\varphi(\sqrt{\nu}\,s)
to replace the exponential expression in the outer integral.
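The result of the derivation can be sanity-checked by simulation. The following sketch, assuming NumPy and SciPy, draws many sets of ''k'' normal samples of size ''n'', computes the studentized range of the sample means with the pooled standard deviation, and compares the empirical coverage of the 95% quantile from `scipy.stats.studentized_range` with 0.95. The seed, the population mean and standard deviation, and the design sizes are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n, n_sim = 4, 6, 100_000
nu = k * (n - 1)                       # degrees of freedom of the pooled variance

# n_sim independent experiments, each with k groups of n normal observations.
samples = rng.normal(loc=10.0, scale=2.0, size=(n_sim, k, n))
means = samples.mean(axis=2)                            # shape (n_sim, k)
pooled_var = samples.var(axis=2, ddof=1).mean(axis=1)   # shape (n_sim,)

# Studentized range of the group means, in units of s / sqrt(n).
q = (means.max(axis=1) - means.min(axis=1)) / np.sqrt(pooled_var / n)

q95 = stats.studentized_range.ppf(0.95, k, nu)
coverage = np.mean(q <= q95)
print(q95, coverage)
```

Because the statistic is a ratio of the range to the estimated standard deviation, the result does not depend on the population mean or variance used in the simulation.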


Further reading

* Dunlap, W.P.; Powell, R.S.; Konnerth, T.K. (1977). "A FORTRAN IV function for calculating probabilities associated with the studentized range statistic". ''Behavior Research Methods & Instrumentation''. 9 (4): 373–375. doi:10.3758/BF03202264.


External links


Table of critical values for the Studentized range distribution