Sample Range

In descriptive statistics, the range of a set of data is the size of the narrowest interval which contains all the data. It is calculated as the difference between the largest and smallest values (also known as the sample maximum and minimum). It is expressed in the same units as the data. The range provides an indication of statistical dispersion. Closely related alternative measures are the interdecile range and the interquartile range.
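
The following is a minimal sketch using only the Python standard library. It computes the range and, for comparison, one common interquartile-range convention. The names `sample_range` and `interquartile_range` are illustrative and do not come from any particular package. Quantile conventions differ between tools, so the IQR value depends on the method chosen. `method="inclusive"` is assumed here.

```python
import statistics


def sample_range(data):
    """Return max(data) - min(data); the result has the same units as the data."""
    values = list(data)
    if not values:
        raise ValueError("the range of an empty sample is undefined")
    return max(values) - min(values)


def interquartile_range(data):
    """Return Q3 - Q1 using the 'inclusive' quantile method (one convention of several)."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


if __name__ == "__main__":
    clean = [1, 2, 3, 4, 5, 6, 7, 8]
    with_outlier = [1, 2, 3, 4, 5, 6, 7, 80]
    print(sample_range(clean), interquartile_range(clean))                # 7 3.5
    print(sample_range(with_outlier), interquartile_range(with_outlier))  # 79 3.5
```

Replacing the largest value 8 with 80 makes the range jump from 7 to 79. The IQR under this convention stays at 3.5, because only the extreme order statistic changed.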


Range of continuous IID random variables

For $n$ independent and identically distributed continuous random variables $X_1, X_2, \ldots, X_n$ with cumulative distribution function $G(x)$ and probability density function $g(x)$, let $T$ denote their range, that is,
$$T = \max(X_1, X_2, \ldots, X_n) - \min(X_1, X_2, \ldots, X_n).$$


Distribution

The range $T$ has the cumulative distribution function
$$F(t) = n \int_{-\infty}^{\infty} g(x)\left[G(x+t) - G(x)\right]^{n-1}\,\mathrm{d}x.$$
Gumbel notes that the "beauty of this formula is completely marred by the facts that, in general, we cannot express $G(x+t)$ by $G(x)$, and that the numerical integration is lengthy and tiresome." If the distribution of each $X_i$ is limited to the right (or left), then the asymptotic distribution of the range is equal to the asymptotic distribution of the largest (smallest) value. For more general distributions, the asymptotic distribution can be expressed as a Bessel function.
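
As Gumbel's remark suggests, the formula is usually evaluated numerically. The sketch below uses plain Python, and `range_cdf` and its parameters are hypothetical names. It applies a composite midpoint rule over a finite interval `[lower, upper]`, which is assumed to cover essentially all of the support of $g$. For unbounded support, this means truncating where $g$ is negligible. The result is checked against the Uniform(0, 1) case, where the integral can be done by hand and gives $F(t) = n t^{n-1} - (n-1) t^n$ for $0 \le t \le 1$.

```python
def range_cdf(t, n, pdf, cdf, lower, upper, steps=100_000):
    """Approximate F(t) = n * integral of g(x) * [G(x + t) - G(x)]**(n - 1) dx.

    Uses the composite midpoint rule on [lower, upper], assumed to cover
    (essentially all of) the support of the density g.
    """
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += pdf(x) * (cdf(x + t) - cdf(x)) ** (n - 1)
    return n * total * h


def uniform_pdf(x):
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def uniform_cdf(x):
    """CDF of Uniform(0, 1), clamped to [0, 1]."""
    return min(max(x, 0.0), 1.0)


n, t = 3, 0.5
numeric = range_cdf(t, n, uniform_pdf, uniform_cdf, 0.0, 1.0)
closed_form = n * t ** (n - 1) - (n - 1) * t ** n  # 0.5 for n = 3, t = 0.5
```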


Moments

The mean range is given by
$$\operatorname{E}[T] = n \int_0^1 x(G)\left[G^{n-1} - (1-G)^{n-1}\right]\mathrm{d}G,$$
where $x(G)$ is the inverse function of $G$. In the case where each $X_i$ has a standard normal distribution, the mean range is given by
$$\operatorname{E}[T] = \int_{-\infty}^{\infty} \left(1 - (1-\Phi(x))^n - \Phi(x)^n\right)\mathrm{d}x,$$
where $\Phi$ is the standard normal cumulative distribution function.
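
The following sketch evaluates the standard-normal mean-range integral numerically. It assumes the real line can be truncated to $[-12, 12]$, where the integrand is negligible outside for moderate $n$, and it uses a midpoint rule. `mean_range_std_normal` is a hypothetical name, and $\Phi$ is built from `math.erf`. For $n = 2$ and $n = 3$ the exact mean ranges are $2/\sqrt{\pi}$ and $3/\sqrt{\pi}$, which gives a convenient check.

```python
import math


def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, written in terms of math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_range_std_normal(n, lower=-12.0, upper=12.0, steps=100_000):
    """Approximate the integral of 1 - (1 - Phi(x))**n - Phi(x)**n over the real line.

    The line is truncated to [lower, upper] and integrated with the
    composite midpoint rule.
    """
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        phi = std_normal_cdf(lower + (k + 0.5) * h)
        total += 1.0 - (1.0 - phi) ** n - phi ** n
    return total * h


print(round(mean_range_std_normal(2), 4))  # 1.1284 (= 2 / sqrt(pi))
```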


Derivation of the distribution

The following is an informal derivation of the result; it is somewhat loose in its treatment of the probabilities. Let $m$ and $M$ denote respectively the minimum and maximum of the random variables $X_1, \ldots, X_n$, so that $T = M - m$. The event that the range is at most $t$ can be decomposed according to the index $i$ of the minimum and the value $x$ of the minimum. For a given index $i$ and minimum value $x$, consider the joint event that $X_i$ is the minimum, $X_i = x$, and the range is at most $t$. This requires each of the other $n-1$ variables to lie in the interval $[x, x+t]$. The variables are independent, and each falls in that interval with probability $G(x+t) - G(x)$, so the joint event has probability density
$$g(x)\left[G(x+t) - G(x)\right]^{n-1}.$$
Each of the $n$ possible indices contributes the same amount by symmetry. Summing over the indices and integrating over $x$ therefore yields the total probability of the event "the range is at most $t$". This is exactly the cumulative distribution function of the range:
$$F(t) = n \int_{-\infty}^{\infty} g(x)\left[G(x+t) - G(x)\right]^{n-1}\,\mathrm{d}x,$$
which concludes the proof.
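
The derivation can be sanity-checked by simulation. The sketch below assumes Uniform(0, 1) variables, for which the integral evaluates to $n t^{n-1} - (n-1) t^n$. It draws many samples, counts how often the range is at most $t$, and compares the result with that closed form. The function names and the fixed seed are arbitrary choices for illustration.

```python
import random


def empirical_range_cdf(t, n, trials=100_000, seed=12345):
    """Estimate P(T <= t) for n IID Uniform(0, 1) variables by simulation."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if max(sample) - min(sample) <= t:
            hits += 1
    return hits / trials


def uniform_range_cdf(t, n):
    """Closed form of F(t) for Uniform(0, 1), valid for 0 <= t <= 1."""
    return n * t ** (n - 1) - (n - 1) * t ** n


estimate = empirical_range_cdf(0.5, 3)
exact = uniform_range_cdf(0.5, 3)  # 0.5
```

With 100,000 trials the standard error of the estimate is roughly 0.0016, so agreement to about two decimal places is expected.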


The range in other models

Outside the case of IID continuous random variables, explicit formulas also exist for other cases, although these are of marginal interest:
* non-IID continuous random variables;
* discrete variables supported on $\mathbb{N}$.
A key difficulty for discrete variables is that the range is itself discrete, so deriving the formula requires combinatorics.
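
The discrete case can be illustrated with the sketch below. It is not a formula taken from the literature. It assumes IID variables with finite support $\{0, 1, \ldots, K-1\}$, a simplification relative to support on $\mathbb{N}$. It mirrors the continuous decomposition by conditioning on the value $m$ of the minimum, using $P(\min = m,\ \max \le m + t) = P(\text{all in } [m, m+t]) - P(\text{all in } [m+1, m+t])$. Exact arithmetic with `fractions.Fraction` avoids rounding, and a brute-force enumeration cross-checks small cases. All helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def _cdf(pmf, k):
    """G(k) = P(X <= k) for a pmf on {0, ..., len(pmf) - 1}."""
    if k < 0:
        return Fraction(0)
    return sum(pmf[: k + 1], Fraction(0))


def discrete_range_cdf(t, n, pmf):
    """P(T <= t) for n IID variables with the given finite pmf."""
    total = Fraction(0)
    for m in range(len(pmf)):
        # P(all X_i in [m, m+t]) - P(all X_i in [m+1, m+t]) = P(min = m and max <= m+t)
        upper = _cdf(pmf, m + t)
        total += (upper - _cdf(pmf, m - 1)) ** n - (upper - _cdf(pmf, m)) ** n
    return total


def brute_force_range_cdf(t, n, pmf):
    """Same probability by enumerating every outcome (feasible only for tiny cases)."""
    total = Fraction(0)
    for outcome in product(range(len(pmf)), repeat=n):
        prob = Fraction(1)
        for k in outcome:
            prob *= pmf[k]
        if max(outcome) - min(outcome) <= t:
            total += prob
    return total


die = [Fraction(1, 6)] * 6  # a fair die, faces relabelled 0..5
```

For two dice, for example, the range is 0 with probability 1/6 and at most 1 with probability 16/36.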


Related quantities

The range is a simple function of order statistics: it is the difference $X_{(n)} - X_{(1)}$ between the largest and smallest order statistics. In particular, the range is a linear function of order statistics, which brings it into the scope of L-estimation.
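
The range can be written as an L-statistic. The sketch below sorts the sample to get the order statistics $X_{(1)} \le \dots \le X_{(n)}$ and takes the weighted sum with weight $-1$ on $X_{(1)}$, $+1$ on $X_{(n)}$, and 0 elsewhere. The helper names are illustrative only.

```python
def order_statistics(sample):
    """Return X_(1) <= X_(2) <= ... <= X_(n)."""
    return sorted(sample)


def l_statistic(sample, weights):
    """Linear combination sum_i w_i * X_(i) of the order statistics."""
    xs = order_statistics(sample)
    if len(weights) != len(xs):
        raise ValueError("need exactly one weight per order statistic")
    return sum(w * x for w, x in zip(weights, xs))


def range_weights(n):
    """Weights giving the range: -1 on X_(1), +1 on X_(n), 0 in between."""
    if n < 2:
        raise ValueError("the range needs at least two observations")
    return [-1] + [0] * (n - 2) + [1]


sample = [4.0, 1.0, 7.0, 3.0]
print(l_statistic(sample, range_weights(len(sample))))  # 6.0
```

Other weight vectors give other L-statistics from the same sorted sample. For example, weights concentrated on two inner order statistics give interquantile ranges.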


See also

* Interdecile range
* Interquartile range
* Studentized range

