In statistics, a quartile is a type of quantile which divides the number of data points into four parts, or ''quarters'', of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three main quartiles are as follows:
* The first quartile (''Q''₁) is defined as the middle value of the data lying between the smallest value (minimum) and the median of the data set. It is also known as the ''lower'' quartile or the ''25th percentile'', as 25% of the data lies below this point.
* The second quartile (''Q''₂) is the median of a data set; thus 50% of the data lies below this point.
* The third quartile (''Q''₃) is the middle value of the data lying between the median and the highest value (maximum) of the data set. It is also known as the ''upper'' quartile or the ''75th percentile'', as 75% of the data lies below this point.

Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a five-number summary of the data. This summary is important in statistics because it provides information about both the center and the spread of the data. Knowing the lower and upper quartiles provides information on how big the spread is and whether the dataset is skewed toward one side. Since quartiles divide the number of data points evenly, the distances between adjacent quartiles are generally not equal (i.e., in general ''Q''₃ − ''Q''₂ ≠ ''Q''₂ − ''Q''₁). The difference between the upper and lower quartiles, ''Q''₃ − ''Q''₁, is known as the interquartile range (IQR). While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of outliers in the data, and the difference in spread between the middle 50% of the data and the outer data points.
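As a sketch of the five-number summary and the IQR, the Python below relies on the standard library's `statistics.quantiles`. Its "exclusive" convention is only one of several ways to place the quartiles (the computing methods below can give different ''Q''₁ and ''Q''₃), and the helper name `five_number_summary` is illustrative rather than standard.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    The quartiles use statistics.quantiles with method="exclusive", which is
    one convention among several; other methods may give different Q1 and Q3.
    """
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)

data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
lo, q1, med, q3, hi = five_number_summary(data)
iqr = q3 - q1  # interquartile range: the spread of the middle 50% of the data
print((lo, q1, med, q3, hi), iqr)  # (6, 15.0, 40.0, 43.0, 49) 28.0
```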

Definitions

Computing methods

Discrete distributions

For discrete distributions, there is no universal agreement on selecting the quartile values.

Method 1

# Use the median to divide the ordered data set into two halves.
#* If there is an odd number of data points in the original ordered data set, do not include the median (the central value in the ordered list) in either half.
#* If there is an even number of data points in the original ordered data set, split this data set exactly in half.
# The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.
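The steps of Method 1 can be sketched in Python as follows, assuming the data fit in memory as a list of numbers; the function name `quartiles_method1` is hypothetical.

```python
from statistics import median

def quartiles_method1(data):
    """Method 1: for an odd count, leave the median out of both halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = xs[:half]  # for odd n, the central value is excluded
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

print(quartiles_method1([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))  # (15, 40, 43)
```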

Method 2

# Use the median to divide the ordered data set into two halves.
#* If there is an odd number of data points in the original ordered data set, include the median (the central value in the ordered list) in both halves.
#* If there is an even number of data points in the original ordered data set, split this data set exactly in half.
# The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

The values found by this method are also known as "Tukey's hinges"; see also midhinge.
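A minimal Python sketch of Method 2 differs from Method 1 only in sharing the central value between the halves for an odd count. It also computes the midhinge, the average of the first and third quartiles. The function name `quartiles_method2` is hypothetical.

```python
from statistics import median

def quartiles_method2(data):
    """Method 2 (Tukey's hinges): for an odd count, include the median in both halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    if n % 2:
        lower, upper = xs[:half + 1], xs[half:]  # central value belongs to both halves
    else:
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

q1, q2, q3 = quartiles_method2([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49])
midhinge = (q1 + q3) / 2  # average of the lower and upper hinges
print(q1, q2, q3, midhinge)  # 25.5 40 42.5 34.0
```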

Method 3

# If there is an even number of data points, then Method 3 starts off the same as Method 1 or Method 2 above, and you can choose whether or not to include the median as a data point. If you choose to include the median as a new data point, proceed to step 2 or 3 of Method 3, because you now have an odd number of data points.
# If there are (4''n''+1) data points, then the lower quartile is 25% of the ''n''th data value plus 75% of the (''n''+1)th data value; the upper quartile is 75% of the (3''n''+1)th data point plus 25% of the (3''n''+2)th data point.
# If there are (4''n''+3) data points, then the lower quartile is 75% of the (''n''+1)th data value plus 25% of the (''n''+2)th data value; the upper quartile is 25% of the (3''n''+2)th data point plus 75% of the (3''n''+3)th data point.
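The weighted rules of Method 3 can be sketched in Python as below. For an even count the sketch simply splits the data exactly in half (the Method 1/Method 2 route); the alternative of inserting the median as an extra data point is not implemented. The function name `quartiles_method3` is hypothetical.

```python
from statistics import median

def quartiles_method3(data):
    """Method 3: weighted averages of neighboring data points for odd counts."""
    xs = sorted(data)
    size = len(xs)
    if size < 2:
        raise ValueError("need at least two data points")

    def x(i):
        return xs[i - 1]  # 1-based access, matching "the ith data value"

    if size % 2 == 0:
        # Even count: split exactly in half, as in Methods 1 and 2.
        half = size // 2
        return median(xs[:half]), median(xs), median(xs[half:])
    if size % 4 == 1:  # size = 4n + 1
        n = (size - 1) // 4
        q1 = 0.25 * x(n) + 0.75 * x(n + 1)
        q3 = 0.75 * x(3 * n + 1) + 0.25 * x(3 * n + 2)
    else:              # size = 4n + 3
        n = (size - 3) // 4
        q1 = 0.75 * x(n + 1) + 0.25 * x(n + 2)
        q3 = 0.25 * x(3 * n + 2) + 0.75 * x(3 * n + 3)
    return q1, median(xs), q3

print(quartiles_method3([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))  # (20.25, 40, 42.75)
```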

Method 4

If we have an ordered dataset ''x''₁, ''x''₂, ..., ''x''ₙ, we can interpolate between data points to find the ''p''th empirical quantile, treating ''x''ᵢ as the ''i''/(''n''+1) quantile. If we denote the integer part of a number ''a'' by ⌊''a''⌋, then the empirical quantile function is given by

q(p) = x_k + \alpha\,(x_{k+1} - x_k),

where

k = \lfloor p(n+1) \rfloor \quad \text{and} \quad \alpha = p(n+1) - \lfloor p(n+1) \rfloor.

To find the first, second, and third quartiles of the dataset we would evaluate q(0.25), q(0.5), and q(0.75), respectively.
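The interpolation formula of Method 4 translates directly into Python. This sketch assumes 1/(''n''+1) ≤ ''p'' ≤ ''n''/(''n''+1), so that no extrapolation beyond the data is needed, and raises an error otherwise; the function name `empirical_quantile` is hypothetical.

```python
import math

def empirical_quantile(data, p):
    """Method 4: linear interpolation treating x_i as the i/(n+1) quantile."""
    xs = sorted(data)
    n = len(xs)
    h = p * (n + 1)               # fractional rank p(n+1)
    if not 1 <= h <= n:
        raise ValueError("p must lie between 1/(n+1) and n/(n+1)")
    k = math.floor(h)             # k = floor(p(n+1)), a 1-based rank
    alpha = h - k                 # fractional part of p(n+1)
    if alpha == 0:
        return xs[k - 1]          # p(n+1) is an integer: x_k itself
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])  # x_k + alpha (x_{k+1} - x_k)

data = [7, 15, 36, 39, 40, 41]
print([empirical_quantile(data, p) for p in (0.25, 0.5, 0.75)])  # [13.0, 37.5, 40.25]
```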

Example 1

Ordered Data Set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49
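Working each method's rule through by hand on this data set (''n'' = 11, so the median is ''x''₆ = 40 under every method) gives the quartiles below; this is a worked computation from the method descriptions, not output from any particular software.

```latex
% Example 1: n = 11 = 4(2) + 3, Q_2 = x_6 = 40 for all four methods
\begin{aligned}
\text{Method 1:}\quad & Q_1 = \operatorname{median}(6, 7, 15, 36, 39) = 15, && Q_3 = \operatorname{median}(41, 42, 43, 47, 49) = 43\\
\text{Method 2:}\quad & Q_1 = \operatorname{median}(6, 7, 15, 36, 39, 40) = \tfrac{15 + 36}{2} = 25.5, && Q_3 = \operatorname{median}(40, 41, 42, 43, 47, 49) = \tfrac{42 + 43}{2} = 42.5\\
\text{Method 3:}\quad & Q_1 = 0.75 \cdot x_3 + 0.25 \cdot x_4 = 0.75 \cdot 15 + 0.25 \cdot 36 = 20.25, && Q_3 = 0.25 \cdot x_8 + 0.75 \cdot x_9 = 0.25 \cdot 42 + 0.75 \cdot 43 = 42.75\\
\text{Method 4:}\quad & Q_1 = x_{0.25 \cdot 12} = x_3 = 15, && Q_3 = x_{0.75 \cdot 12} = x_9 = 43
\end{aligned}
```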

Example 2

Ordered Data Set: 7, 15, 36, 39, 40, 41. As there is an even number of data points, the first three methods all give the same results.
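By hand, Methods 1 to 3 split this data set (''n'' = 6) into the halves 7, 15, 36 and 39, 40, 41, while Method 4 interpolates at the fractional ranks ''p''(''n''+1) = 1.75, 3.5 and 5.25:

```latex
% Example 2: n = 6
\begin{aligned}
\text{Methods 1--3:}\quad & Q_1 = \operatorname{median}(7, 15, 36) = 15, \quad Q_2 = \tfrac{36 + 39}{2} = 37.5, \quad Q_3 = \operatorname{median}(39, 40, 41) = 40\\
\text{Method 4:}\quad & Q_1 = x_1 + 0.75\,(x_2 - x_1) = 7 + 0.75 \cdot 8 = 13\\
& Q_2 = x_3 + 0.5\,(x_4 - x_3) = 36 + 0.5 \cdot 3 = 37.5\\
& Q_3 = x_5 + 0.25\,(x_6 - x_5) = 40 + 0.25 \cdot 1 = 40.25
\end{aligned}
```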

Continuous probability distributions

For a continuous probability distribution of a real-valued random variable ''X'', the cumulative distribution function (CDF) is given by

F_X(x) = P(X \leq x).

The CDF gives the probability that the random variable ''X'' is less than or equal to the value ''x''. Therefore, the first quartile is the value of ''x'' for which F_X(x) = 0.25, the second quartile is the value of ''x'' for which F_X(x) = 0.5, and the third quartile is the value of ''x'' for which F_X(x) = 0.75. These values of ''x'' can be found with the quantile function Q(p), where p = 0.25 for the first quartile, p = 0.5 for the second quartile, and p = 0.75 for the third quartile. The quantile function is the inverse of the cumulative distribution function if the cumulative distribution function is strictly monotonically increasing, because each probability ''p'' then corresponds to exactly one value of ''x''; otherwise Q(p) is taken as the smallest ''x'' with F_X(x) ≥ p.
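Because the quartiles of a continuous distribution are just Q(0.25), Q(0.5) and Q(0.75), they can be found by inverting the CDF. The sketch below does this numerically by bisection for an exponential distribution, assuming an illustrative rate of 2, and checks the result against its closed-form quantile function −ln(1 − p)/rate; it also uses the standard library's `NormalDist.inv_cdf` for the standard normal. The helper `quantile_by_bisection` and the constant `RATE` are hypothetical names.

```python
import math
from statistics import NormalDist

def quantile_by_bisection(cdf, p, lo, hi, tol=1e-12):
    """Numerically invert a continuous, strictly increasing CDF on [lo, hi]."""
    if not cdf(lo) <= p <= cdf(hi):
        raise ValueError("p must lie between cdf(lo) and cdf(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid  # the p-quantile lies to the right of mid
        else:
            hi = mid  # the p-quantile lies at or to the left of mid
    return (lo + hi) / 2

RATE = 2.0  # illustrative rate parameter of an exponential distribution

def exponential_cdf(x):
    """F_X(x) = 1 - exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

probs = (0.25, 0.5, 0.75)
numeric = [quantile_by_bisection(exponential_cdf, p, 0.0, 50.0) for p in probs]
closed_form = [-math.log(1.0 - p) / RATE for p in probs]  # Q(p) = -ln(1 - p) / RATE
normal = [NormalDist(0.0, 1.0).inv_cdf(p) for p in probs]  # standard normal quartiles

print(numeric)
print(closed_form)
print(normal)
```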

Outliers

There are methods by which to check for outliers in the discipline of statistics and statistical analysis. Outliers could be the result of a shift in the location (mean) or in the scale (variability) of the process of interest. Outliers could also be evidence of a sample population that has a non-normal distribution or of a contaminated population data set. Consequently, as is the basic idea of descriptive statistics, when encountering an outlier, we have to explain this value by further analysis of the cause or origin of the outlier. In cases of extreme observations, which are not an infrequent occurrence, the typical values must be analyzed. In the case of quartiles, the interquartile range (IQR) may be used to characterize the data when there may be extremities that skew the data; the IQR is a relatively robust statistic (a property also sometimes called "resistance") compared to the range and standard deviation.

There is also a mathematical method to check for outliers by determining "fences", upper and lower limits from which to check for outliers. After determining the first and third quartiles and the interquartile range as outlined above, the fences are calculated using the following formulas:

\text{Lower fence} = Q_1 - 1.5\,(\mathrm{IQR})

\text{Upper fence} = Q_3 + 1.5\,(\mathrm{IQR}),

where ''Q''₁ and ''Q''₃ are the first and third quartiles, respectively. The lower fence is the "lower limit" and the upper fence is the "upper limit" of the data, and any data lying below the lower fence or above the upper fence can be considered an outlier. The fences provide a guideline by which to define an outlier, which may be defined in other ways. The fences define a "range" outside which an outlier exists; one way to picture this is as a fenced boundary, outside which lie "outsiders" as opposed to outliers.

It is common for the lower and upper fences along with the outliers to be represented by a boxplot. For a vertical boxplot, only the vertical heights correspond to the visualized data set, while the horizontal width of the box is irrelevant. Outliers located outside the fences in a boxplot can be marked with any choice of symbol, such as an "x" or "o". The fences are sometimes also referred to as "whiskers", while the entire plot is called a "box-and-whisker" plot.

When spotting an outlier in the data set by calculating the interquartile range and boxplot features, it might be easy to mistakenly view it as evidence that the population is non-normal or that the sample is contaminated. However, this method should not take the place of a hypothesis test for determining normality of the population. The significance of outliers varies depending on the sample size. If the sample is small, then it is more probable to get interquartile ranges that are unrepresentatively small, leading to narrower fences. Therefore, it would be more likely to find data that are marked as outliers.
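The fence rule can be sketched in a few lines of Python. This sketch assumes the quartiles come from `statistics.quantiles` with method "exclusive" (the same ''i''/(''n''+1) convention as Method 4 above); with another quartile method the fences would shift. The names `tukey_fences`, `outliers` and the multiplier parameter `k` (defaulting to the 1.5 in the formulas) are hypothetical, and the sample is Example 1 with one extreme value, 120, appended for illustration.

```python
from statistics import quantiles

def tukey_fences(data, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outliers(data, k=1.5):
    """Values lying below the lower fence or above the upper fence."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]

sample = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 120]
print(tukey_fences(sample))  # (-18.375, 84.625)
print(outliers(sample))      # [120]
```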

Computer software for quartiles

Excel: The Excel function ''QUARTILE(array, quart)'' returns the requested quartile value for a given array of data by linear interpolation between data points; for an odd number of data points its results agree with Method 2 above. In the ''QUARTILE'' function, array is the dataset of numbers that is being analyzed and quart is one of the following 5 values, depending on which quartile is being calculated: 0 (minimum), 1 (first quartile), 2 (median), 3 (third quartile), or 4 (maximum).

MATLAB: To calculate quartiles in MATLAB, the function ''quantile(A,p)'' can be used, where A is the vector of data being analyzed and p is the cumulative probability corresponding to the desired quartile: 0.25 for the first quartile, 0.5 for the median, and 0.75 for the third quartile.
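Python's standard library offers the same choice of conventions through `statistics.quantiles`. In this sketch on the Example 1 data, the default "exclusive" method treats the ''i''th ordered value as the ''i''/(''n''+1) quantile and so reproduces Method 4, while the "inclusive" method treats the minimum and maximum as the 0th and 100th percentiles; for this odd-sized data set its quartiles coincide with Method 2.

```python
from statistics import quantiles

data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # Example 1

# Default "exclusive" method: same i/(n+1) convention as Method 4.
exclusive = quantiles(data, n=4, method="exclusive")

# "inclusive" method: minimum and maximum are the 0th and 100th percentiles.
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)  # [15.0, 40.0, 43.0]
print(inclusive)  # [25.5, 40.0, 42.5]
```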


External links

* Quartile – from MathWorld. Includes references and compares various methods to compute quartiles.
* – From MathForum.org
* Quartiles calculator – simple quartiles calculator
* – An example how to calculate it
