In statistics, quartiles are a type of quantile which divide the number of data points into four parts, or ''quarters'', of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three quartiles, resulting in four data divisions, are as follows:
* The first quartile (''Q''<sub>1</sub>) is defined as the 25th percentile, where the lowest 25% of the data lies below this point. It is also known as the ''lower'' quartile.
* The second quartile (''Q''<sub>2</sub>) is the median of a data set; thus 50% of the data lies below this point.
* The third quartile (''Q''<sub>3</sub>) is the 75th percentile, where the lowest 75% of the data lies below this point. It is also known as the ''upper'' quartile.
Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a five-number summary of the data. This summary is important in statistics because it provides information about both the center and the spread of the data. Knowing the lower and upper quartiles provides information on how big the spread is and whether the dataset is skewed toward one side. Since quartiles divide the number of data points evenly, the range is generally not the same between adjacent quartiles (i.e. usually (''Q''<sub>3</sub> − ''Q''<sub>2</sub>) ≠ (''Q''<sub>2</sub> − ''Q''<sub>1</sub>)).
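As a rough illustration of the five-number summary, the following Python sketch uses the standard library's `statistics.quantiles`. Its default interpolation rule is only one of several quartile conventions, so other methods may give different values for the same data. The function name `five_number_summary` and the sample data are chosen here purely for illustration.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum).

    Assumes the data set has at least two points.
    """
    xs = sorted(data)
    # n=4 asks for the three cut points that split the data into four groups.
    q1, q2, q3 = quantiles(xs, n=4)
    return xs[0], q1, q2, q3, xs[-1]


sample = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
minimum, q1, q2, q3, maximum = five_number_summary(sample)
print(minimum, q1, q2, q3, maximum)

# The distances between adjacent quartiles are generally unequal.
print(q3 - q2, q2 - q1)
```

For this sample the gap between the second and third quartiles is much smaller than the gap between the first and second, which suggests that the lower part of the data is more spread out.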
The interquartile range (IQR) is defined as the difference between the 75th and 25th percentiles, or ''Q''<sub>3</sub> − ''Q''<sub>1</sub>. While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of outliers in the data, and the difference in spread between the middle 50% of the data and the outer data points.
Definitions
Computing methods
Discrete distributions
For discrete distributions, there is no universal agreement on selecting the quartile values.
Method 1
# Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
#* If there are an odd number of data points in the original ordered data set, do not include the median (the central value in the ordered list) in either half.
#* If there are an even number of data points in the original ordered data set, split this data set exactly in half.
# The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.
This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.
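The steps of Method 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `quartiles_method1` is hypothetical, and the sketch assumes at least two data points so that both halves are non-empty.

```python
from statistics import median


def quartiles_method1(data):
    """Return (Q1, Q2, Q3), excluding the median from both halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # Odd count: skip the central value. Even count: split exactly in half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)


print(quartiles_method1([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))
print(quartiles_method1([7, 15, 36, 39, 40, 41]))
```

For the eleven-point data set the halves are the five values on either side of the median 40, so the quartiles are 15, 40 and 43.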
Method 2
# Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
#* If there are an odd number of data points in the original ordered data set, include the median (the central value in the ordered list) in both halves.
#* If there are an even number of data points in the original ordered data set, split this data set exactly in half.
# The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.
The values found by this method are also known as "Tukey's hinges"; see also midhinge.
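Method 2 differs from Method 1 only in how an odd-sized data set is split. A minimal Python sketch follows, with the hypothetical function name `quartiles_method2` and the assumption that the data set is non-empty.

```python
from statistics import median


def quartiles_method2(data):
    """Return (Q1, Q2, Q3), including the median in both halves when n is odd."""
    xs = sorted(data)
    half = len(xs) // 2
    # Odd count: the central value xs[half] belongs to both halves.
    lower = xs[:half + 1] if len(xs) % 2 else xs[:half]
    upper = xs[half:]
    return median(lower), median(xs), median(upper)


print(quartiles_method2([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))
print(quartiles_method2([7, 15, 36, 39, 40, 41]))
```

With eleven data points each half now has six values, so each hinge is the average of two neighbouring data points (25.5 and 42.5 here). With an even number of points the result matches Method 1.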
Method 3
# Use the median to divide the ordered data set into two halves. The median becomes the second quartile.
#* If there is an odd number of data points, go to the next step.
#* If there is an even number of data points, Method 3 starts off the same as Method 1 or Method 2 above, and you can choose whether to include the median as a new data point. If you include the median as a new data point, proceed to step 2 or 3 below, because you now have an odd number of data points. If you do not, continue with Method 1 or Method 2 as started.
# If there are (4''n''+1) data points, then the lower quartile is 25% of the ''n''th data value plus 75% of the (''n''+1)th data value; the upper quartile is 75% of the (3''n''+1)th data point plus 25% of the (3''n''+2)th data point.
# If there are (4''n''+3) data points, then the lower quartile is 75% of the (''n''+1)th data value plus 25% of the (''n''+2)th data value; the upper quartile is 25% of the (3''n''+2)th data point plus 75% of the (3''n''+3)th data point.
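The weighted rules of Method 3 can be sketched in Python as below. The function name `quartiles_method3` is hypothetical. For an even number of data points the sketch takes the option of not adding the median as a new data point and falls back to the even split shared by Methods 1 and 2. It also assumes at least two data points, since the (4''n''+1) rule refers to an ''n''th value that does not exist for a single point.

```python
from statistics import median


def quartiles_method3(data):
    """Return (Q1, Q2, Q3) using the weighted rules for 4n+1 and 4n+3 points."""
    xs = sorted(data)
    size = len(xs)
    if size < 2:
        raise ValueError("need at least two data points")
    q2 = median(xs)
    if size % 2 == 0:
        # Even count: same split as Methods 1 and 2.
        half = size // 2
        return median(xs[:half]), q2, median(xs[half:])

    def x(i):
        """1-based access, matching the numbering used in the text."""
        return xs[i - 1]

    n, remainder = divmod(size, 4)
    if remainder == 1:  # 4n+1 data points
        q1 = 0.25 * x(n) + 0.75 * x(n + 1)
        q3 = 0.75 * x(3 * n + 1) + 0.25 * x(3 * n + 2)
    else:  # 4n+3 data points
        q1 = 0.75 * x(n + 1) + 0.25 * x(n + 2)
        q3 = 0.25 * x(3 * n + 2) + 0.75 * x(3 * n + 3)
    return q1, q2, q3


print(quartiles_method3([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))
```

Eleven data points correspond to 4''n''+3 with ''n'' = 2, so the lower quartile is 0.75 × 15 + 0.25 × 36 = 20.25 and the upper quartile is 0.25 × 42 + 0.75 × 43 = 42.75.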
Method 4
If we have an ordered dataset <math>x_1, x_2, \ldots, x_n</math>, then we can interpolate between data points to find the <math>p</math>th empirical quartile (<math>p = 1, 2, 3</math>) if <math>x_i</math> is in the <math>i/(n+1)</math> quantile. If we denote the integer part of a number <math>a</math> by <math>\lfloor a \rfloor</math>, then the empirical quantile function is given by
:<math>q(p/4) = x_k + \alpha (x_{k+1} - x_k)</math>,
where <math>k = \lfloor p(n+1)/4 \rfloor</math> and <math>\alpha = p(n+1)/4 - \lfloor p(n+1)/4 \rfloor</math>.
Here <math>x_k</math> is the last data point in quartile ''p'', and <math>x_{k+1}</math> is the first data point in quartile ''p''+1. <math>\alpha</math> measures where the quartile falls between <math>x_k</math> and <math>x_{k+1}</math>. If <math>\alpha</math> = 0 then the quartile falls exactly on <math>x_k</math>. If <math>\alpha</math> = 0.5 then the quartile falls exactly halfway between <math>x_k</math> and <math>x_{k+1}</math>.
To find the first, second, and third quartiles of the dataset we would evaluate <math>q(0.25)</math>, <math>q(0.5)</math>, and <math>q(0.75)</math> respectively.
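The interpolation of Method 4 can be sketched in Python as follows, assuming the interpolation position for quartile ''p'' is ''p''(''n''+1)/4, counted from 1 in the ordered data. The function name `quartile_method4` is hypothetical, and clamping to the smallest or largest value when the position falls outside the data is an assumption of this sketch, not part of the method as described.

```python
import math


def quartile_method4(data, p):
    """Return quartile p (p = 1, 2 or 3) by interpolating at position p(n+1)/4."""
    xs = sorted(data)
    n = len(xs)
    position = p * (n + 1) / 4
    k = math.floor(position)  # integer part
    alpha = position - k      # fractional part
    # Assumption: clamp when the position lies outside 1..n (tiny data sets).
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    # xs[k - 1] is the k-th ordered value, xs[k] the (k+1)-th.
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])


odd_data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
even_data = [7, 15, 36, 39, 40, 41]
print([quartile_method4(odd_data, p) for p in (1, 2, 3)])
print([quartile_method4(even_data, p) for p in (1, 2, 3)])
```

With eleven data points the positions are 3, 6 and 9, so every quartile falls exactly on a data point. With six data points the positions are 1.75, 3.5 and 5.25, so each quartile is interpolated between two neighbours.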
Example 1
Ordered Data Set (of an odd number of data points): 6, 7, 15, 36, 39, '''40''', 41, 42, 43, 47, 49.
The bold number (40) is the median, splitting the data set into two halves with an equal number of data points.
Example 2
Ordered Data Set (of an even number of data points): 7, 15, '''36''', '''39''', 40, 41.
The bold numbers (36, 39) are used to calculate the median as their average. As there is an even number of data points, the first three methods all give the same results. (Method 3 is executed such that the median is not chosen as a new data point, and Method 1 is started.)
Continuous probability distributions

If we define a continuous probability distribution as <math>P(X)</math>, where <math>X</math> is a real-valued random variable, its cumulative distribution function (CDF) is given by
:<math>F_X(x) = P(X \leq x)</math>.
The CDF gives the probability that the random variable <math>X</math> is less than or equal to the value <math>x</math>. Therefore, the first quartile is the value of <math>x</math> when <math>F_X(x) = 0.25</math>, the second quartile is <math>x</math> when <math>F_X(x) = 0.5</math>, and the third quartile is <math>x</math> when <math>F_X(x) = 0.75</math>. The values of <math>x</math> can be found with the quantile function <math>Q(p)</math>, where <math>p = 0.25</math> for the first quartile, <math>p = 0.5</math> for the second quartile, and <math>p = 0.75</math> for the third quartile. The quantile function is the inverse of the cumulative distribution function if the cumulative distribution function is strictly monotonically increasing, because the one-to-one correspondence between the input and output of the cumulative distribution function then holds.
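As an illustration for a continuous distribution, the sketch below evaluates the quantile function of a standard normal distribution using `NormalDist.inv_cdf` from Python's standard library. The choice of the normal distribution is an assumption made for the example; any distribution with a strictly increasing CDF could be used in the same way.

```python
from statistics import NormalDist

# Standard normal distribution, chosen only as an example.
dist = NormalDist(mu=0.0, sigma=1.0)

probabilities = (0.25, 0.5, 0.75)
# The quantile function (inverse CDF) evaluated at each probability.
q1, q2, q3 = (dist.inv_cdf(p) for p in probabilities)
print(q1, q2, q3)

# Applying the CDF to each quartile recovers the original probability.
for p, q in zip(probabilities, (q1, q2, q3)):
    print(p, dist.cdf(q))
```

Because the normal distribution is symmetric about its mean, the first and third quartiles are equally far from the median, at roughly 0.6745 standard deviations on either side.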
Outliers
There are methods by which to check for outliers in the discipline of statistics and statistical analysis. Outliers could result from a shift in the location (mean) or in the scale (variability) of the process of interest. Outliers could also be evidence of a sample population that has a non-normal distribution or of a contaminated population data set. Consequently, as is the basic idea of descriptive statistics, when encountering an outlier, we have to explain this value by further analysis of the cause or origin of the outlier. In cases of extreme observations, which are not an infrequent occurrence, the typical values must be analyzed. The interquartile range (IQR), defined as the difference between the upper and lower quartiles (<math>Q_3 - Q_1</math>), may be used to characterize the data when there may be extremities that skew the data; the interquartile range is a relatively robust statistic (also sometimes called "resistance") compared to the range and standard deviation. There is also a mathematical method to check for outliers by determining "fences", upper and lower limits from which to check for outliers.
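After determining the first (lower) and third (upper) quartiles (<math>Q_1</math> and <math>Q_3</math> respectively) and the interquartile range (<math>\mathrm{IQR} = Q_3 - Q_1</math>) as outlined above, the fences are calculated using the following formulas:
:<math>\text{Lower fence} = Q_1 - (1.5 \times \mathrm{IQR})</math>
:<math>\text{Upper fence} = Q_3 + (1.5 \times \mathrm{IQR})</math>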

The lower fence is the "lower limit" and the upper fence is the "upper limit" of the data, and any data lying outside these bounds can be considered an outlier. The fences provide a guideline by which to define an outlier, which may be defined in other ways. The fences define a "range" outside which an outlier exists; a way to picture this is as the boundary of a fence. It is common for the lower and upper fences, along with the outliers, to be represented by a boxplot. For the boxplot shown on the right, only the vertical heights correspond to the visualized data set, while the horizontal width of the box is irrelevant. Outliers located outside the fences in a boxplot can be marked with any choice of symbol, such as an "x" or "o". The fences are sometimes also referred to as "whiskers", while the entire plot visual is called a "box-and-whisker" plot.
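The fence calculation can be sketched in Python as follows, assuming the conventional multiplier of 1.5 times the interquartile range and computing the quartiles with Method 1 (other quartile methods would shift the fences slightly). The function names `fences` and `find_outliers` and the sample data are hypothetical.

```python
from statistics import median


def fences(data, k=1.5):
    """Return (lower fence, upper fence) using Method 1 quartiles."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])
    # Method 1: leave the central value out of the upper half when n is odd.
    q3 = median(xs[half + 1:] if len(xs) % 2 else xs[half:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, k=1.5):
    """Return the values lying strictly outside the fences."""
    lower, upper = fences(data, k)
    return [x for x in data if x < lower or x > upper]


sample = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 130]
print(fences(sample))
print(find_outliers(sample))
```

Here the quartiles are 25.5 and 45, the interquartile range is 19.5, and the fences are −3.75 and 74.25, so only the value 130 is flagged.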
When spotting an outlier in the data set by calculating the interquartile range and boxplot features, it might be easy to mistakenly view it as evidence that the population is non-normal or that the sample is contaminated. However, this method should not take the place of a hypothesis test for determining normality of the population. The significance of the outliers varies depending on the sample size. If the sample is small, then it is more probable to get interquartile ranges that are unrepresentatively small, leading to narrower fences. Therefore, it would be more likely to find data that are marked as outliers.
Computer software for quartiles
Excel
The Excel function ''QUARTILE.INC(array, quart)'' provides the desired quartile value for a given array of data, using Method 3 from above. The ''QUARTILE'' function is a legacy function from Excel 2007 or earlier, giving the same output as ''QUARTILE.INC''. In the function, ''array'' is the dataset of numbers being analyzed and ''quart'' is one of the following five values, depending on which quartile is being calculated:
* 0: minimum value
* 1: first quartile (25th percentile)
* 2: median (50th percentile)
* 3: third quartile (75th percentile)
* 4: maximum value
MATLAB
To calculate quartiles in MATLAB, the function ''quantile''(''A'',''p'') can be used, where ''A'' is the vector of data being analyzed and ''p'' is the cumulative probability that corresponds to the desired quartile: 0.25 for the first quartile, 0.5 for the second quartile (the median), and 0.75 for the third quartile.
See also
* Five-number summary
* Range
* Box plot
* Interquartile range
* Summary statistics
* Quantile
External links
Quartile – from MathWorld. Includes references and compares various methods to compute quartiles
– From MathForum.org
– An example how to calculate it