The five-number summary is a set of descriptive statistics that provides information about a dataset. It consists of the five most important sample percentiles:
# the sample minimum ''(smallest observation)''
# the lower quartile or ''first quartile''
# the median ''(the middle value)''
# the upper quartile or ''third quartile''
# the sample maximum ''(largest observation)''
In addition to the median of a single set of data there are two related statistics called the upper and lower quartiles. If data are placed in order, then the lower quartile is central to the lower half of the data and the upper quartile is central to the upper half of the data. These quartiles are used to calculate the interquartile range, which helps to describe the spread of the data, and determine whether or not any data points are outliers.
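As a rough illustration of how the quartiles feed into the interquartile range and an outlier check, the following Python sketch may help. The function names are hypothetical, the quartiles are taken as medians of the lower and upper halves, and the 1.5 × IQR fence is a common convention that is assumed here rather than taken from the text above.

```python
from statistics import median

def quartiles(values):
    """Lower and upper quartile as medians of the two halves (needs >= 2 values)."""
    data = sorted(values)
    half = len(data) // 2
    # For an odd count the middle observation is left out of both halves (an assumption).
    return median(data[:half]), median(data[len(data) - half:])

def iqr_outliers(values, k=1.5):
    """Return the interquartile range and the points outside the k * IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return iqr, [v for v in values if v < low or v > high]

iqr, outliers = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 100])
print(iqr, outliers)
```

Other values of `k`, or other quartile definitions, would flag different points, so the result should be read as one possible screening rule.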
In order for these statistics to exist the observations must be from a univariate variable that can be measured on an ordinal, interval or ratio scale.
Use and representation
The five-number summary provides a concise summary of the distribution of the observations. Reporting five numbers avoids the need to decide on the most appropriate summary statistic. The five-number summary gives information about the location (from the median), spread (from the quartiles) and range (from the sample minimum and maximum) of the observations. Since it reports order statistics (rather than, say, the mean) the five-number summary is appropriate for ordinal measurements, as well as interval and ratio measurements.
It is possible to quickly compare several sets of observations by comparing their five-number summaries, which can be represented graphically using a boxplot.
In addition to the points themselves, many L-estimators can be computed from the five-number summary, including interquartile range, midhinge, range, mid-range, and trimean.
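The L-estimators listed above follow directly from the five numbers. A minimal Python sketch, using a hypothetical helper name and the usual textbook definitions of each estimator, could look like this:

```python
def l_estimators(minimum, q1, med, q3, maximum):
    """Derive common L-estimators from a five-number summary."""
    return {
        "interquartile_range": q3 - q1,           # spread of the middle half
        "midhinge": (q1 + q3) / 2,                # average of the two quartiles
        "range": maximum - minimum,               # full spread
        "mid_range": (minimum + maximum) / 2,     # average of the extremes
        "trimean": (q1 + 2 * med + q3) / 4,       # weighted average of quartiles and median
    }

# Illustrative input: the summary 0, 0.5, 7.5, 44, 63
for name, value in l_estimators(0, 0.5, 7.5, 44, 63).items():
    print(name, value)
```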
The five-number summary is sometimes represented as in the following table:
Example
This example calculates the five-number summary for the following set of observations: 0, 0, 1, 2, 63, 61, 27, 13.
These are the numbers of moons of each planet in the Solar System.
It helps to put the observations in ascending order: 0, 0, 1, 2, 13, 27, 61, 63. There are eight observations, so the median is the mean of the two middle numbers, (2 + 13)/2 = 7.5. Splitting the observations either side of the median gives two groups of four observations. The median of the first group is the lower or first quartile, and is equal to (0 + 1)/2 = 0.5. The median of the second group is the upper or third quartile, and is equal to (27 + 61)/2 = 44.
The smallest and largest observations are 0 and 63.
So the five-number summary would be 0, 0.5, 7.5, 44, 63.
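The hand calculation above can be sketched in Python using only the standard library. The function name is hypothetical, and the handling of an odd number of observations (leaving the middle value out of both halves) is an assumption, since the example only covers an even count.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    if not data:
        raise ValueError("need at least one observation")
    half = len(data) // 2
    # Split either side of the median; fall back to the full data for one observation.
    lower = data[:half] or data
    upper = data[len(data) - half:] or data
    return data[0], median(lower), median(data), median(upper), data[-1]

moons = [0, 0, 1, 2, 63, 61, 27, 13]
print(five_number_summary(moons))  # (0, 0.5, 7.5, 44.0, 63)
```

Other quartile conventions exist, so software packages may report slightly different values for the same data.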
Example in R
It is possible to calculate the five-number summary in the R programming language using the fivenum function. The summary function, when applied to a vector, displays the five-number summary together with the mean (which is not itself a part of the five-number summary). The fivenum function uses a different method to calculate percentiles than the summary function.
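To illustrate how the choice of method changes the quartiles, the Python sketch below applies linear interpolation between order statistics to the moons data from the example above. The function name is hypothetical; linear interpolation is, to the best of our understanding, the default used by R's quantile function (and hence by summary), whereas the median-of-halves calculation in the worked example gives quartiles of 0.5 and 44.

```python
def linear_quantile(values, p):
    """Quantile by linear interpolation between neighbouring order statistics."""
    data = sorted(values)
    h = (len(data) - 1) * p           # fractional position in the sorted data
    lo = int(h)                       # index of the order statistic just below
    hi = min(lo + 1, len(data) - 1)   # index just above, clamped at the end
    return data[lo] + (h - lo) * (data[hi] - data[lo])

moons = [0, 0, 1, 2, 63, 61, 27, 13]
print(linear_quantile(moons, 0.25))  # 0.75
print(linear_quantile(moons, 0.75))  # 35.5
```

The median, minimum and maximum agree under both methods for this data; only the quartiles differ.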
Example in Python
This Python example uses the percentile function from the numerical library numpy. The method keyword requires NumPy 1.22 or later; earlier releases named this argument interpolation.

    import numpy as np

    def fivenum(data):
        """Five-number summary."""
        # 'midpoint' averages the two nearest order statistics
        return np.percentile(data, [0, 25, 50, 75, 100], method='midpoint')

    >>> moons = [0, 0, 1, 2, 63, 61, 27, 13]
    >>> print(fivenum(moons))
    [ 0.   0.5  7.5 44.  63. ]
Example in SAS
You can use PROC UNIVARIATE in SAS to get the five-number summary:
data fivenum;
input x @@;
datalines;
1 2 3 4 20 202 392 4 38 20
;
run;
ods select Quantiles;
proc univariate data = fivenum;
output out = fivenums min = min Q1 = Q1 Q2 = median Q3 = Q3 max = max;
run;
proc print data = fivenums;
run;
Example in Stata
input byte y
0
0
1
2
63
61
27
13
end
list
tabstat y, statistics (min q max)
See also
* Seven-number summary
* Three-point estimation
* Box plot