In descriptive statistics, the seven-number summary is a collection of seven summary statistics, and is an extension of the five-number summary. There are three similar, common forms.
As with the five-number summary, it can be represented by a modified box plot, adding hatch-marks on the "whiskers" for two of the additional numbers.
Seven-number summary
For a normally distributed variable, the values at the following percentiles are (approximately) evenly spaced:
# the 2nd percentile (better: 2.15%)
# the 9th percentile (better: 8.87%)
# the 25th percentile or lower quartile or ''first quartile''
# the 50th percentile or median (middle value, or ''second quartile'')
# the 75th percentile or upper quartile or ''third quartile''
# the 91st percentile (better: 91.13%)
# the 98th percentile (better: 97.85%)
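The "better" percentages in the list above can be reproduced by taking the standard normal quantile of the upper quartile as the spacing unit and stepping three units either side of the median. The following is a minimal sketch using Python's standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# Distance from the median to the upper quartile, in standard deviations
# (roughly 0.6745). This is the common spacing between the seven values.
step = std.inv_cdf(0.75)

# Seven equally spaced points: -3, -2, -1, 0, 1, 2, 3 steps from the median.
z_scores = [k * step for k in range(-3, 4)]

# Convert each point back to the percentile it corresponds to.
percentiles = [100 * std.cdf(z) for z in z_scores]

for z, p in zip(z_scores, percentiles):
    print(f"z = {z:+.4f}  ->  {p:.2f}th percentile")
```

Rounding the outer four results to whole numbers gives the 2nd, 9th, 91st and 98th percentiles used in the summary.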
The middle three values – the lower quartile, median, and upper quartile – are the usual statistics from the five-number summary and are the standard values for the box in a box plot.
The two unusual percentiles at each end are used because the locations of all seven values will be approximately equally spaced if the data is normally distributed. Some statistical tests require normally distributed data, so the plotted values provide a convenient visual check for validity of later tests, simply by scanning to see if the marks for those seven percentiles appear to be equal distances apart on the graph.
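The visual check described above can also be approximated numerically by comparing the gaps between consecutive summary values. The sketch below is an informal heuristic, not a formal normality test: the function names, the use of the rounded percentiles (2, 9, 25, 50, 75, 91, 98), and the synthetic samples are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, quantiles

PERCENTS = (2, 9, 25, 50, 75, 91, 98)

def seven_number_summary(data):
    """Return the values at the 2nd, 9th, 25th, 50th, 75th, 91st, 98th percentiles."""
    cuts = quantiles(data, n=100)  # 99 cut points: 1st through 99th percentile
    return [cuts[p - 1] for p in PERCENTS]

def spacing_ratio(summary):
    """Largest gap divided by smallest gap; close to 1 means evenly spaced.

    Assumes the seven values are distinct (otherwise the smallest gap is zero).
    """
    gaps = [b - a for a, b in zip(summary, summary[1:])]
    return max(gaps) / min(gaps)

n = 2000
probs = [(i + 0.5) / n for i in range(n)]

# Deterministic stand-ins for samples: exact quantiles of each distribution.
normal_like = [NormalDist(50, 10).inv_cdf(p) for p in probs]
skewed = [-math.log(1 - p) for p in probs]  # exponential-shaped data

print(spacing_ratio(seven_number_summary(normal_like)))  # near 1
print(spacing_ratio(seven_number_summary(skewed)))       # much larger than 1
```

A ratio near 1 corresponds to marks that look equally spaced on the plot; how far from 1 is "too far" depends on the sample size and is left to judgement here.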
Notice that whereas the extreme values of the five-number summary depend on the sample size, this seven-number summary does not. It is also somewhat more stable: its whisker-ends are the 2nd and 98th percentiles rather than the sample extremes, which protects them from the wild swings that the extremes usually show from sample to sample.
The values can be represented using a modified box plot. The 2nd and 98th percentiles are represented by the ends of the whiskers, and hatch-marks across the whiskers mark the 9th and 91st percentiles.
Bowley’s seven-figure summary
Arthur Bowley used a set of non-parametric statistics, called a "seven-figure summary", including the extremes, deciles, and quartiles, along with the median.
Thus the numbers are:
# the sample minimum
# the 10th percentile (first decile)
# the 25th percentile or lower quartile or ''first quartile''
# the 50th percentile or median (middle value, or ''second quartile'')
# the 75th percentile or upper quartile or ''third quartile''
# the 90th percentile (last decile)
# the sample maximum
Note that the middle five of the seven numbers are very nearly the same as those of the seven-number summary above.
The addition of the deciles allows one to compute the interdecile range, which for a normal distribution can be scaled to give a reasonably efficient estimate of standard deviation, and the 10% midsummary, which when compared to the median gives an idea of the skewness in the tails.
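As a rough illustration of these two derived quantities, the sketch below computes Bowley's seven figures, scales the interdecile range by the width of the central 80% of a standard normal distribution (about 2.563 standard deviations), and takes the 10% midsummary as the average of the first and last deciles. The function names and the synthetic sample are assumptions for the example, and the quantile interpolation rule is simply the default of Python's `statistics.quantiles`.

```python
from statistics import NormalDist, quantiles

def bowley_summary(data):
    """Minimum, first decile, quartiles with median, last decile, maximum."""
    cuts = quantiles(data, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    d1, q1, med, q3, d9 = (cuts[i] for i in (1, 4, 9, 14, 17))
    return [min(data), d1, q1, med, q3, d9, max(data)]

def sigma_from_idr(summary):
    """Estimate the standard deviation from the interdecile range.

    Valid only under the assumption that the data are roughly normal.
    """
    idr = summary[5] - summary[1]          # last decile minus first decile
    scale = 2 * NormalDist().inv_cdf(0.9)  # about 2.563 for a normal distribution
    return idr / scale

def midsummary_10(summary):
    """Average of the first and last deciles (the 10% midsummary)."""
    return (summary[1] + summary[5]) / 2

# Deterministic stand-in for a normal sample with mean 10 and sd 3.
n = 2000
sample = [NormalDist(10, 3).inv_cdf((i + 0.5) / n) for i in range(n)]

summary = bowley_summary(sample)
sigma_hat = sigma_from_idr(summary)
mid = midsummary_10(summary)

print(f"estimated sd: {sigma_hat:.3f}")
print(f"10% midsummary: {mid:.3f}, median: {summary[3]:.3f}")
```

For symmetric data the midsummary sits close to the median; a midsummary noticeably above or below the median suggests that one tail is longer than the other.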
Tukey’s seven-number summary
John Tukey used a seven-number summary consisting of the extremes, octiles, quartiles, and the median.
The seven numbers are:
# the sample minimum
# the 12.5th percentile (first octile)
# the 25th percentile or lower quartile or ''first quartile''
# the 50th percentile or median (middle value, or ''second quartile'')
# the 75th percentile or upper quartile or ''third quartile''
# the 87.5th percentile (last octile)
# the sample maximum
Note that the middle five of the seven numbers can all be obtained by successive partitioning of the ordered data into subsets of equal size. Extending the seven-number summary by continued partitioning produces the ''nine-number summary'', the ''eleven-number summary'', and so on.
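The successive-halving pattern can be sketched as follows: a summary of depth 3 uses the cut points at 1/8, 1/4, 1/2, 3/4 and 7/8, and each extra level adds the next pair of tail fractions (1/16 and 15/16, and so on). This is an illustrative approximation using interpolated quantiles from Python's standard library, not necessarily the exact rule Tukey used; the function name and `depth` parameter are hypothetical.

```python
from statistics import quantiles

def partition_summary(data, depth=3):
    """Extremes plus quantiles at 2**-k and 1 - 2**-k for k = 1..depth.

    depth=3 gives a seven-number summary, depth=4 a nine-number summary,
    depth=5 an eleven-number summary.
    """
    n = 2 ** depth
    cuts = quantiles(data, n=n)               # n - 1 cut points at i/n
    lower = [2 ** k for k in range(depth)]    # i = 1, 2, 4, ... up to the median
    upper = [n - i for i in reversed(lower[:-1])]  # mirror images above the median
    picks = [cuts[i - 1] for i in lower + upper]
    return [min(data), *picks, max(data)]

values = list(range(1, 16))  # 1, 2, ..., 15
print(partition_summary(values))           # seven numbers
print(partition_summary(values, depth=4))  # nine numbers
```

With fifteen equally spaced values the octiles, quartiles and median fall exactly on data points, which makes the halving structure easy to see.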
See also
* Three-point estimation
* Stanine