
In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in:
* a measure of location, or central tendency, such as the arithmetic mean
* a measure of statistical dispersion like the standard deviation or the mean absolute deviation
* a measure of the shape of the distribution like skewness or kurtosis
* if more than one variable is measured, a measure of statistical dependence such as a correlation coefficient
A common collection of order statistics used as summary statistics is the five-number summary, sometimes extended to a seven-number summary, together with the associated box plot.
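The five-number summary (minimum, lower quartile, median, upper quartile, maximum) can be read off the sorted sample. The sketch below is a minimal Python version, assuming Python 3.8+ and the standard-library `statistics.quantiles` with `method="inclusive"` for the quartiles. Several quartile conventions exist, so other software may report slightly different Q1 and Q3. The names `FiveNumberSummary` and `five_number_summary` are illustrative only.

```python
import statistics
from typing import NamedTuple, Sequence


class FiveNumberSummary(NamedTuple):
    minimum: float
    lower_quartile: float
    median: float
    upper_quartile: float
    maximum: float


def five_number_summary(data: Sequence[float]) -> FiveNumberSummary:
    """Return (minimum, Q1, median, Q3, maximum) of a sample.

    Quartiles come from statistics.quantiles(..., method="inclusive"), which
    interpolates linearly between order statistics; its middle cut point is
    the ordinary median.
    """
    if len(data) < 2:
        raise ValueError("need at least two observations")
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return FiveNumberSummary(min(data), q1, q2, q3, max(data))
```

For the nine values 1 to 9 this gives (1, 3, 5, 7, 9). A box plot draws its box from Q1 to Q3 with a line at the median.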
Entries in an analysis of variance table can also be regarded as summary statistics.
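To illustrate which numbers such a table holds, the sketch below computes the entries of a one-way ANOVA table (sums of squares, degrees of freedom, mean squares) and the F statistic in plain Python. It assumes Python 3.9+ and independent groups given as lists of numbers; `one_way_anova` and `AnovaRow` are hypothetical names, not a library API.

```python
from typing import NamedTuple, Optional, Sequence


class AnovaRow(NamedTuple):
    source: str
    sum_of_squares: float
    df: int
    mean_square: Optional[float]  # not reported for the "total" row


def one_way_anova(groups: Sequence[Sequence[float]]) -> tuple[list[AnovaRow], float]:
    """Return the rows of a one-way ANOVA table and the F statistic."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    if ms_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    rows = [
        AnovaRow("between groups", ss_between, df_between, ms_between),
        AnovaRow("within groups", ss_within, df_within, ms_within),
        AnovaRow("total", ss_between + ss_within, n_total - 1, None),
    ]
    return rows, ms_between / ms_within
```

For the groups [1, 2, 3] and [4, 5, 6], the between-groups sum of squares is 13.5 on 1 degree of freedom, the within-groups sum of squares is 4 on 4 degrees of freedom, and F = 13.5.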
Examples
Location
Common measures of location, or central tendency, are the arithmetic mean, median, mode, and interquartile mean.
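The four location measures can be compared on one sample with a short sketch, assuming Python 3.9+ and the standard-library `statistics` module (`fmean`, `median`, `multimode`). The interquartile mean has no standard-library function, so `interquartile_mean` below is a hypothetical helper. It treats each sorted observation as covering a unit interval and averages over the middle half [n/4, 3n/4], which gives fractional weights when n is not divisible by 4.

```python
import statistics
from typing import Sequence


def interquartile_mean(data: Sequence[float]) -> float:
    """Mean of the middle 50% of the sorted data.

    Sorted observation i (0-based) is treated as covering [i, i + 1) and is
    weighted by how much of that interval lies inside [n/4, 3n/4].
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("interquartile mean of empty data")
    lo, hi = n / 4, 3 * n / 4
    total = 0.0
    for i, x in enumerate(xs):
        weight = max(0.0, min(i + 1, hi) - max(i, lo))
        total += weight * x
    return total / (hi - lo)


def location_summary(data: Sequence[float]) -> dict[str, object]:
    """Mean, median, all modes, and interquartile mean of a sample."""
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # every most-frequent value
        "interquartile_mean": interquartile_mean(data),
    }
```

For [1, 1, 2, 2, 3, 4, 10, 100] the mean is 15.375, pulled up by the single large value, while the median (2.5) and interquartile mean (2.75) stay near the bulk of the data. The modes are 1 and 2.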
Spread
Common measures of statistical dispersion are the standard deviation, variance, range, interquartile range, absolute deviation, mean absolute difference and the distance standard deviation. Measures that assess spread in comparison to the typical size of data values include the coefficient of variation.
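Several of these spread measures are computed side by side in the sketch below. It assumes Python 3.9+, the sample (n - 1) versions of variance and standard deviation from `statistics`, "inclusive" quartiles for the interquartile range, the mean absolute deviation taken around the mean, and the mean absolute difference averaged over distinct pairs i ≠ j (some texts divide by n² instead). The distance standard deviation is omitted. `spread_summary` is an illustrative name.

```python
import math
import statistics
from itertools import combinations
from typing import Sequence


def spread_summary(data: Sequence[float]) -> dict[str, float]:
    """Common dispersion measures of a sample (at least two observations)."""
    xs = list(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation, n - 1 denominator
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    # Sum of |x_i - x_j| over unordered pairs i < j.
    pair_sum = sum(abs(a - b) for a, b in combinations(xs, 2))
    return {
        "variance": statistics.variance(xs),  # sample variance, n - 1 denominator
        "standard_deviation": sd,
        "range": max(xs) - min(xs),
        "interquartile_range": q3 - q1,
        # Mean absolute deviation around the mean (one "absolute deviation" variant).
        "mean_absolute_deviation": statistics.fmean([abs(x - mean) for x in xs]),
        # Average |x_i - x_j| over ordered pairs i != j.
        "mean_absolute_difference": 2 * pair_sum / (n * (n - 1)),
        # Standard deviation relative to the mean; undefined for a zero mean.
        "coefficient_of_variation": sd / mean if mean != 0 else math.nan,
    }
```

Because the coefficient of variation divides by the mean, it is usually reported only for data measured on a ratio scale with a positive mean.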
The Gini coefficient was originally developed to measure income inequality and is equivalent to one of the L-moment ratios (the L-coefficient of variation).
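The link to L-moments can be checked numerically. For positive data, the Gini coefficient equals half the mean absolute difference divided by the mean, and the second L-moment is half the mean absolute difference, so G = λ₂/λ₁. The sketch below computes the sample value both ways, assuming Python 3.9+, positive data such as incomes, and the mean absolute difference over distinct pairs. With that normalization the sample L-moment l₂ (from probability-weighted moments) equals half the mean difference exactly. All function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def gini_from_mean_difference(data: Sequence[float]) -> float:
    """Gini coefficient G = MD / (2 * mean), MD averaged over pairs i != j."""
    xs = list(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    pair_sum = sum(abs(a - b) for a, b in combinations(xs, 2))
    mean_difference = 2 * pair_sum / (n * (n - 1))
    return mean_difference / (2 * mean)


def sample_l_moments_12(data: Sequence[float]) -> tuple[float, float]:
    """Sample L-moments (l1, l2) via probability-weighted moments.

    With sorted x_(1) <= ... <= x_(n):
      b0 = mean,  b1 = (1/n) * sum_{i=1}^{n} (i - 1) / (n - 1) * x_(i),
      l1 = b0,    l2 = 2 * b1 - b0.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    b0 = sum(xs) / n
    # enumerate() yields the 0-based index k = i - 1 directly.
    b1 = sum(k * x for k, x in enumerate(xs)) / (n * (n - 1))
    return b0, 2 * b1 - b0


def gini_from_l_moments(data: Sequence[float]) -> float:
    """Gini coefficient as the L-moment ratio l2 / l1 (sample L-CV)."""
    l1, l2 = sample_l_moments_12(data)
    return l2 / l1
```

For [1, 2, 3] both functions return 1/3. Other sample Gini formulas divide by n² rather than n(n - 1) and differ by a factor of n/(n - 1).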
A simple summary of a dataset is sometimes given by quoting particular order statistics as approximations to selected percentiles of a distribution.
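One simple rule for choosing the order statistic is the nearest-rank method: the p-th percentile is taken as x₍ₖ₎ with k = ⌈p·n/100⌉. The sketch below assumes that convention. Interpolating conventions, such as the one behind `statistics.quantiles`, can instead return values that lie between two order statistics. `nearest_rank_percentile` is a hypothetical name.

```python
import math
from typing import Sequence


def nearest_rank_percentile(data: Sequence[float], p: float) -> float:
    """Return the order statistic x_(k), k = ceil(p * n / 100), as the p-th percentile."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    xs = sorted(data)
    if not xs:
        raise ValueError("empty data")
    k = math.ceil(p * len(xs) / 100)  # 1-based rank of the chosen order statistic
    return xs[k - 1]
```

For the sample [15, 20, 35, 40, 50], this rule gives 20 for the 30th and 40th percentiles, 35 for the median, and 50 for the 100th percentile.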
Shape
Common measures of the shape of a distribution are skewness and kurtosis, while alternatives can be based on L-moments. A different measure is the distance skewness, for which a value of zero implies central symmetry.
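The three families of shape measures are sketched below in Python 3.9+: moment skewness and excess kurtosis, sample L-skewness, and sample distance skewness. Each makes specific assumptions. The moment ratios use 1/n central moments with no small-sample correction. L-skewness is built from Hosking's probability-weighted moments. Distance skewness is computed over all ordered pairs and measures symmetry about 0, so data are shifted by a chosen center first if symmetry about another point is of interest. All function names are illustrative.

```python
from typing import Sequence


def moment_skewness_kurtosis(data: Sequence[float]) -> tuple[float, float]:
    """Skewness g1 = m3 / m2**1.5 and excess kurtosis g2 = m4 / m2**2 - 3.

    m_k is the k-th central moment with a 1/n denominator.
    """
    xs = list(data)
    n = len(xs)
    if n == 0:
        raise ValueError("empty data")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def l_skewness(data: Sequence[float]) -> float:
    """Sample L-skewness t3 = l3 / l2 from probability-weighted moments b0, b1, b2."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    b0 = sum(xs) / n
    b1 = sum(k * x for k, x in enumerate(xs)) / (n * (n - 1))
    b2 = sum(k * (k - 1) * x for k, x in enumerate(xs)) / (n * (n - 1) * (n - 2))
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    if l2 == 0:
        raise ValueError("L-skewness is undefined for constant data")
    return l3 / l2


def distance_skewness(data: Sequence[float]) -> float:
    """Sample distance skewness 1 - sum|x_i - x_j| / sum|x_i + x_j| (symmetry about 0)."""
    xs = list(data)
    num = sum(abs(a - b) for a in xs for b in xs)
    den = sum(abs(a + b) for a in xs for b in xs)
    if den == 0:
        raise ValueError("undefined when every observation is 0")
    return 1.0 - num / den
```

For the symmetric sample [1, 2, 3, 4, 5], both g1 and L-skewness are 0 and g2 is -1.3. The distance skewness of [1, 2, 4, 5] is 0 only after shifting the data by its center of symmetry, 3.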
Dependence
The most common measure of dependence between paired random variables is the Pearson product-moment correlation coefficient, while a common alternative summary statistic is Spearman's rank correlation coefficient. A value of zero for the distance correlation implies independence.
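The three dependence measures are contrasted in the plain-Python sketch below, which assumes paired one-dimensional samples. Spearman's coefficient is computed as Pearson's coefficient of the ranks, with ties given average ranks. Distance correlation uses the V-statistic form built from double-centered distance matrices, and the sketch follows the convention of returning 0 when either sample is constant. The function names are hypothetical, not a library API.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def _double_centered(v: Sequence[float]) -> List[List[float]]:
    """Distances |v_i - v_j| with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # d is symmetric, so column means equal row means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation (V-statistic form) of paired 1-D samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    a, b = _double_centered(x), _double_centered(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar2_x = sum(v * v for r in a for v in r) / n ** 2
    dvar2_y = sum(v * v for r in b for v in r) / n ** 2
    if dvar2_x == 0 or dvar2_y == 0:
        return 0.0
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, dcov2) / math.sqrt(dvar2_x * dvar2_y))
```

For x = [-2, -1, 0, 1, 2] and y = x², the Pearson coefficient is exactly 0 even though y is a function of x, whereas the distance correlation is clearly positive. For a monotone but nonlinear relation such as y = x³ on positive x, Spearman's coefficient is 1 while Pearson's is below 1.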
Human perception of summary statistics
Humans efficiently use summary statistics to quickly perceive the gist of auditory and visual information.
See also
* Common test statistics
* Descriptive statistics
* Sample statistics
* Sufficient statistic
* Data processing