
The median of a set of numbers is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as the "middle" value. The basic feature of the median in describing data compared to the mean (often simply described as the "average") is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of the center. Median income, for example, may be a better way to describe the center of the income distribution because increases in the largest incomes alone have no effect on the median. For this reason, the median is of central importance in robust statistics.
The median is a 2-quantile; it is the value that partitions a set into two equal parts.
Finite set of numbers
The median of a finite list of numbers is the "middle" number, when those numbers are listed in order from smallest to greatest.
If the data set has an odd number of observations, the middle one is selected (after arranging in ascending order). For example, the list of seven numbers 1, 3, 3, 6, 7, 8, 9 has the median ''6'', which is the fourth value.
If the data set has an even number of observations, there is no distinct middle value and the median is usually defined to be the arithmetic mean of the two middle values.
For example, the data set of 8 numbers 1, 2, 3, 4, 5, 6, 8, 9 has a median value of ''4.5'', that is, (4 + 5)/2. (In more technical terms, this interprets the median as the fully trimmed mid-range.)
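In general, with this convention, the median can be defined as follows: for a data set ''x'' of ''n'' elements, ordered from smallest to greatest,
:<math>\operatorname{med}(x) = x_{(n+1)/2}</math> if ''n'' is odd,
:<math>\operatorname{med}(x) = \frac{x_{n/2} + x_{n/2+1}}{2}</math> if ''n'' is even.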
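The odd/even convention described above can be sketched in Python as follows. The function name `median_of_list` and the two sample lists are illustrative assumptions, not part of the text; the standard library's `statistics.median` applies the same convention.

```python
def median_of_list(values):
    """Return the median of a non-empty list of numbers.

    Convention: the middle value for an odd count, and the arithmetic
    mean of the two middle values for an even count.
    """
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)      # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd count: the single middle value
    # even count: arithmetic mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Illustrative inputs (assumed data): one odd-length and one even-length list.
odd_example = median_of_list([9, 3, 1, 8, 3, 7, 6])
even_example = median_of_list([1, 2, 3, 4, 5, 6, 8, 9])
```

Sorting first keeps the sketch short at a cost of O(''n'' log ''n'') time; selection algorithms can find the middle element faster, but are not needed for small data sets.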
Definition and notation
Formally, a median of a population is any value such that at least half of the population is less than or equal to the proposed median and at least half is greater than or equal to the proposed median. As seen above, medians may not be unique. If each of these two sets contains more than half the population, then some of the population is exactly equal to the unique median.
The median is well-defined for any ordered (one-dimensional) data and is independent of any distance metric. The median can thus be applied to school classes which are ranked but not numerical (e.g. working out a median grade when student test scores are graded from F to A), although the result might be halfway between classes if there is an even number of classes. (For an odd number of classes, one specific class is determined as the median.)
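As a rough sketch of how a median can be taken over ranked, non-numerical data, the following Python function returns the lower and upper middle grades. The grade scale `GRADE_ORDER`, the function name `ordinal_median`, and the sample grades are assumptions made for illustration. When the count of graded scores is even and the two middle grades differ, no averaging is attempted, because ordinal data has no distance metric.

```python
# Hypothetical grade scale, listed from lowest to highest.
GRADE_ORDER = ["F", "E", "D", "C", "B", "A"]


def ordinal_median(grades, order=GRADE_ORDER):
    """Return (low, high) middle grades of a non-empty list of grades.

    The two entries are equal when a single grade is the median; they
    differ when the median falls "halfway" between two grades.
    """
    if not grades:
        raise ValueError("the median of an empty list is undefined")
    rank = {grade: position for position, grade in enumerate(order)}
    ordered = sorted(grades, key=rank.__getitem__)  # sort by rank, not alphabetically
    n = len(ordered)
    return ordered[(n - 1) // 2], ordered[n // 2]


odd_result = ordinal_median(["B", "A", "C", "F", "B"])   # five scores
even_result = ordinal_median(["A", "C", "B", "D"])       # four scores
```

Here `odd_result` has two identical entries, while `even_result` reports the two neighbouring grades between which the median lies.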
A geometric median, on the other hand, is defined in any number of dimensions. A related concept, in which the outcome is forced to correspond to a member of the sample, is the medoid.
There is no widely accepted standard notation for the median, but some authors represent the median of a variable ''x'' as med(''x''), ''x͂'', ''μ''<sub>1/2</sub>, or ''M''. In any of these cases, the use of these or other symbols for the median needs to be explicitly defined when they are introduced.
The median is a special case of other ways of summarizing the typical values associated with a statistical distribution: it is the 2nd quartile, 5th decile, and 50th percentile.
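The coincidence of the median with the 2nd quartile, 5th decile, and 50th percentile can be checked with Python's standard `statistics` module. The data list is an assumed example, and `statistics.quantiles` is used with its default method; other quantile conventions may give different values for the remaining cut points.

```python
import statistics

# Assumed example data with an even number of observations.
data = [1, 2, 3, 4, 5, 6, 8, 9]

median = statistics.median(data)

# quantiles(data, n=k) returns the k - 1 cut points dividing the data
# into k groups, so the middle cut point sits at index k // 2 - 1.
second_quartile = statistics.quantiles(data, n=4)[1]
fifth_decile = statistics.quantiles(data, n=10)[4]
fiftieth_percentile = statistics.quantiles(data, n=100)[49]
```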
Uses
The median can be used as a measure of location when one attaches reduced importance to extreme values, typically because a distribution is skewed, extreme values are not known, or outliers are untrustworthy, i.e., may be measurement or transcription errors.
For example, consider the multiset 1, 2, 2, 2, 3, 14. The median is 2 in this case, as is the mode, and it might be seen as a better indication of the center than the arithmetic mean of 4, which is larger than all but one of the values. However, the widely cited empirical relationship that the mean is shifted "further into the tail" of a distribution than the median is not generally true. At most, one can say that the two statistics cannot be "too far" apart; see below.
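The comparison can be reproduced with a few lines of Python. The multiset below is an assumption, chosen so that it matches the figures quoted above (median 2, mode 2, mean 4); any data with a single large outlier would show the same effect.

```python
import statistics

# Assumed multiset consistent with the quoted median, mode, and mean.
values = [1, 2, 2, 2, 3, 14]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Count how many observations lie strictly below the mean.
below_mean = sum(v < mean for v in values)
```

With this data, `below_mean` counts five of the six values, which is the sense in which the mean is "larger than all but one of the values".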
As a median is based on the middle data in a set, it is not necessary to know the value of extreme results in order to calculate it. For example, in a psychology test investigating the time needed to solve a problem, if a small number of people failed to solve the problem at all in the given time, a median can still be calculated.
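A minimal sketch of this situation in Python follows. The solving times are invented for illustration, and recording a failure as `math.inf` is an assumption of the sketch rather than a standard procedure. The approach only works while fewer than half of the participants fail, so that the middle of the sorted data is still a finite time.

```python
import math
import statistics

# Hypothetical solving times in seconds; math.inf marks participants
# who did not solve the problem within the time limit.
times = [41.0, 55.5, 62.0, 70.2, 88.9, math.inf, math.inf]

median_time = statistics.median(times)  # middle of the sorted values, still finite
mean_time = statistics.fmean(times)     # dominated by the unknown extreme values
```

The median is a finite time here, whereas the mean cannot be computed meaningfully without knowing how long the unsuccessful participants would have needed.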
Because the median is simple to understand and easy to calculate, while also a robust approximation to the mean, the median is a popular summary statistic in descriptive statistics. In this context, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation.
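The four measures of variability listed above can be computed side by side. This is a sketch under stated assumptions: the function name `spread_measures` and the sample data are illustrative, the mean absolute deviation is taken around the mean, the median absolute deviation around the median, and the quartiles use the default method of `statistics.quantiles`.

```python
import statistics


def spread_measures(values):
    """Return four common measures of variability for a list of numbers."""
    ordered = sorted(values)
    center_mean = statistics.fmean(ordered)
    center_median = statistics.median(ordered)
    q1, _, q3 = statistics.quantiles(ordered, n=4)  # three quartile cut points
    return {
        "range": ordered[-1] - ordered[0],
        "interquartile_range": q3 - q1,
        # average absolute distance from the mean
        "mean_absolute_deviation": statistics.fmean(
            abs(v - center_mean) for v in ordered
        ),
        # median absolute distance from the median
        "median_absolute_deviation": statistics.median(
            abs(v - center_median) for v in ordered
        ),
    }


# Assumed example data containing one large outlier.
measures = spread_measures([1, 2, 2, 2, 3, 14])
```

On data like this, the range and the mean absolute deviation are driven by the single outlier, while the median absolute deviation stays small.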
For practical purposes, different measures of location and dispersion are often compared on the basis of how well the corresponding population values can be estimated from a sample of data. The median, estimated using the sample median, has good properties in this regard. While it is not usually optimal if a given population distribution is assumed, its properties are always reasonably good. For example, a comparison of the efficiency of candidate estimators shows that the sample mean is more statistically efficient when, and only when, the data are uncontaminated by data from heavy-tailed distributions or from mixtures of distributions. Even then, the median has a 64% efficiency compared to the minimum-variance mean (for large normal samples), which is to say the variance of the median will be about 57% greater than the variance of the mean (a ratio of π/2 ≈ 1.57).
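The efficiency figure can be illustrated with a small simulation. For large normal samples the ratio of the variance of the sample median to the variance of the sample mean approaches π/2 ≈ 1.57, the reciprocal of an efficiency of roughly 64%. The function name, sample size, trial count, and seed below are arbitrary assumptions, and the estimate is subject to simulation noise.

```python
import random
import statistics


def variance_ratio(sample_size=101, trials=2000, seed=0):
    """Estimate Var(sample median) / Var(sample mean) for standard normal data."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(medians) / statistics.variance(means)


ratio = variance_ratio()  # expected to land in the neighbourhood of pi / 2
```

Replacing `rng.gauss` with a heavy-tailed generator would be one way to explore the contaminated case mentioned above, where the ranking of the two estimators can reverse.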
Probability distributions
For any real-valued probability distribution with cumulative distribution function ''F'', a median is defined as any real number ''m'' that satisfies the inequalities
:<math>\lim_{x \to m^-} F(x) \leq \frac{1}{2} \leq F(m)</math>
(cf. the drawing in the definition of expected value for arbitrary real-valued random variables). An equivalent phrasing uses a random variable ''X'' distributed according to ''F'':
:<math>\operatorname{P}(X \leq m) \geq \frac{1}{2} \text{ and } \operatorname{P}(X \geq m) \geq \frac{1}{2}.</math>
Note that this definition does not require ''X'' to have an absolutely continuous distribution (which has a probability density function ''f''), nor does it require a discrete one. In the former case, the inequalities can be upgraded to equality: a median satisfies
:<math>\operatorname{P}(X \leq m) = \int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}</math>
and
:<math>\operatorname{P}(X \geq m) = \int_{m}^{\infty} f(x)\,dx = \frac{1}{2}.</math>
Any probability distribution on the real number set <math>\mathbb{R}</math> has at least one median, but in pathological cases there may be more than one median: if ''F'' is constant 1/2 on an interval (so that ''f'' = 0 there), then any value of that interval is a median.
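For a discrete distribution, the defining condition can be tested directly: ''m'' is a median when both P(''X'' ≤ ''m'') ≥ 1/2 and P(''X'' ≥ ''m'') ≥ 1/2 hold. The sketch below is an illustration under assumptions: the function name `is_median` and both example distributions are hypothetical, and exact fractions are used to avoid rounding at the 1/2 boundary.

```python
from fractions import Fraction


def is_median(pmf, m):
    """Check whether m is a median of a discrete distribution.

    pmf maps each possible value to its probability.
    """
    half = Fraction(1, 2)
    at_or_below = sum(p for x, p in pmf.items() if x <= m)  # P(X <= m)
    at_or_above = sum(p for x, p in pmf.items() if x >= m)  # P(X >= m)
    return at_or_below >= half and at_or_above >= half


# A fair six-sided die: the CDF equals 1/2 on the whole interval [3, 4).
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
# A distribution with half of its mass on the single value 1.
lumpy = {0: Fraction(1, 4), 1: Fraction(1, 2), 5: Fraction(1, 4)}

die_medians = [m for m in (2, 3, 3.5, 4, 5) if is_median(fair_die, m)]
lumpy_medians = [m for m in (0, 1, 5) if is_median(lumpy, m)]
```

The die shows the non-unique case, where every value from 3 to 4 qualifies, while the second distribution has a single median.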
Medians of particular distributions
The medians of certain types of distributions can be easily calculated from their parameters; furthermore, they exist even for some distributions lacking a well-defined mean, such as the Cauchy distribution:
* The median of a symmetric unimodal distribution coincides with the mode.
* The median of a symmetric distribution which possesses a mean ''μ'' also takes the value ''μ''.
** The median of a normal distribution with mean ''μ'' and variance ''σ''<sup>2</sup> is ''μ''. In fact, for a normal distribution, mean = median = mode.
** The median of a uniform distribution in the interval [''a'', ''b''] is (''a'' + ''b'') / 2, which is also the mean.
* The median of a Cauchy distribution with location parameter ''x''<sub>0</sub> and scale parameter ''y'' is ''x''<sub>0</sub>, the location parameter.
* The median of a power law distribution ''x''<sup>−''a''</sup>, with exponent ''a'' > 1, is 2<sup>1/(''a'' − 1)</sup>''x''<sub>min</sub>, where ''x''<sub>min</sub> is the minimum value for which the power law holds.
* The median of an exponential distribution with rate parameter ''λ'' is the natural logarithm of 2 divided by the rate parameter: ''λ''<sup>−1</sup> ln 2.
* The median of a Weibull distribution with shape parameter ''k'' and scale parameter ''λ'' is ''λ''(ln 2)<sup>1/''k''</sup>.
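The closed-form medians in the list above can be sanity-checked by evaluating each cumulative distribution function at the stated median and confirming that the result is 1/2. The helper functions and all parameter values below are assumptions chosen for illustration; the CDF formulas are the standard ones for each family, with the power law taken as a density proportional to ''x''<sup>−''a''</sup> on ''x'' ≥ ''x''<sub>min</sub>.

```python
import math
from statistics import NormalDist


def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape))


def power_law_cdf(x, exponent, x_min):
    # density proportional to x ** -exponent for x >= x_min, exponent > 1
    return 1.0 - (x / x_min) ** (-(exponent - 1.0))


def cauchy_cdf(x, location, scale):
    return 0.5 + math.atan((x - location) / scale) / math.pi


# Arbitrary illustrative parameters.
mu, sigma = 10.0, 2.0
a, b = -1.0, 5.0
location, cauchy_scale = -4.0, 0.7
exponent, x_min = 2.5, 1.0
rate = 2.0
shape, weibull_scale = 1.5, 3.0

# Each entry evaluates a CDF at the closed-form median from the list.
cdf_at_median = {
    "normal": NormalDist(mu, sigma).cdf(mu),
    "uniform": ((a + b) / 2 - a) / (b - a),
    "cauchy": cauchy_cdf(location, location, cauchy_scale),
    "power_law": power_law_cdf(2 ** (1 / (exponent - 1)) * x_min, exponent, x_min),
    "exponential": exponential_cdf(math.log(2) / rate, rate),
    "weibull": weibull_cdf(weibull_scale * math.log(2) ** (1 / shape), shape, weibull_scale),
}
```

Every value in `cdf_at_median` should equal 0.5 up to floating-point rounding.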
Properties
Optimality property
The ''mean absolute error'' of a real variable ''c'' with respect to the random variable ''X'' is