In statistics, a trimmed estimator is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers.
Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.
Given an estimator, the x% trimmed version is obtained by discarding the x% lowest observations, the x% highest observations, or both: it is a statistic on the ''middle'' of the data. For instance, the 5% trimmed mean is the mean of the observations lying between the 5th and 95th percentiles. In some cases a trimmed estimator discards a fixed number of points (such as the maximum and minimum) instead of a percentage.
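A minimal Python sketch of both variants, using only the standard library. The helper names `trimmed_mean` and `trim_count_mean` are hypothetical, and cutting floor(x% of n) points from each end is an assumed convention; other implementations may round or interpolate differently.

```python
import math
import statistics


def trimmed_mean(data, proportion):
    """Mean after discarding `proportion` of the points from each end.

    Assumed convention: floor(proportion * n) points are cut per end.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    k = math.floor(proportion * len(xs))  # points dropped from each end
    return statistics.fmean(xs[k:len(xs) - k])


def trim_count_mean(data, k):
    """Mean after discarding a fixed number k of points from each end."""
    xs = sorted(data)
    if 2 * k >= len(xs):
        raise ValueError("cannot trim away all the data")
    return statistics.fmean(xs[k:len(xs) - k])


sample = [2, 4, 4, 5, 5, 6, 7, 8, 9, 250]  # one gross outlier
print(statistics.fmean(sample))    # untrimmed mean: 30.0
print(trimmed_mean(sample, 0.10))  # drops 2 and 250: 6.0
print(trim_count_mean(sample, 1))  # drops min and max: 6.0
```

With k = 1, `trim_count_mean` discards exactly the minimum and maximum, the fixed-count case mentioned above.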
Examples
The median is the most trimmed statistic (nominally 50%), as it discards all but the most central data. It equals the fully trimmed mean, and indeed the fully trimmed mid-range or, for odd-size data sets, the fully trimmed maximum or minimum. Likewise, no degree of trimming has any effect on the median (a trimmed median is the median), because trimming always excludes an equal number of the lowest and highest values.
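These claims about the median can be checked directly. The sketch below (hypothetical helper `trim`) removes k points from each end of the sorted data and confirms that the median never changes, down to the fully trimmed case of one central point (odd n) or two central points (even n).

```python
import statistics


def trim(data, k):
    """Return the sorted data with k points removed from each end."""
    xs = sorted(data)
    return xs[k:len(xs) - k]


odd = [3, 1, 9, 7, 5, 100, 2]        # n = 7
even = [3, 1, 9, 7, 5, 100, 2, 8]    # n = 8

# Trimming equal numbers from both ends leaves the median unchanged.
for data in (odd, even):
    most = (len(data) - 1) // 2      # most points removable per end
    for k in range(most + 1):
        assert statistics.median(trim(data, k)) == statistics.median(data)

# Fully trimmed: one point left (odd n) or two points left (even n).
print(trim(odd, 3))    # [5]: its mean, max, min and mid-range all equal the median
print(trim(even, 3))   # [5, 7]: its mean and mid-range equal the median, 6.0
```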
Quantiles can be thought of as trimmed maxima or minima: for instance, the 5th percentile is the 5% trimmed minimum.
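A sketch of trimmed minima and maxima, assuming floor(x% of n) points are discarded from the relevant end. The function names are hypothetical, and since percentile definitions differ between texts and software, the result can sit one order statistic away from another convention's 5th or 95th percentile.

```python
import math


def trimmed_min(data, proportion):
    """Smallest value left after discarding the lowest `proportion` of points."""
    xs = sorted(data)
    k = math.floor(proportion * len(xs))  # assumed rounding convention
    return xs[k]


def trimmed_max(data, proportion):
    """Largest value left after discarding the highest `proportion` of points."""
    xs = sorted(data)
    k = math.floor(proportion * len(xs))
    return xs[len(xs) - 1 - k]


data = list(range(1, 101))        # 1, 2, ..., 100
print(trimmed_min(data, 0.05))    # 6: the five lowest points are dropped
print(trimmed_max(data, 0.05))    # 95: the five highest points are dropped
```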
Trimmed estimators used to estimate a location parameter include:
* Trimmed mean
* Modified mean, discarding the minimum and maximum values
* Interquartile mean, the 25% trimmed mean
* Midhinge, the 25% trimmed mid-range
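The location estimators listed above can each be sketched in a few lines of Python. The function names are hypothetical; `interquartile_mean` assumes the sample size is a multiple of 4 (the general case weights the boundary points fractionally), and `midhinge` relies on the default 'exclusive' method of `statistics.quantiles`, so other quartile conventions give slightly different values.

```python
import statistics


def modified_mean(data):
    """Mean after discarding the single smallest and largest values."""
    xs = sorted(data)
    return statistics.fmean(xs[1:-1])


def interquartile_mean(data):
    """25% trimmed mean; this simple version requires len(data) % 4 == 0."""
    xs = sorted(data)
    if len(xs) % 4:
        raise ValueError("sketch assumes a sample size divisible by 4")
    q = len(xs) // 4
    return statistics.fmean(xs[q:len(xs) - q])


def midhinge(data):
    """Average of the first and third quartiles: a 25% trimmed mid-range."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return (q1 + q3) / 2


sample = [0, 2, 3, 4, 6, 7, 8, 50]   # 50 is an outlier
print(statistics.fmean(sample))      # 10.0
print(modified_mean(sample))         # 5.0
print(interquartile_mean(sample))    # 5.0
print(midhinge(sample))              # 5.0
```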
Trimmed estimators used to estimate a scale parameter include:
* Interquartile range, the 25% trimmed range
* Interdecile range, the 10% trimmed range
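A sketch of the two trimmed ranges, again using `statistics.quantiles` with its default 'exclusive' method (an assumption, since quartile and decile conventions vary). Replacing the largest value with a gross outlier changes the ordinary range but neither trimmed range.

```python
import statistics


def interquartile_range(data):
    """Q3 - Q1, the 25% trimmed range."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


def interdecile_range(data):
    """D9 - D1, the 10% trimmed range."""
    deciles = statistics.quantiles(data, n=10)  # nine cut points D1..D9
    return deciles[-1] - deciles[0]


clean = list(range(1, 20))    # 1, 2, ..., 19
dirty = clean[:-1] + [1000]   # largest value replaced by an outlier
for data in (clean, dirty):
    print(max(data) - min(data), interquartile_range(data), interdecile_range(data))
# clean: 18 10.0 16.0
# dirty: 999 10.0 16.0
```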
Trimmed estimators involving only linear combinations of points are examples of L-estimators.
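To make the L-estimator view concrete, the sketch below (hypothetical names `l_estimate` and `trimmed_mean_weights`) computes a weighted sum of order statistics. Trimming amounts to giving the extreme order statistics weight zero; the untrimmed range is shown for contrast, with weights -1 and +1 on the extremes.

```python
def l_estimate(data, weights):
    """Weighted sum of order statistics: sum of w_i * x_(i), x_(1) <= ... <= x_(n)."""
    xs = sorted(data)
    if len(weights) != len(xs):
        raise ValueError("need exactly one weight per observation")
    return sum(w * x for w, x in zip(weights, xs))


def trimmed_mean_weights(n, k):
    """Weights for the mean after dropping k points from each end."""
    m = n - 2 * k
    return [0.0] * k + [1.0 / m] * m + [0.0] * k


data = [7, 1, 4, 100, 3, 5]   # sorted: 1, 3, 4, 5, 7, 100
print(l_estimate(data, trimmed_mean_weights(6, 1)))  # (3 + 4 + 5 + 7) / 4 = 4.75
print(l_estimate(data, [0, 0, 0.5, 0.5, 0, 0]))      # median: 4.5
print(l_estimate(data, [-1, 0, 0, 0, 0, 1]))          # untrimmed range: 99
```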
Applications
Estimation
Most often, trimmed estimators are used to estimate the same parameter as the untrimmed estimator. In some cases the trimmed estimator can be used directly, while in other cases it must be adjusted to yield an unbiased, consistent estimator.
For example, when estimating a location parameter for a symmetric distribution, a trimmed estimator will be unbiased (assuming the original estimator was unbiased), as it removes the same amount above and below. However, if the distribution is skewed, trimmed estimators will generally be biased and require adjustment. In particular, in a skewed distribution the nonparametric skew (and Pearson's second skewness coefficient) measures the bias of the median as an estimator of the mean, in units of the standard deviation.
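As an illustration, the exponential distribution with rate 1 has mean 1, median ln 2 and standard deviation 1, so its nonparametric skew is 1 - ln 2 ≈ 0.307: the median falls short of the mean by about 0.307 standard deviations. The simulation below is a sketch; the helper name, seed and sample size are arbitrary choices, and the sample value only approximates the population one.

```python
import math
import random
import statistics


def nonparametric_skew(data):
    """(sample mean - sample median) / sample standard deviation."""
    return (statistics.fmean(data) - statistics.median(data)) / statistics.stdev(data)


random.seed(0)  # arbitrary seed for reproducibility
sample = [random.expovariate(1.0) for _ in range(50_000)]
population_value = 1 - math.log(2)   # (mean - median) / sd for Exp(1)
print(population_value)              # about 0.307
print(nonparametric_skew(sample))    # close to population_value
```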
When estimating a scale parameter, using a trimmed estimator as a robust measure of scale, such as to estimate the population variance or population standard deviation, one generally must multiply by a scale factor to make it a consistent estimator; see scale parameter: estimation.
For example, dividing the IQR by $2\sqrt{2}\,\operatorname{erf}^{-1}(1/2) \approx 1.349$, where $\operatorname{erf}^{-1}$ is the inverse of the error function, makes it a consistent estimator of the population standard deviation if the data follow a normal distribution.
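The constant is the IQR of a standard normal distribution, $\Phi^{-1}(3/4) - \Phi^{-1}(1/4) = 2\sqrt{2}\,\operatorname{erf}^{-1}(1/2)$. The sketch below computes it with `statistics.NormalDist` and applies it to simulated normal data; the helper name `iqr_sigma`, the seed and the sample size are illustrative assumptions.

```python
import random
import statistics

# IQR of a standard normal: Phi^{-1}(0.75) - Phi^{-1}(0.25), about 1.349.
STD_NORMAL = statistics.NormalDist()
SCALE = STD_NORMAL.inv_cdf(0.75) - STD_NORMAL.inv_cdf(0.25)
print(round(SCALE, 4))   # 1.349


def iqr_sigma(data):
    """Estimate a normal population's standard deviation as IQR / SCALE."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / SCALE


random.seed(1)  # arbitrary seed for reproducibility
sample = [random.gauss(10.0, 3.0) for _ in range(50_000)]
print(iqr_sigma(sample))   # near the true sigma, 3.0
```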
Other uses
Trimmed estimators can also be used as statistics in their own right: for example, the median is a measure of location, and the IQR is a measure of dispersion. In these cases, the sample statistics act as estimators of the corresponding population quantities. For example, the median absolute deviation (MAD) of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case is 1, whereas the population variance does not exist.
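A simulation sketch of the Cauchy example, assuming the MAD is taken about the median without the usual normal-consistency factor. Standard Cauchy draws are generated by the inverse-CDF method, tan(π(U - 1/2)) for uniform U; the helper name `mad`, the seed and the sample size are arbitrary choices.

```python
import math
import random
import statistics


def mad(data):
    """Median absolute deviation about the median (no consistency factor)."""
    m = statistics.median(data)
    return statistics.median([abs(x - m) for x in data])


random.seed(2)  # arbitrary seed for reproducibility
# Standard Cauchy draws via the inverse CDF: tan(pi * (U - 1/2)).
sample = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(100_001)]
print(mad(sample))   # close to the population MAD, 1
```

The sample MAD settles near 1 as the sample grows, whereas the sample variance of Cauchy data does not converge to anything.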
See also
* Winsorising, a related technique
* Core inflation, an economic statistic that omits volatile components