In statistics, a trimmed estimator is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers.
Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.
Given an estimator, the x% trimmed version is obtained by discarding the lowest x% of the observations, the highest x%, or both: it is a statistic on the "middle" of the data. For instance, the 5% trimmed mean is obtained by taking the mean of the 5% to 95% range. In some cases a trimmed estimator discards a fixed number of points (such as the maximum and minimum) instead of a percentage.
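As a minimal sketch of the definition above, the following Python functions compute a percentage-trimmed mean and a fixed-count trimmed mean. The function names, the rounding rule (the number of dropped points per tail is rounded down), and the data are illustrative assumptions; libraries differ in how they handle fractional counts.

```python
import math
import statistics


def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the points from each end."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    ordered = sorted(values)
    # Assumption: round the per-tail count down when it is fractional.
    k = math.floor(len(ordered) * percent / 100)
    return statistics.fmean(ordered[k:len(ordered) - k])


def fixed_count_trimmed_mean(values, count=1):
    """Mean after dropping a fixed number of points from each end."""
    ordered = sorted(values)
    if 2 * count >= len(ordered):
        raise ValueError("not enough points left after trimming")
    return statistics.fmean(ordered[count:len(ordered) - count])


# Illustrative data: 20 points with one gross outlier.
data = [2, 4, 4, 5, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 12, 12, 13, 250]

plain = statistics.fmean(data)                  # 19.8, pulled up by 250
trimmed = trimmed_mean(data, 5)                 # 8.0, drops 2 and 250
fixed = fixed_count_trimmed_mean(data, count=1)  # 8.0, same points dropped
```

With 20 points, 5% per tail is exactly one point, so both variants discard the minimum and the maximum here.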
Examples
The median is the most trimmed statistic (nominally 50%), as it discards all but the most central data, and equals the fully trimmed mean, or indeed the fully trimmed mid-range, or (for odd-size data sets) the fully trimmed maximum or minimum. Likewise, no degree of symmetric trimming has any effect on the median (a trimmed median is the median), because symmetric trimming excludes an equal number of the lowest and highest values.
Quantiles can be thought of as trimmed maxima or minima: for instance, the 5th percentile is the 5% trimmed minimum.
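The relationship between the median and full trimming can be checked on a small odd-size data set. This is only an illustrative sketch; the helper `trim` and the data are hypothetical.

```python
import statistics


def trim(values, k):
    """Drop the k smallest and the k largest values."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]  # 11 points, odd size

median = statistics.median(data)  # 4

# Trimming 5 points per tail leaves only the central observation, so the
# mean, minimum and maximum of what remains all coincide with the median.
fully_trimmed = trim(data, 5)

# Symmetric trimming by any smaller amount leaves the median unchanged.
trimmed_medians = [statistics.median(trim(data, k)) for k in range(6)]
```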
Trimmed estimators used to estimate a location parameter include:
* Trimmed mean
* Modified mean, discarding the minimum and maximum values
* Interquartile mean, the 25% trimmed mean
* Midhinge, the 25% trimmed mid-range
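The location estimators listed above can be sketched in a few lines of Python. The function names and data are illustrative. Two assumptions are worth flagging: the interquartile mean here only handles sample sizes divisible by 4 (general definitions weight the boundary points fractionally), and the midhinge uses the "inclusive" quartile convention of `statistics.quantiles`, which is one of several in common use.

```python
import statistics


def modified_mean(values):
    """Mean after discarding the single minimum and maximum."""
    ordered = sorted(values)
    return statistics.fmean(ordered[1:-1])


def interquartile_mean(values):
    """25% trimmed mean; this sketch assumes len(values) % 4 == 0."""
    ordered = sorted(values)
    if len(ordered) % 4:
        raise ValueError("this sketch needs a sample size divisible by 4")
    k = len(ordered) // 4
    return statistics.fmean(ordered[k:len(ordered) - k])


def midhinge(values):
    """Average of the first and third quartiles (25% trimmed mid-range)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + q3) / 2


data = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 90]  # 12 points, one outlier

mm = modified_mean(data)        # 7.6
iqm = interquartile_mean(data)  # 7.5
mh = midhinge(data)             # 7.5
mid_range = (min(data) + max(data)) / 2  # 45.5, dominated by the outlier
```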
Trimmed estimators used to estimate a scale parameter include:
* Interquartile range, the 25% trimmed range
* Interdecile range, the 10% trimmed range
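A short sketch of the two scale estimators above, compared with the untrimmed range. The function names and data are illustrative, and the "inclusive" quantile convention is an assumption; other conventions give slightly different values on small samples.

```python
import statistics


def interquartile_range(values):
    """Third quartile minus first quartile (25% trimmed range)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def interdecile_range(values):
    """Ninth decile minus first decile (10% trimmed range)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return deciles[-1] - deciles[0]


data = list(range(1, 21)) + [500]  # 21 points, one gross outlier

full_range = max(data) - min(data)  # 499, driven by the outlier
iqr = interquartile_range(data)     # 10.0
idr = interdecile_range(data)       # 16.0
```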
Trimmed estimators involving only linear combinations of points are examples of L-estimators.
Applications
Estimation
Most often, trimmed estimators are used for parameter estimation of the same parameter as the untrimmed estimator. In some cases the estimator can be used directly, while in other cases it must be adjusted to yield an unbiased consistent estimator.
For example, when estimating a location parameter for a symmetric distribution, a trimmed estimator will be unbiased (assuming the original estimator was unbiased), as it removes the same amount above and below. However, if the distribution has skew, trimmed estimators will generally be biased and require adjustment. For example, in a skewed distribution, the nonparametric skew (and Pearson's skewness coefficients) measures the bias of the median as an estimator of the mean.
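The bias under skew can be illustrated by simulation. The sketch below draws from an exponential distribution with rate 1 (an assumed example, chosen because its mean is 1, its median is ln 2 and its standard deviation is 1) and computes the nonparametric skew, (mean − median) / standard deviation. The sample size and seed are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable
sample = [rng.expovariate(1.0) for _ in range(20000)]

mean = statistics.fmean(sample)
median = statistics.median(sample)
stdev = statistics.pstdev(sample)

# The median underestimates the mean of a right-skewed distribution.
bias_of_median = median - mean

# Nonparametric skew: the same gap, expressed in standard deviations.
nonparametric_skew = (mean - median) / stdev

# Population value for this distribution: (1 - ln 2) / 1
population_skew = 1 - math.log(2)
```

For a symmetric distribution the same calculation would give a value near zero, which is why no adjustment is needed in that case.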
When estimating a scale parameter, using a trimmed estimator as a robust measure of scale, such as to estimate the population variance or population standard deviation, one generally must multiply by a scale factor to make it an unbiased consistent estimator; see scale parameter: estimation.
For example, dividing the IQR by $2\sqrt{2}\,\operatorname{erf}^{-1}(1/2) \approx 1.349$ (using the error function) makes it an unbiased, consistent estimator for the population standard deviation if the data follow a normal distribution.
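The scale factor for normal data is the interquartile range of a standard normal distribution, which can be computed as twice the 75th-percentile point. The sketch below uses `statistics.NormalDist` from the Python standard library; the constant name, the function name, the simulated data and the injected outliers are all illustrative assumptions.

```python
import random
import statistics

# IQR of a standard normal: 2 * Phi^{-1}(0.75), about 1.349.
IQR_TO_SIGMA = 2 * statistics.NormalDist().inv_cdf(0.75)


def sigma_from_iqr(values):
    """Estimate the standard deviation of normal data from its IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / IQR_TO_SIGMA


rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = [rng.gauss(10.0, 2.0) for _ in range(10000)]  # true sigma = 2
sample[:20] = [1000.0] * 20  # replace a few points with gross outliers

robust_sigma = sigma_from_iqr(sample)       # stays close to 2
classical_sigma = statistics.stdev(sample)  # inflated by the outliers
```

The rescaling is specific to the normal distribution; a different assumed distribution would need a different constant.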
Other uses
Trimmed estimators can also be used as statistics in their own right: for example, the median is a measure of location, and the IQR is a measure of dispersion. In these cases, the sample statistics can act as estimators of their own expected value. For example, the MAD of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case is 1, whereas the population variance does not exist.
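The Cauchy example can be checked numerically. The sketch below generates standard Cauchy draws by the inverse-CDF method, tan(π(U − 1/2)) with U uniform on [0, 1), and computes the sample MAD (median absolute deviation from the median). The function name, sample size and seed are illustrative assumptions.

```python
import math
import random
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the sample median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


rng = random.Random(7)  # fixed seed so the sketch is repeatable
sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20001)]

# Should land near the population MAD of 1 for a standard Cauchy.
mad = median_absolute_deviation(sample)
```

The sample variance of the same data, by contrast, does not settle down as the sample grows, since there is no population variance for it to converge to.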
See also
* Winsorising, a related technique
* Core inflation, an economic statistic that omits volatile components