In statistics, a trimmed estimator is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers. Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.

Given an estimator, the x% trimmed version is obtained by discarding the lowest x% of observations, the highest x%, or both, so that it is a statistic on the middle of the data. For instance, the 5% trimmed mean is the mean of the observations lying between the 5th and 95th percentiles. In some cases a trimmed estimator discards a fixed number of points (such as the maximum and minimum) instead of a percentage.
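The x% trimming described above can be sketched in Python. This is a minimal illustration rather than a standard implementation: the helper names `trim_sorted`, `trimmed_mean` and `trim_count` are hypothetical, and rounding the number of points cut from each tail down to an integer is an assumed convention (some definitions instead give the boundary points fractional weight).

```python
def trim_sorted(data, proportion):
    """Sort the data and drop floor(proportion * n) points from each end.

    Rounding the per-tail count down is an assumed convention. The small
    tolerance guards against floating-point products landing just below
    an integer; the clamp keeps at least one observation.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    values = sorted(data)
    k = int(proportion * len(values) + 1e-9)  # points dropped from each tail
    k = min(k, (len(values) - 1) // 2)
    return values[k:len(values) - k]


def trimmed_mean(data, proportion):
    """Mean of the observations that survive symmetric trimming."""
    kept = trim_sorted(data, proportion)
    return sum(kept) / len(kept)


def trim_count(data, count):
    """Drop a fixed number of points from each end instead of a percentage."""
    values = sorted(data)
    return values[count:len(values) - count]


data = list(range(1, 20)) + [1000]  # 1..19 plus one gross outlier, n = 20
print(sum(data) / len(data))        # 59.5: untrimmed mean, dragged up by 1000
print(trimmed_mean(data, 0.05))     # 10.5: 5% of 20 = 1 point cut per tail
kept = trim_count(data, 1)          # discard just the minimum and maximum
print(sum(kept) / len(kept))        # 10.5
```

With 20 observations, 5% trimming removes exactly one point from each end, so here it coincides with discarding the minimum and maximum. SciPy's `scipy.stats.trim_mean(a, proportiontocut)` also cuts the same proportion from both ends and rounds the count down, so it should agree with `trimmed_mean` on data like this.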

Examples

The median is the most trimmed statistic (nominally 50%), as it discards all but the most central data. It equals the fully trimmed mean, and indeed the fully trimmed mid-range or (for data sets of odd size) the fully trimmed maximum or minimum. In addition, no degree of trimming has any effect on the median: a trimmed median is the median, because trimming always excludes an equal number of the lowest and highest values.
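These properties of the median can be checked numerically. In the sketch below (the helper names `fully_trimmed` and `trimmed_median` are hypothetical), "fully trimmed" is taken to mean removing the same number of points from each end, as many as possible while keeping at least one: a single middle value for odd n, two middle values for even n.

```python
import statistics


def fully_trimmed(data):
    """Trim the same number of points from each end, as many as possible.

    Odd n leaves the single middle value; even n leaves the two middle values.
    """
    values = sorted(data)
    k = (len(values) - 1) // 2
    return values[k:len(values) - k]


def trimmed_median(data, count):
    """Median after discarding `count` points from each end."""
    values = sorted(data)
    return statistics.median(values[count:len(values) - count])


odd = [7, 1, 3, 100, 5]
even = [4, 1, 9, 2, 50, 6]

for sample in (odd, even):
    core = fully_trimmed(sample)
    print(statistics.median(sample),   # ordinary median
          sum(core) / len(core),       # fully trimmed mean
          (core[0] + core[-1]) / 2)    # fully trimmed mid-range

core = fully_trimmed(odd)
print(core[0], core[-1])  # fully trimmed minimum and maximum: both 5 for odd n

# Trimming never moves the median.
print([trimmed_median(even, c) for c in range(3)])  # [5.0, 5.0, 5.0]
```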

Quantiles can be thought of as trimmed maxima or minima: for instance, the 5th percentile is the 5% trimmed minimum.

Trimmed estimators used to estimate a location parameter include:
* Trimmed mean
* Modified mean, discarding the minimum and maximum values
* Interquartile mean, the 25% trimmed mean
* Midhinge, the 25% trimmed mid-range
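The location estimators listed above, and the reading of a quantile as a trimmed minimum, can be illustrated on a small sample with one outlier. The function names are hypothetical, and the sketch rounds the per-tail trim count down by assumption, so for sample sizes not divisible by 4 its interquartile mean and midhinge can differ slightly from definitions that interpolate between observations.

```python
def trim_sorted(data, proportion):
    """Sort and drop floor(proportion * n) points from each end (assumed rounding)."""
    values = sorted(data)
    k = int(proportion * len(values) + 1e-9)
    return values[k:len(values) - k]


def trimmed_minimum(data, proportion):
    """Smallest value left after trimming the lowest `proportion` of the data.

    This is one simple, non-interpolating definition of a lower quantile.
    """
    values = sorted(data)
    return values[int(proportion * len(values) + 1e-9)]


def modified_mean(data):
    """Mean after discarding only the minimum and the maximum."""
    values = sorted(data)[1:-1]
    return sum(values) / len(values)


def interquartile_mean(data):
    """25% trimmed mean: the mean of the middle half of the data."""
    kept = trim_sorted(data, 0.25)
    return sum(kept) / len(kept)


def midhinge(data):
    """25% trimmed mid-range: the midpoint of the middle half's extremes."""
    kept = trim_sorted(data, 0.25)
    return (kept[0] + kept[-1]) / 2


sample = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 1000]  # n = 12, one outlier
print(sum(sample) / len(sample))      # about 96.67: the plain mean chases the outlier
print(modified_mean(sample))          # 15.8
print(interquartile_mean(sample))     # 15.0
print(midhinge(sample))               # 15.0
print(trimmed_minimum(sample, 0.25))  # 7, a lower quartile under this convention
```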

Trimmed estimators used to estimate a scale parameter include:
* Interquartile range, the 25% trimmed range
* Interdecile range, the 10% trimmed range

Trimmed estimators involving only linear combinations of points are examples of L-estimators.
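The scale estimators above, and the sense in which such statistics are L-estimators (weighted sums of the sorted data), can be sketched as follows. The function names are hypothetical, the per-tail trim count is rounded down by assumption, and the interquartile range produced this way is only one of several quartile conventions.

```python
def trimmed_range(data, proportion):
    """Range (max - min) of what is left after symmetric trimming."""
    values = sorted(data)
    k = int(proportion * len(values) + 1e-9)  # assumed: round the per-tail count down
    kept = values[k:len(values) - k]
    return kept[-1] - kept[0]


def l_estimate(data, weights):
    """Generic L-estimator: a weighted sum of the order statistics."""
    values = sorted(data)
    if len(weights) != len(values):
        raise ValueError("need exactly one weight per order statistic")
    return sum(w * x for w, x in zip(weights, values))


def trimmed_mean_weights(n, k):
    """Equal weight on the central n - 2k order statistics, zero on the tails."""
    m = n - 2 * k
    return [0.0] * k + [1.0 / m] * m + [0.0] * k


def trimmed_range_weights(n, k):
    """-1 on the lowest kept order statistic, +1 on the highest."""
    weights = [0.0] * n
    weights[k], weights[n - 1 - k] = -1.0, 1.0
    return weights


sample = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 1000]  # n = 12
print(max(sample) - min(sample))    # 998: the full range is dominated by 1000
print(trimmed_range(sample, 0.25))  # 16: interquartile range under this convention
print(trimmed_range(sample, 0.10))  # 28: interdecile range, one point cut per tail
print(l_estimate(sample, trimmed_mean_weights(12, 3)))   # approximately 15
print(l_estimate(sample, trimmed_range_weights(12, 3)))  # 16.0
```

Both the trimmed mean and the trimmed range reduce to fixed weight vectors over the sorted sample, which is exactly what makes them L-estimators.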

Applications

Estimation

Most often, trimmed estimators are used for parameter estimation of the same parameter as the untrimmed estimator. In some cases the estimator can be used directly, while in other cases it must be adjusted to yield an unbiased consistent estimator. For example, when estimating a location parameter for a symmetric distribution, a trimmed estimator will be unbiased (assuming the original estimator was unbiased), as it removes the same amount above and below. However, if the distribution is skewed, trimmed estimators will generally be biased and require adjustment. For example, in a skewed distribution, the nonparametric skew (and Pearson's skewness coefficients) measure the bias of the median as an estimator of the mean, in units of the standard deviation.

When estimating a scale parameter using a trimmed estimator as a robust measure of scale, such as to estimate the population variance or population standard deviation, one generally must multiply by a scale factor to make it an unbiased consistent estimator (see the Estimation section of Scale parameter). For example, dividing the interquartile range (IQR) by $2\sqrt{2}\,\operatorname{erf}^{-1}(1/2) \approx 1.349$, where erf is the error function, makes it a consistent and asymptotically unbiased estimator of the population standard deviation if the data follow a normal distribution.
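Both adjustments discussed above can be checked numerically. The sketch below is an illustration under stated assumptions, not a prescribed procedure: Python's standard library has no inverse error function, so the constant is computed through the identity 2√2·erf⁻¹(1/2) = 2·Φ⁻¹(3/4), where Φ is the standard normal CDF; `statistics.quantiles` with its default method is just one of several quartile conventions; and the seed, the sample sizes, and the exponential distribution used to illustrate skew-induced bias are arbitrary choices. For an exponential distribution with rate 1 the mean is 1, the median is ln 2 and the standard deviation is 1, so its nonparametric skew is 1 − ln 2 ≈ 0.307.

```python
import math
import random
import statistics

# 2*sqrt(2)*erfinv(1/2) == 2*Phi^{-1}(3/4), the IQR of a standard normal.
# The standard library has no erfinv, so use the normal quantile function.
IQR_TO_SD = 2 * statistics.NormalDist().inv_cdf(0.75)
print(round(IQR_TO_SD, 3))  # 1.349


def iqr(data):
    """Sample interquartile range using statistics.quantiles' default method."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


rng = random.Random(12345)  # arbitrary seed for reproducibility

# Scale: IQR / 1.349 estimates sigma when the data are normal.
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(100_000)]
print(iqr(normal_sample) / IQR_TO_SD)  # close to the true sigma, 2.0

# Location under skew: the sample median tracks ln 2, not the mean of 1.
exp_sample = [rng.expovariate(1.0) for _ in range(100_000)]
sample_mean = statistics.fmean(exp_sample)
sample_median = statistics.median(exp_sample)
sample_sd = statistics.stdev(exp_sample)
# Nonparametric skew (mean - median) / sd: the median's bias in sd units.
print((sample_mean - sample_median) / sample_sd)  # near 1 - ln 2, about 0.307
```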

Other uses

Trimmed estimators can also be used as statistics in their own right: for example, the median is a measure of location, and the IQR is a measure of dispersion. In these cases, the sample statistics can act as estimators of their own expected value. For example, the median absolute deviation (MAD) of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case is 1, whereas the population variance does not exist.
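The contrast between the two statistics can be seen in a short simulation. This is a sketch under assumptions: the helper names `mad` and `standard_cauchy` are hypothetical, the standard Cauchy draws are produced by inverse-CDF sampling because Python's `random` module has no Cauchy generator, and the seed and sample sizes are arbitrary. The population MAD is 1 because the standard Cauchy has median 0 and P(|X| ≤ 1) = 1/2.

```python
import math
import random
import statistics


def mad(data):
    """Median absolute deviation from the median, with no consistency constant."""
    center = statistics.median(data)
    return statistics.median([abs(x - center) for x in data])


def standard_cauchy(rng):
    """Draw from the standard Cauchy by inverting its CDF, F(x) = 1/2 + atan(x)/pi."""
    return math.tan(math.pi * (rng.random() - 0.5))


rng = random.Random(2024)  # arbitrary seed
sample = [standard_cauchy(rng) for _ in range(100_001)]
print(mad(sample))  # close to the population MAD of 1

# The sample variance is always finite, but with no population variance to
# converge to, it keeps jumping around as more data arrive.
for n in (1_000, 10_000, 100_000):
    print(n, statistics.variance(sample[:n]))
```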
