Robust Measures Of Scale

	Robust Measures Of Scale In statistics, robust measures of scale are methods which quantify the statistical dispersion in a sample of numerical data while resisting outliers. These are contrasted with conventional or non-robust measures of scale, such as sample standard deviation, which are greatly influenced by outliers. The most common such robust statistics are the ''interquartile range'' (IQR) and the '' median absolute deviation'' (MAD). Alternatives robust estimators have also been developed, such as those based on pairwise differences and biweight midvariance. These robust statistics are particularly used as estimators of a scale parameter, and have the advantages of both robustness and superior efficiency on contaminated data, at the cost of inferior efficiency on clean data from distributions such as the normal distribution. To illustrate robustness, the standard deviation can be made arbitrarily large by increasing exactly one observation (it has a breakdown point of 0, as it can be contaminat ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Statistical Dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. For instance, when the variance of data in a set is large, the data is widely scattered. On the other hand, when the variance is small, the data in the set is clustered. Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions. Measures of statistical dispersion A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse. Most measures of dispersion have the same units as the quantity being measured. In other words, if the measurements are in metres or seconds, so is the measure of dispersion. Examples of dispersion measures include: * Standard deviation * Interquartile ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Consistent Estimator In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter ''θ''0—having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to ''θ''0. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to ''θ''0 converges to one. In practice one constructs an estimator as a function of an available sample of size ''n'', and then imagines being able to keep collecting data and expanding the sample ''ad infinitum''. In this way one would obtain a sequence of estimates indexed by ''n'', and consistency is a property of what occurs as the sample size “grows to infinity”. If the sequence of estimates can be mathematically shown to converge in probability to the true value '' ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Gaussian Distribution In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real number, real-valued random variable. The general form of its probability density function is f(x) = \frac e^\,. The parameter is the Mean#Mean of a probability distribution, mean or expected value, expectation of the distribution (and also its median and mode (statistics), mode), while the parameter \sigma^2 is the variance. The standard deviation of the distribution is (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate. Normal distributions are important in statistics and are often used in the natural science, natural and social sciences to represent real-valued random variables whose distributions are not known. Their importance is partly due to the central limit theorem. It states that, under some conditions, the average of many samples (observations) of a ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Median Absolute Deviation In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample. For a univariate data set ''X''1, ''X''2, ..., ''Xn'', the MAD is defined as the median of the absolute deviations from the data's median \tilde=\operatorname(X) : : \operatorname = \operatorname( , X_i - \tilde, ) that is, starting with the residuals (deviations) from the data's median, the MAD is the median of their absolute values. Example Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7) which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1. Uses The median absolute deviation is a measure of statistical dispersion. Moreover, the MAD is a rob ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Median The median of a set of numbers is the value separating the higher half from the lower half of a Sample (statistics), data sample, a statistical population, population, or a probability distribution. For a data set, it may be thought of as the “middle" value. The basic feature of the median in describing data compared to the Arithmetic mean, mean (often simply described as the "average") is that it is not Skewness, skewed by a small proportion of extremely large or small values, and therefore provides a better representation of the center. Median income, for example, may be a better way to describe the center of the income distribution because increases in the largest incomes alone have no effect on the median. For this reason, the median is of central importance in robust statistics. Median is a 2-quantile; it is the value that partitions a set into two equal parts. Finite set of numbers The median of a finite list of numbers is the "middle" number, when those numbers are liste ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Interdecile Range In statistics, the interdecile range is the difference between the first and the ninth deciles (10% and 90%). The interdecile range is a measure of statistical dispersion of the values in a set of data, similar to the range and the interquartile range, and can be computed from the (non-parametric) seven-number summary. Despite its simplicity, the interdecile range of a sample drawn from a normal distribution can be divided by 2.56 to give a reasonably efficient estimator of the standard deviation of a normal distribution. This is derived from the fact that the lower (respectively upper) decile of a normal distribution with arbitrary variance is equal to the mean minus (respectively, plus) 1.28 times the standard deviation. A more efficient estimator is given by instead taking the 7% trimmed range (the difference between the 7th and 93rd percentiles) and dividing by 3 (corresponding to 86% of the data falling within ±1.5 standard deviations of the mean in a normal distributio ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	L-estimator In statistics, an L-estimator (or L-statistic) is an estimator which is a linear combination of order statistics of the measurements. This can be as little as a single point, as in the median (of an odd number of values), or as many as all points, as in the mean. The main benefits of L-estimators are that they are often extremely simple, and often robust statistics: assuming sorted data, they are very easy to calculate and interpret, and are often resistant to outliers. They thus are useful in robust statistics, as descriptive statistics, in statistics education, and when computation is difficult. However, they are inefficient, and in modern times robust statistics M-estimators are preferred, although these are much more difficult computationally. In many circumstances L-estimators are reasonably efficient, and thus adequate for initial estimation. Examples A basic example is the median. Given ''n'' values x_1, \ldots, x_n, if n=2k+1 is odd, the median equals x_, the (n+1)/2 ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Range (statistics) In descriptive statistics, the range of a set of data is size of the narrowest interval which contains all the data. It is calculated as the difference between the largest and smallest values (also known as the sample maximum and minimum). It is expressed in the same units as the data. The range provides an indication of statistical dispersion. Closely related alternative measures are the Interdecile range and the Interquartile range. Range of continuous IID random variables For ''n'' independent and identically distributed continuous random variables ''X''1, ''X''2, ..., ''X''''n'' with the cumulative distribution function G(''x'') and a probability density function g(''x''), let T denote the range of them, that is, T= max(''X''1, ''X''2, ..., ''X''''n'')- min(''X''1, ''X''2, ..., ''X''''n''). Distribution The range, T, has the cumulative distribution function ::F(t)= n \int_^\infty g(x) (x+t)-G(x) \, \textx. Gumbel notes that the "beauty of this formula is com ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Trimmed Estimator In statistics, a trimmed estimator is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers. Trimmed estimators also often have higher efficiency for mixture distributions, and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution. Given an estimator, the x% trimmed version is obtained by discarding the x% lowest or highest observations or on both end: it is a statistic on the ''middle'' of the data. For instance, the 5% trimmed mean is obtained by taking the mean of the 5% to 95% range. In some cases a trimmed estimator discards a fixed number of points (such as maximum and minimum) instead of a percentage. Examples The median is the most trimmed statistic (nominally 50%), as it discards all but the most centra ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Percentile In statistics, a ''k''-th percentile, also known as percentile score or centile, is a score (e.g., a data point) a given percentage ''k'' of all scores in its frequency distribution exists ("exclusive" definition) or a score a given percentage of the all scores exists ("inclusive" definition); i.e. a score in the ''k''-th percentile would be above approximately ''k''% of all scores in its set. For example, the 97th percentile of data is a data point below which 97% of all data points exist (by the exclusive definition). Percentiles depends on how scores are arranged. Percentiles are a type of quantiles, obtained adopting a subdivision into 100 groups. The 25th percentile is also known as the first '' quartile'' (''Q''1), the 50th percentile as the ''median'' or second quartile (''Q''2), and the 75th percentile as the third quartile (''Q''3). For example, the 50th percentile (median) is the score (or , depending on the definition) which 50% of the scores in the distribution are ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Heavy-tailed Distribution In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution. Roughly speaking, “heavy-tailed” means the distribution decreases more slowly than an exponential distribution, so extreme values are more likely. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy. There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions, and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class, introduced by Jozef Teugels. There is still some discrepancy over the use of the term heavy-tailed. There are two other definitions in use. Some authors use the term to refer to those distributions which do not have all their p ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Mixture Distribution In probability and statistics, a mixture distribution is the probability distribution of a random variable that is derived from a collection of other random variables as follows: first, a random variable is selected by chance from the collection according to given probabilities of selection, and then the value of the selected random variable is realized. The underlying random variables may be random real numbers, or they may be random vectors (each having the same dimension), in which case the mixture distribution is a multivariate distribution. In cases where each of the underlying random variables is continuous, the outcome variable will also be continuous and its probability density function is sometimes referred to as a mixture density. The cumulative distribution function (and the probability density function if it exists) can be expressed as a convex combination (i.e. a weighted sum, with non-negative weights that sum to 1) of other distribution functions and density functi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]