In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter ''θ''0—having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to ''θ''0. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to ''θ''0 converges to one. In practice one constructs an estimator as a function of an available sample of size ''n'', and then imagines being able to keep collecting data and expanding the sample ''ad infinitum''. In this way one would obtain a sequence of estimates indexed by ''n'', and consistency is a property of what occurs as the sample size "grows to infinity". If the sequence of estimates can be mathematically shown to converge in probability to the true value ''θ''0, it is called a consistent estimator; otherwise the estimator is said to be inconsistent. Consistency as defined here is sometimes referred to as ''weak consistency''. When we replace convergence in probability with almost sure convergence, the estimator is said to be ''strongly consistent''. Consistency is related to bias; see bias versus consistency.
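The concentration described above can be checked numerically. The following sketch (not part of the original article; the parameter values and tolerance are illustrative assumptions) estimates Pr(|''Tn'' − ''θ''| > ''ε'') for the sample mean of Uniform(0, 1) draws and shows it shrinking as ''n'' grows:

```python
# Illustrative sketch: the sample mean of Uniform(0, 1) draws concentrates
# around the true mean 0.5, so the probability of landing outside a fixed
# tolerance band shrinks as the sample size n grows.
import random

random.seed(0)
theta = 0.5   # true parameter (mean of Uniform(0, 1)) -- assumed for the demo
eps = 0.05    # tolerance band around theta -- assumed for the demo
trials = 2000

def prob_outside(n):
    """Monte Carlo estimate of Pr(|T_n - theta| > eps) for the sample mean T_n."""
    hits = 0
    for _ in range(trials):
        t_n = sum(random.random() for _ in range(n)) / n
        if abs(t_n - theta) > eps:
            hits += 1
    return hits / trials

for n in (10, 100, 1000):
    print(n, prob_outside(n))  # probability shrinks toward 0 as n grows
```

With these settings the estimated probability drops from roughly 0.6 at ''n'' = 10 to essentially zero at ''n'' = 1000, matching the definition of (weak) consistency.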


Definition

Formally speaking, an estimator ''Tn'' of parameter ''θ'' is said to be consistent if it converges in probability to the true value of the parameter:

: \underset{n\to\infty}{\operatorname{plim}}\;T_n = \theta,

i.e. if, for all ''ε'' > 0,

: \lim_{n\to\infty}\Pr\big(|T_n-\theta| > \varepsilon\big) = 0.

A more rigorous definition takes into account the fact that ''θ'' is actually unknown, and thus the convergence in probability must take place for every possible value of this parameter. Suppose \{p_\theta : \theta\in\Theta\} is a family of distributions (the parametric model), and X^\theta = \{X_1, X_2, \ldots\} is an infinite sample from the distribution ''pθ''. Let \{T_n(X^\theta)\} be a sequence of estimators for some parameter ''g''(''θ''). Usually ''Tn'' will be based on the first ''n'' observations of the sample. Then this sequence is said to be (weakly) consistent if

: \underset{n\to\infty}{\operatorname{plim}}\;T_n(X^\theta) = g(\theta)\ \ \text{for all}\ \ \theta\in\Theta.

This definition uses ''g''(''θ'') instead of simply ''θ'' because often one is interested in estimating a certain function or a sub-vector of the underlying parameter. In the next example we estimate the location parameter of the model, but not the scale:


Examples


Sample mean of a normal random variable

Suppose one has a sequence of statistically independent observations \{X_1, X_2, \ldots\} from a normal ''N''(''μ'', ''σ''2) distribution. To estimate ''μ'' based on the first ''n'' observations, one can use the sample mean: ''Tn'' = (''X''1 + ... + ''Xn'')/''n''. This defines a sequence of estimators, indexed by the sample size ''n''. From the properties of the normal distribution, we know the sampling distribution of this statistic: ''Tn'' is itself normally distributed, with mean ''μ'' and variance ''σ''2/''n''. Equivalently, (T_n-\mu)/(\sigma/\sqrt{n}) has a standard normal distribution:

: \Pr\!\left[\,|T_n-\mu|\geq\varepsilon\,\right] = \Pr\!\left[\frac{\sqrt{n}\,|T_n-\mu|}{\sigma} \geq \frac{\sqrt{n}\,\varepsilon}{\sigma}\right] = 2\left(1-\Phi\!\left(\frac{\sqrt{n}\,\varepsilon}{\sigma}\right)\right) \to 0

as ''n'' tends to infinity, for any fixed ''ε'' > 0. Therefore, the sequence ''Tn'' of sample means is consistent for the population mean ''μ'' (recalling that \Phi is the cumulative distribution function of the standard normal distribution).
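The exact tail probability 2(1 − Φ(√''n'' ''ε''/''σ'')) can be compared against a Monte Carlo estimate. This is an illustrative sketch; the values of ''μ'', ''σ'', and ''ε'' are assumptions chosen for the demo, and Φ is computed from the error function in Python's standard library:

```python
# Sketch: compare the exact normal tail probability 2*(1 - Phi(sqrt(n)*eps/sigma))
# with a Monte Carlo estimate of Pr(|T_n - mu| >= eps) for the sample mean.
import math
import random

random.seed(1)
mu, sigma, eps = 3.0, 2.0, 0.5   # assumed demo parameters

def exact_tail(n):
    """Exact Pr(|T_n - mu| >= eps) using the standard normal CDF."""
    z = math.sqrt(n) * eps / sigma
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z)
    return 2.0 * (1.0 - phi)

def mc_tail(n, trials=5000):
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(trials):
        t_n = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(t_n - mu) >= eps:
            hits += 1
    return hits / trials

for n in (4, 16, 64):
    print(n, round(exact_tail(n), 4), round(mc_tail(n), 4))
```

Both columns shrink toward zero as ''n'' grows, and the Monte Carlo estimates track the exact formula closely.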


Establishing consistency

The notion of asymptotic consistency is very close, almost synonymous, to the notion of convergence in probability. As such, any theorem, lemma, or property which establishes convergence in probability may be used to prove the consistency. Many such tools exist:

* In order to demonstrate consistency directly from the definition one can use the inequality

:: \Pr\!\big[h(T_n-\theta)\geq\varepsilon\big] \leq \frac{\operatorname{E}\big[h(T_n-\theta)\big]}{\varepsilon},

the most common choice for the function ''h'' being either the absolute value (in which case it is known as Markov's inequality) or the quadratic function (respectively Chebyshev's inequality).

* Another useful result is the continuous mapping theorem: if ''Tn'' is consistent for ''θ'' and ''g''(·) is a real-valued function continuous at the point ''θ'', then ''g''(''Tn'') will be consistent for ''g''(''θ''):

:: T_n\ \xrightarrow{p}\ \theta \quad\Rightarrow\quad g(T_n)\ \xrightarrow{p}\ g(\theta)

* Slutsky's theorem can be used to combine several different estimators, or an estimator with a non-random convergent sequence. If ''Tn'' →''d'' ''α'' and ''Sn'' →''p'' ''β'', then

:: \begin{align} & T_n + S_n \ \xrightarrow{d}\ \alpha+\beta, \\ & T_n S_n \ \xrightarrow{d}\ \alpha\beta, \\ & T_n / S_n \ \xrightarrow{d}\ \alpha/\beta, \ \text{provided that}\ \beta\neq0 \end{align}

* If estimator ''Tn'' is given by an explicit formula, then most likely the formula will employ sums of random variables, and then the law of large numbers can be used: for a sequence \{X_n\} of random variables and under suitable conditions,

:: \frac{1}{n}\sum_{i=1}^n g(X_i) \ \xrightarrow{p}\ \operatorname{E}\big[\,g(X)\,\big]

* If estimator ''Tn'' is defined implicitly, for example as a value that maximizes a certain objective function (see extremum estimator), then a more complicated argument involving stochastic equicontinuity has to be used.
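The law of large numbers and the continuous mapping theorem can be combined in a small simulation. The sketch below (an illustrative assumption, not from the article) estimates E[''X''²] = 1/3 for ''X'' ~ Uniform(0, 1) via a sample average, then applies the continuous function √· to carry consistency through:

```python
# Sketch: (1/n) * sum g(X_i) is consistent for E[g(X)] by the law of large
# numbers; by the continuous mapping theorem, sqrt of that average is then
# consistent for sqrt(E[g(X)]).  Here g(x) = x^2 and X ~ Uniform(0, 1).
import math
import random

random.seed(2)

def lln_estimate(n):
    """(1/n) * sum X_i^2 for X ~ Uniform(0, 1); the true value E[X^2] = 1/3."""
    return sum(random.random() ** 2 for _ in range(n)) / n

true_value = 1.0 / 3.0
for n in (100, 10000, 1000000):
    t_n = lln_estimate(n)
    # continuous mapping: sqrt(T_n) is consistent for sqrt(1/3)
    print(n, round(t_n, 4), round(math.sqrt(t_n), 4))
```

As ''n'' grows, the average approaches 1/3 and its square root approaches √(1/3) ≈ 0.5774, illustrating both tools at once.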


Bias versus consistency


Unbiased but not consistent

An estimator can be unbiased but not consistent. For example, for an iid sample \{x_1, \ldots, x_n\} one can use ''Tn''(''X'') = ''xn'' as the estimator of the mean E[''X'']. Note that here the sampling distribution of ''Tn'' is the same as the underlying distribution (for any ''n'', as it ignores all points but the last), so E[''Tn''(''X'')] = E[''X''] and it is unbiased, but it does not converge to any value. However, if a sequence of estimators is unbiased ''and'' converges to a value, then it is consistent, as it must converge to the correct value.
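This "last observation" estimator can be simulated directly. The sketch below (demo parameters assumed) shows its average staying near the true mean while its variance does not shrink with ''n'':

```python
# Sketch of the "last observation" estimator T_n(X) = x_n from the text:
# its mean matches E[X] (unbiased), but its spread does not shrink with n,
# so it is not consistent.
import random
import statistics

random.seed(3)
mu, sigma = 0.0, 1.0   # assumed demo distribution: N(0, 1)

def last_obs_estimate(n):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    return sample[-1]  # ignores all points but the last

for n in (10, 500):
    draws = [last_obs_estimate(n) for _ in range(3000)]
    # mean of the estimator stays near mu, but its variance stays near sigma^2
    print(n, round(statistics.mean(draws), 3), round(statistics.variance(draws), 3))
```

For both small and large ''n'' the estimator's variance stays near ''σ''² = 1 instead of shrinking, which is exactly why unbiasedness alone does not imply consistency.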


Biased but consistent

Alternatively, an estimator can be biased but consistent. For example, if the mean is estimated by \frac{1}{n}\sum x_i + \frac{1}{n}, it is biased, but as n \rightarrow \infty it approaches the correct value, and so it is consistent. Important examples include the sample variance and sample standard deviation. Without Bessel's correction (that is, when using the sample size n instead of the degrees of freedom n-1), these are both negatively biased but consistent estimators. With the correction, the corrected sample variance is unbiased, while the corrected sample standard deviation is still biased, but less so, and both are still consistent: the correction factor converges to 1 as sample size grows. Here is another example. Let T_n be a sequence of estimators for \theta:

: \Pr(T_n) = \begin{cases} 1 - 1/n, & \text{if}\ T_n = \theta \\ 1/n, & \text{if}\ T_n = n\delta + \theta \end{cases}

We can see that T_n \xrightarrow{p} \theta, \operatorname{E}[T_n] = \theta + \delta, and the bias does not converge to zero.
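The sample-variance case above can be checked numerically. This sketch (distribution and sample sizes are demo assumptions) computes the uncorrected (divisor ''n'') and Bessel-corrected (divisor ''n'' − 1) sample variances side by side:

```python
# Sketch: the uncorrected sample variance (divisor n) is biased downward but
# consistent; the Bessel-corrected version (divisor n - 1) is unbiased.  Both
# approach the true variance as n grows, since the factor n/(n-1) -> 1.
import random

random.seed(4)
true_var = 4.0   # variance of the assumed demo distribution N(0, 2)

def variances(n):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / n, ss / (n - 1)  # (uncorrected, Bessel-corrected)

for n in (5, 50, 50000):
    biased, corrected = variances(n)
    print(n, round(biased, 3), round(corrected, 3))
```

At small ''n'' the two estimates differ noticeably (the uncorrected one running low), but by ''n'' = 50000 both sit essentially on the true variance, illustrating biased-but-consistent behavior.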


See also

* Efficient estimator
* Fisher consistency — an alternative, although rarely used, concept of consistency for estimators
* Regression dilution
* Statistical hypothesis testing
* Instrumental variables estimation




External links

* by Mark Thoma