statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...

, an estimator is a rule for calculating an

estimate Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is de ...

of a given

quantity Quantity or amount is a property that can exist as a multitude or magnitude, which illustrate discontinuity and continuity. Quantities can be compared in terms of "more", "less", or "equal", or by assigning a numerical value multiple of a u ...

based on observed data: thus the rule (the estimator), the quantity of interest (the

estimand An estimand is a quantity that is to be estimated in a statistical analysis. The term is used to distinguish the target of inference from the method used to obtain an approximation of this target (i.e., the estimator) and the specific value obtain ...

) and its result (the estimate) are distinguished. For example, the

sample mean The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or me ...

is a commonly used estimator of the

population mean In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hyp ...

. There are point and interval estimators. The

point estimator In statistics, point estimation involves the use of sample data to calculate a single value (known as a point estimate since it identifies a point in some parameter space) which is to serve as a "best guess" or "best estimate" of an unknown popu ...

s yield single-valued results. This is in contrast to an interval estimator, where the result would be a range of plausible values. "Single value" does not necessarily mean "single number", but includes vector valued or function valued estimators. ''

Estimation theory Estimation theory is a branch of statistics that deals with estimating the values of Statistical parameter, parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such ...

'' is concerned with the properties of estimators; that is, with defining properties that can be used to compare different estimators (different rules for creating estimates) for the same quantity, based on the same data. Such properties can be used to determine the best rules to use under given circumstances. However, in

robust statistics Robust statistics are statistics that maintain their properties even if the underlying distributional assumptions are incorrect. Robust Statistics, statistical methods have been developed for many common problems, such as estimating location parame ...

, statistical theory goes on to consider the balance between having good properties, if tightly defined assumptions hold, and having worse properties that hold under wider conditions.

Background

An "estimator" or "

point estimate In statistics, point estimation involves the use of sample data to calculate a single value (known as a point estimate since it identifies a point in some parameter space) which is to serve as a "best guess" or "best estimate" of an unknown popu ...

" is a

statistic A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypot ...

(that is, a function of the data) that is used to infer the value of an unknown

parameter A parameter (), generally, is any characteristic that can help in defining or classifying a particular system (meaning an event, project, object, situation, etc.). That is, a parameter is an element of a system that is useful, or critical, when ...

in a

statistical model A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of Sample (statistics), sample data (and similar data from a larger Statistical population, population). A statistical model repre ...

. A common way of phrasing it is "the estimator is the method selected to obtain an estimate of an unknown parameter". The parameter being estimated is sometimes called the ''

''. It can be either finite-dimensional (in parametric and semi-parametric models), or infinite-dimensional ( semi-parametric and non-parametric models). If the parameter is denoted

\theta

then the estimator is traditionally written by adding a

circumflex The circumflex () is a diacritic in the Latin and Greek scripts that is also used in the written forms of many languages and in various romanization and transcription schemes. It received its English name from "bent around"a translation of ...

over the symbol:

\widehat

. Being a function of the data, the estimator is itself a

random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...

; a particular realization of this random variable is called the "estimate". Sometimes the words "estimator" and "estimate" are used interchangeably. The definition places virtually no restrictions on which functions of the data can be called the "estimators". The attractiveness of different estimators can be judged by looking at their properties, such as

unbiasedness In statistics, the bias of an estimator (or bias function) is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called ''unbiased''. In stat ...

mean square error In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between ...

consistency In deductive logic, a consistent theory is one that does not lead to a logical contradiction. A theory T is consistent if there is no formula \varphi such that both \varphi and its negation \lnot\varphi are elements of the set of consequences ...

asymptotic distribution In mathematics and statistics, an asymptotic distribution is a probability distribution that is in a sense the limiting distribution of a sequence of distributions. One of the main uses of the idea of an asymptotic distribution is in providing appr ...

, etc. The construction and comparison of estimators are the subjects of the

estimation theory Estimation theory is a branch of statistics that deals with estimating the values of Statistical parameter, parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such ...

. In the context of

decision theory Decision theory or the theory of rational choice is a branch of probability theory, probability, economics, and analytic philosophy that uses expected utility and probabilities, probability to model how individuals would behave Rationality, ratio ...

, an estimator is a type of

decision rule In decision theory, a decision rule is a function which maps an observation to an appropriate action. Decision rules play an important role in the theory of statistics and economics, and are closely related to the concept of a strategy in game ...

, and its performance may be evaluated through the use of

loss function In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost ...

s. When the word "estimator" is used without a qualifier, it usually refers to point estimation. The estimate in this case is a single point in the

parameter space The parameter space is the space of all possible parameter values that define a particular mathematical model. It is also sometimes called weight space, and is often a subset of finite-dimensional Euclidean space. In statistics, parameter spaces a ...

. There also exists another type of estimator: interval estimators, where the estimates are subsets of the parameter space. The problem of

density estimation In statistics, probability density estimation or simply density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. The unobservable density function is thought o ...

arises in two applications. Firstly, in estimating the

probability density function In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a Function (mathematics), function whose value at any given sample (or point) in the sample space (the s ...

s of random variables and secondly in estimating the spectral density function of a

time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. ...

. In these problems the estimates are functions that can be thought of as point estimates in an infinite dimensional space, and there are corresponding interval estimation problems.

Definition

Suppose a fixed ''parameter''

\theta

needs to be estimated. Then an "estimator" is a function that maps the

sample space In probability theory, the sample space (also called sample description space, possibility space, or outcome space) of an experiment or random trial is the set of all possible outcomes or results of that experiment. A sample space is usually den ...

to a set of ''sample estimates''. An estimator of

\theta

is usually denoted by the symbol

\widehat

. It is often convenient to express the theory using the

algebra of random variables In statistics, the algebra of random variables provides rules for the symbolic manipulation of random variables, while avoiding delving too deeply into the mathematically sophisticated ideas of probability theory. Its symbolism allows the treat ...

: thus if ''X'' is used to denote a

corresponding to the observed data, the estimator (itself treated as a random variable) is symbolised as a function of that random variable,

\widehat(X)

. The estimate for a particular observed data value

x

(i.e. for

X=x

) is then

\widehat(x)

, which is a fixed value. Often an abbreviated notation is used in which

\widehat

is interpreted directly as a

, but this can cause confusion.

Quantified properties

The following definitions and attributes are relevant.

Error

For a given sample

x

, the "

error An error (from the Latin , meaning 'to wander'Oxford English Dictionary, s.v. “error (n.), Etymology,” September 2023, .) is an inaccurate or incorrect action, thought, or judgement. In statistics, "error" refers to the difference between t ...

" of the estimator

\widehat

is defined as :

e(x)=\widehat(x) - \theta,

where

\theta

is the parameter being estimated. The error, ''e'', depends not only on the estimator (the estimation formula or procedure), but also on the sample.

Mean squared error

The

mean squared error In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference betwee ...

\widehat

is defined as the expected value (probability-weighted average, over all samples) of the squared errors; that is, :

\operatorname(\widehat) = \operatorname \widehat(X) - \theta)^2

It is used to indicate how far, on average, the collection of estimates are from the single parameter being estimated. Consider the following analogy. Suppose the parameter is the bull's-eye of a target, the estimator is the process of shooting arrows at the target, and the individual arrows are estimates (samples). Then high MSE means the average distance of the arrows from the bull's eye is high, and low MSE means the average distance from the bull's eye is low. The arrows may or may not be clustered. For example, even if all arrows hit the same point, yet grossly miss the target, the MSE is still relatively large. However, if the MSE is relatively low then the arrows are likely more highly clustered (than highly dispersed) around the target.

Sampling deviation

For a given sample

x

, the ''sampling deviation'' of the estimator

\widehat

is defined as :

d(x) =\widehat(x) - \operatorname( \widehat(X) ) =\widehat(x) - \operatorname( \widehat ),

where

\operatorname( \widehat(X) )

is the

expected value In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first Moment (mathematics), moment) is a generalization of the weighted average. Informa ...

of the estimator. The sampling deviation, ''d'', depends not only on the estimator, but also on the sample.

Variance

The

variance In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion ...

\widehat

is the expected value of the squared sampling deviations; that is,

\operatorname(\widehat) = \operatorname \widehat - \operatorname[\widehat^2">widehat.html" ;"title="\widehat - \operatorname[\widehat">\widehat - \operatorname[\widehat^2/math>. It is used to indicate how far, on average, the collection of estimates are from the ''expected value'' of the estimates. (Note the difference between MSE and variance.) If the parameter is the bull's-eye of a target, and the arrows are estimates, then a relatively high variance means the arrows are dispersed, and a relatively low variance means the arrows are clustered. Even if the variance is low, the cluster of arrows may still be far off-target, and even if the variance is high, the diffuse collection of arrows may still be unbiased. Finally, even if all arrows grossly miss the target, if they nevertheless all hit the same point, the variance is zero.

Bias

The bias of an estimator">bias Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is inaccurate, closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individ ...