In statistics, the concept of being an invariant estimator is a criterion that can be used to compare the properties of different estimators for the same quantity. It is a way of formalising the idea that an estimator should have certain intuitively appealing qualities. Strictly speaking, "invariant" would mean that the estimates themselves are unchanged when both the measurements and the parameters are transformed in a compatible way, but the meaning has been extended to allow the estimates to change in appropriate ways with such transformations. The term equivariant estimator is used in formal mathematical contexts that include a precise description of the way the estimator changes in response to changes to the dataset and parameterisation: this corresponds to the use of "equivariance" in more general mathematics.


General setting


Background

In statistical inference, there are several approaches to estimation theory that can be used to decide immediately what estimators should be used according to those approaches. For example, ideas from Bayesian inference would lead directly to Bayesian estimators. Similarly, the theory of classical statistical inference can sometimes lead to strong conclusions about what estimator should be used. However, the usefulness of these theories depends on having a fully prescribed statistical model and may also depend on having a relevant loss function to determine the estimator. Thus a Bayesian analysis might be undertaken, leading to a posterior distribution for relevant parameters, but the use of a specific utility or loss function may be unclear. Ideas of invariance can then be applied to the task of summarising the posterior distribution. In other cases, statistical analyses are undertaken without a fully defined statistical model, or the classical theory of statistical inference cannot be readily applied because the family of models being considered is not amenable to such treatment. In addition to these cases where general theory does not prescribe an estimator, the concept of invariance of an estimator can be applied when seeking estimators of alternative forms, either for the sake of simplicity of application of the estimator or so that the estimator is robust.

The concept of invariance is sometimes used on its own as a way of choosing between estimators, but this is not necessarily definitive. For example, a requirement of invariance may be incompatible with the requirement that the estimator be mean-unbiased; on the other hand, the criterion of median-unbiasedness is defined in terms of the estimator's sampling distribution and so is invariant under many transformations.

One use of the concept of invariance is where a class or family of estimators is proposed and a particular formulation must be selected amongst these. One procedure is to impose relevant invariance properties and then to find the formulation within this class that has the best properties, leading to what is called the optimal invariant estimator.


Some classes of invariant estimators

There are several types of transformations that are usefully considered when dealing with invariant estimators. Each gives rise to a class of estimators which are invariant to those particular types of transformation.

*Shift invariance: Notionally, estimates of a location parameter should be invariant to simple shifts of the data values. If all data values are increased by a given amount, the estimate should change by the same amount (this property is checked numerically in the sketch after this list). When considering estimation using a weighted average, this invariance requirement immediately implies that the weights should sum to one. While the same result is often derived from a requirement for unbiasedness, the use of "invariance" does not require that a mean value exists and makes no use of any probability distribution at all.
*Scale invariance: Note that this topic, the invariance of an estimator under rescaling of the data, is not to be confused with the more general scale invariance of physics, which concerns the behavior of systems under changes of scale.
*Parameter-transformation invariance: Here, the transformation applies to the parameters alone. The concept is that essentially the same inference should be made from data and a model involving a parameter θ as would be made from the same data if the model used a parameter φ, where φ is a one-to-one transformation of θ, φ=''h''(θ). According to this type of invariance, results from transformation-invariant estimators should also be related by φ=''h''(θ). Maximum likelihood estimators have this property when the transformation is monotonic (see the sketch after this list). Though the asymptotic properties of the estimator might be invariant, the small-sample properties can be different, and a specific distribution needs to be derived (Gouriéroux and Monfort, 1995).
*Permutation invariance: Where a set of data values can be represented by a statistical model that they are outcomes from independent and identically distributed random variables, it is reasonable to impose the requirement that any estimator of any property of the common distribution should be permutation-invariant: specifically, the estimator, considered as a function of the set of data-values, should not change if items of data are swapped within the dataset. The combination of permutation invariance and location invariance for estimating a location parameter from an independent and identically distributed dataset using a weighted average implies that the weights should be identical and sum to one. Of course, estimators other than a weighted average may be preferable.
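As a concrete illustration of two of these properties, the following minimal sketch (hypothetical data, assuming numpy and scipy are available) checks numerically that an equal-weight average is shift-equivariant and permutation-invariant, and that the maximum likelihood estimate of a normal standard deviation obtained by direct optimisation agrees with the monotone transformation (square root) of the closed-form variance MLE.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)  # hypothetical sample

# Shift equivariance: an equal-weight average (weights sum to one)
# changes by exactly c when every observation is shifted by c.
c = 3.7
assert np.isclose(np.mean(x + c), np.mean(x) + c)

# Permutation invariance: the average is unchanged when data items
# are swapped within the dataset.
assert np.isclose(np.mean(rng.permutation(x)), np.mean(x))

# Parameter-transformation invariance of the MLE: maximising the
# normal likelihood over sigma directly gives the same answer as
# pushing the closed-form MLE of sigma^2 through h(v) = sqrt(v).
mu_hat = np.mean(x)
neg_loglik = lambda s: -np.sum(norm.logpdf(x, loc=mu_hat, scale=s))
sigma_direct = minimize_scalar(neg_loglik, bounds=(0.1, 10.0),
                               method="bounded").x
sigma_via_h = np.sqrt(np.mean((x - mu_hat) ** 2))
assert np.isclose(sigma_direct, sigma_via_h, atol=1e-3)
```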


Optimal invariant estimators

Under this setting, we are given a set of measurements x which contains information about an unknown parameter \theta. The measurements x are modelled as a vector random variable having a probability density function f(x\mid\theta) which depends on a parameter vector \theta. The problem is to estimate \theta given x. The estimate, denoted by a, is a function of the measurements and belongs to a set A. The quality of the result is defined by a loss function L=L(a,\theta) which determines a risk function R=R(a,\theta)=\operatorname{E}[L(a,\theta)\mid\theta]. The sets of possible values of x, \theta, and a are denoted by X, \Theta, and A, respectively.
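For example, under squared-error loss L(a,\theta)=(a-\theta)^2, the risk R(a,\theta)=\operatorname{E}[(a-\theta)^2\mid\theta] is the mean squared error of the estimate, so comparing estimators by risk amounts to comparing their mean squared errors.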


In classification

In statistical classification, the rule which assigns a class to a new data-item can be considered to be a special type of estimator. A number of invariance-type considerations can be brought to bear in formulating prior knowledge for pattern recognition.


Mathematical setting


Definition

An invariant estimator is an estimator which obeys the following two rules:
# Principle of Rational Invariance: The action taken in a decision problem should not depend on transformations of the measurements used.
# Invariance Principle: If two decision problems have the same formal structure (in terms of X, \Theta, f(x\mid\theta) and L), then the same decision rule should be used in each problem.

To define an invariant or equivariant estimator formally, some definitions related to groups of transformations are needed first. Let X denote the set of possible data-samples. A group of transformations of X, to be denoted by G, is a set of (measurable) 1:1 and onto transformations of X into itself, which satisfies the following conditions:
# If g_1\in G and g_2\in G then g_1 g_2\in G.
# If g\in G then g^{-1}\in G, where g^{-1}(g(x))=x. (That is, each transformation has an inverse within the group.)
# e\in G (i.e. there is an identity transformation e(x)=x).

Datasets x_1 and x_2 in X are equivalent if x_1=g(x_2) for some g\in G. All the equivalent points form an equivalence class. Such an equivalence class is called an orbit (in X). The x_0 orbit, X(x_0), is the set X(x_0)=\{g(x_0):g\in G\}. If X consists of a single orbit then G is said to be transitive.

A family of densities F is said to be invariant under the group G if, for every g\in G and \theta\in \Theta there exists a unique \theta^*\in \Theta such that Y=g(x) has density f(y\mid\theta^*). \theta^* will be denoted \bar{g}(\theta).

If F is invariant under the group G then the loss function L(\theta,a) is said to be invariant under G if for every g\in G and a\in A there exists an a^*\in A such that L(\theta,a)=L(\bar{g}(\theta),a^*) for all \theta \in \Theta. The transformed value a^* will be denoted by \tilde{g}(a).

In the above, \bar{G}=\{\bar{g}:g\in G\} is a group of transformations from \Theta to itself and \tilde{G}=\{\tilde{g}:g\in G\} is a group of transformations from A to itself.

An estimation problem is invariant (equivariant) under G if there exist three groups G, \bar{G}, \tilde{G} as defined above. For an estimation problem that is invariant under G, estimator \delta(x) is an invariant estimator under G if, for all x\in X and g\in G,
:\delta(g(x)) = \tilde{g}(\delta(x)).
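As a simple instance of this definition, take G to be the translation group g_c(x)=(x_1+c,\dots,x_n+c), with induced transformations \bar{g}_c(\theta)=\theta+c on \Theta and \tilde{g}_c(a)=a+c on A. The sample mean \delta(x)=\bar{x} then satisfies
:\delta(g_c(x))=\bar{x}+c=\tilde{g}_c(\delta(x)),
so it is an invariant (equivariant) estimator under this group.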


Properties

# The risk function of an invariant estimator, \delta, is constant on orbits of \Theta. Equivalently R(\theta,\delta)=R(\bar{g}(\theta),\delta) for all \theta \in \Theta and \bar{g}\in \bar{G}.
# The risk function of an invariant estimator with transitive \bar{G} is constant.

For a given problem, the invariant estimator with the lowest risk is termed the "best invariant estimator". The best invariant estimator cannot always be achieved. A special case for which it can be achieved is the case when \bar{G} is transitive.
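A minimal Monte Carlo sketch of the first property, assuming a normal location family and the shift-equivariant estimator \delta(x)=\bar{x}: the estimated risk under squared-error loss is (up to simulation noise) the same at every value of \theta.

```python
import numpy as np

rng = np.random.default_rng(1)

def estimated_risk(theta, n=20, reps=100_000):
    """Monte Carlo estimate of R(theta, delta) = E[(delta(X) - theta)^2 | theta]
    for the shift-equivariant estimator delta(x) = mean(x) under a
    normal location family X_i ~ N(theta, 1)."""
    x = rng.normal(loc=theta, scale=1.0, size=(reps, n))
    delta = x.mean(axis=1)
    return np.mean((delta - theta) ** 2)

# The estimated risk is constant in theta (here it is close to 1/n = 0.05),
# illustrating that an invariant estimator has constant risk when the
# induced group on the parameter space is transitive.
for theta in [-10.0, 0.0, 3.5]:
    print(theta, estimated_risk(theta))
```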


Example: Location parameter

Suppose \theta is a location parameter, so that the density of X is of the form f(x-\theta). For \Theta=A=\mathbb{R}^1 and L=L(a-\theta), the problem is invariant under g=\bar{g}=\tilde{g}=\{g_c:g_c(x)=x+c, c\in \mathbb{R}\}. The invariant estimator in this case must satisfy
:\delta(x+c)=\delta(x)+c, \text{ for all } c\in \mathbb{R},
thus it is of the form \delta(x)=x+K (K\in \mathbb{R}). \bar{G} is transitive on \Theta, so the risk does not vary with \theta: that is, R(\theta,\delta)=R(0,\delta)=\operatorname{E}[L(X+K)\mid\theta=0]. The best invariant estimator is the one that brings the risk R(\theta,\delta) to a minimum. In the case that L is the squared error, \delta(x)=x-\operatorname{E}[X\mid\theta=0].
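For instance, if f is symmetric about zero (e.g. a standard normal density), then \operatorname{E}[X\mid\theta=0]=0 and the best invariant estimator under squared error is simply \delta(x)=x.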


Pitman estimator

The estimation problem is that X=(X_1,\dots,X_n) has density f(x_1-\theta,\dots,x_n-\theta), where ''θ'' is a parameter to be estimated, and where the loss function is L(|a-\theta|). This problem is invariant with the following (additive) transformation groups:
:G=\{g_c:g_c(x)=(x_1+c,\dots,x_n+c), c\in \mathbb{R}\},
:\bar{G}=\{\bar{g}_c:\bar{g}_c(\theta)=\theta+c, c\in \mathbb{R}\},
:\tilde{G}=\{\tilde{g}_c:\tilde{g}_c(a)=a+c, c\in \mathbb{R}\}.

The best invariant estimator \delta(x) is the one that minimizes
:\frac{\int_{-\infty}^{\infty}L(\delta(x)-\theta)f(x_1-\theta,\dots,x_n-\theta)\,d\theta}{\int_{-\infty}^{\infty}f(x_1-\theta,\dots,x_n-\theta)\,d\theta},
and this is Pitman's estimator (1939).

For the squared error loss case, the result is
:\delta(x)=\frac{\int_{-\infty}^{\infty}\theta f(x_1-\theta,\dots,x_n-\theta)\,d\theta}{\int_{-\infty}^{\infty}f(x_1-\theta,\dots,x_n-\theta)\,d\theta}.

If x \sim N(\theta 1_n,I) (i.e. a multivariate normal distribution with independent, unit-variance components) then
:\delta_{Pitman} = \delta_{ML}=\frac{\sum_{k=1}^n x_k}{n}.

If x \sim C(\theta 1_n,I\sigma^2) (independent components having a Cauchy distribution with scale parameter ''σ'') then \delta_{Pitman} \ne \delta_{ML}. However the result is
:\delta_{Pitman}=\sum_{k=1}^n\left[x_k\left(\frac{\operatorname{Re}\{w_k\}}{\sum_{m=1}^n \operatorname{Re}\{w_m\}}\right)\right], \qquad n>1,
with
:w_k = \prod_{j\neq k}\left[\frac{1}{(x_k-x_j)^2+4\sigma^2}\right]\left[1-\frac{2\sigma}{x_k-x_j}i\right].


References

* Gouriéroux, C. and Monfort, A. (1995) ''Statistics and Econometric Models'', Volume 1. Cambridge University Press.
* Pitman, E.J.G. (1939) "The estimation of the location and scale parameters of a continuous population of any given form". ''Biometrika'', 30(3/4), 391–421.