In estimation theory and statistics, the Cramér–Rao bound (CRB) expresses a lower bound on the variance of unbiased estimators of a deterministic (fixed, though unknown) parameter: the variance of any such estimator is at least as high as the inverse of the Fisher information. Equivalently, it expresses an upper bound on the precision (the inverse of variance) of unbiased estimators: the precision of any such estimator is at most the Fisher information. The result is named in honor of Harald Cramér and C. R. Rao, but has also been derived independently by Maurice Fréchet, Georges Darmois, as well as Alexander Aitken and Harold Silverstone.

An unbiased estimator that achieves this lower bound is said to be (fully) ''efficient''. Such a solution achieves the lowest possible mean squared error among all unbiased methods, and is therefore the minimum variance unbiased (MVU) estimator. However, in some cases, no unbiased technique exists which achieves the bound. This may occur either if for any unbiased estimator, there exists another with a strictly smaller variance, or if an MVU estimator exists, but its variance is strictly greater than the inverse of the Fisher information.

The Cramér–Rao bound can also be used to bound the variance of biased estimators of given bias. In some cases, a biased approach can result in both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias.


Statement

The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain regularity conditions, which hold for most well-behaved distributions. These conditions are listed later in this section.


Scalar unbiased case

Suppose \theta is an unknown deterministic parameter that is to be estimated from n independent observations (measurements) of x, each drawn from a distribution with probability density function f(x;\theta). The variance of any ''unbiased'' estimator \hat{\theta} of \theta is then bounded by the reciprocal of the Fisher information I(\theta):
: \operatorname{var}(\hat{\theta}) \geq \frac{1}{I(\theta)}
where the Fisher information I(\theta) is defined by
: I(\theta) = n \operatorname{E}_\theta \left[ \left( \frac{\partial \ell(X;\theta)}{\partial\theta} \right)^2 \right]
Here \ell(x;\theta)=\log (f(x;\theta)) is the natural logarithm of the likelihood function for a single sample x, and \operatorname{E}_\theta denotes the expected value with respect to the density f(x;\theta) of X.

If \ell(x;\theta) is twice differentiable and certain regularity conditions hold, then the Fisher information can also be defined as follows:
: I(\theta) = -n \operatorname{E}_\theta\left[ \frac{\partial^2 \ell(X;\theta)}{\partial\theta^2} \right]
The efficiency of an unbiased estimator \hat{\theta} measures how close this estimator's variance comes to this lower bound; estimator efficiency is defined as
: e(\hat{\theta}) = \frac{I(\theta)^{-1}}{\operatorname{var}(\hat{\theta})}
or the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao lower bound thus gives
: e(\hat{\theta}) \le 1.
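As an illustration of the two equivalent definitions of the Fisher information and of the efficiency ratio, the following minimal sketch evaluates both expressions for n independent Bernoulli(p) observations. The Bernoulli model, the finite-difference step sizes and all function names are assumptions chosen for this example; they do not come from the article. The expectations are exact sums over the two outcomes, and only the derivatives are approximated.

```python
from math import log

def bernoulli_pmf(x, p):
    """Probability mass f(x; p) of one Bernoulli(p) observation, x in {0, 1}."""
    return p if x == 1 else 1.0 - p

def log_lik(x, p):
    """Single-sample log-likelihood l(x; p) = log f(x; p)."""
    return log(bernoulli_pmf(x, p))

def d1(g, p, h=1e-5):
    """Central finite-difference approximation of g'(p)."""
    return (g(p + h) - g(p - h)) / (2.0 * h)

def d2(g, p, h=1e-4):
    """Central finite-difference approximation of g''(p)."""
    return (g(p + h) - 2.0 * g(p) + g(p - h)) / (h * h)

def fisher_info_score_form(p, n):
    """I(p) = n * E[(dl/dp)^2], expectation taken as an exact sum over x."""
    return n * sum(bernoulli_pmf(x, p) * d1(lambda q: log_lik(x, q), p) ** 2
                   for x in (0, 1))

def fisher_info_curvature_form(p, n):
    """I(p) = -n * E[d^2 l / dp^2], expectation taken as an exact sum over x."""
    return -n * sum(bernoulli_pmf(x, p) * d2(lambda q: log_lik(x, q), p)
                    for x in (0, 1))

p, n = 0.3, 50
info_a = fisher_info_score_form(p, n)
info_b = fisher_info_curvature_form(p, n)
crb = 1.0 / info_a                     # lower bound for unbiased estimators of p
var_sample_mean = p * (1.0 - p) / n    # exact variance of the sample mean
efficiency = crb / var_sample_mean     # e = I^{-1} / var

print("score form:    ", info_a)
print("curvature form:", info_b)
print("closed form:   ", n / (p * (1.0 - p)))
print("efficiency of the sample mean:", efficiency)
```

In this model the sample mean attains the bound, so the computed efficiency should agree with 1 up to the finite-difference error.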


General scalar case

A more general form of the bound can be obtained by considering a biased estimator T(X), whose expectation is not \theta but a function of this parameter, say, \psi(\theta). Hence \operatorname{E}\{T(X)\} - \theta = \psi(\theta) - \theta is not generally equal to 0. In this case, the bound is given by
: \operatorname{var}(T) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}
where \psi'(\theta) is the derivative of \psi(\theta) (with respect to \theta), and I(\theta) is the Fisher information defined above.
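The bound for a function \psi(\theta) can be checked exactly in a small discrete model. The sketch below is an assumed example, not one from the article: it takes n Bernoulli(p) draws, targets \psi(p) = p(1-p), and uses the estimator T = n/(n-1) \cdot \hat{p}(1-\hat{p}), which is unbiased for \psi(p). Its mean and variance are computed by summing over the binomial distribution of the success count, then compared with [\psi'(p)]^2 / I(p).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def moments_of_T(n, p):
    """Exact mean and variance of T = n/(n-1) * phat * (1 - phat), phat = K/n."""
    def t(k):
        phat = k / n
        return n / (n - 1) * phat * (1.0 - phat)
    mean = sum(binom_pmf(k, n, p) * t(k) for k in range(n + 1))
    var = sum(binom_pmf(k, n, p) * (t(k) - mean) ** 2 for k in range(n + 1))
    return mean, var

n, p = 20, 0.3
mean_T, var_T = moments_of_T(n, p)

psi = p * (1.0 - p)             # psi(p), the quantity that T estimates
psi_prime = 1.0 - 2.0 * p       # derivative of psi with respect to p
fisher = n / (p * (1.0 - p))    # Fisher information of n Bernoulli(p) draws
bound = psi_prime ** 2 / fisher

print("E[T] =", mean_T, " psi(p) =", psi)
print("var(T) =", var_T, " bound =", bound)
```

Here the variance stays strictly above the bound, which illustrates that an unbiased estimator need not attain it.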


Bound on the variance of biased estimators

Apart from being a bound on estimators of functions of the parameter, this approach can be used to derive a bound on the variance of biased estimators with a given bias, as follows. Consider an estimator \hat{\theta} with bias b(\theta) = \operatorname{E}\{\hat{\theta}\} - \theta, and let \psi(\theta) = b(\theta) + \theta. By the result above, any estimator whose expectation is \psi(\theta) has variance greater than or equal to (\psi'(\theta))^2/I(\theta). Thus, any estimator \hat{\theta} whose bias is given by a function b(\theta) satisfies
: \operatorname{var} \left(\hat{\theta}\right) \geq \frac{[1+b'(\theta)]^2}{I(\theta)}.
The unbiased version of the bound is a special case of this result, with b(\theta)=0.

It is trivial to have a small variance: an "estimator" that is constant has a variance of zero. But from the above equation we find that the mean squared error of a biased estimator is bounded by
: \operatorname{E}\left((\hat{\theta}-\theta)^2\right)\geq\frac{[1+b'(\theta)]^2}{I(\theta)}+b(\theta)^2,
using the standard decomposition of the MSE into variance plus squared bias. Note, however, that if 1+b'(\theta)<1 this bound might be less than the unbiased Cramér–Rao bound 1/I(\theta). For instance, in the example of estimating variance below, 1+b'(\theta)= \frac{n}{n+2} <1.
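The biased-estimator bounds can be made concrete with a shrinkage estimator. The following sketch assumes n observations from a normal distribution with unknown mean \theta and known variance \sigma^2, and the estimator a \cdot \bar{X} with a fixed factor a; this setup and the helper names are illustrative assumptions. For this estimator b(\theta) = (a-1)\theta, so 1 + b'(\theta) = a.

```python
def biased_variance_bound(b_prime, fisher):
    """Lower bound (1 + b'(theta))^2 / I(theta) on the variance."""
    return (1.0 + b_prime) ** 2 / fisher

def biased_mse_bound(b, b_prime, fisher):
    """Lower bound on the MSE: variance bound plus squared bias."""
    return biased_variance_bound(b_prime, fisher) + b ** 2

sigma2, n, theta, a = 4.0, 25, 0.5, 0.8

fisher = n / sigma2            # Fisher information for the mean of n normal draws
b = (a - 1.0) * theta          # bias of a * sample_mean at this theta
b_prime = a - 1.0              # derivative of the bias with respect to theta

var_bound = biased_variance_bound(b_prime, fisher)
mse_bound = biased_mse_bound(b, b_prime, fisher)
unbiased_crb = 1.0 / fisher
var_shrunk = a ** 2 * sigma2 / n   # exact variance of a * sample_mean

print("variance of a*mean:", var_shrunk, " bound:", var_bound)
print("MSE bound (biased):", mse_bound, " unbiased CRB:", unbiased_crb)
```

With these particular values the MSE bound falls below the unbiased bound, but that depends on \theta: for large |\theta| the squared-bias term dominates and shrinkage becomes worse than the sample mean.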


Multivariate case

Extending the Cramér–Rao bound to multiple parameters, define a parameter column vector
: \boldsymbol{\theta} = \left[ \theta_1, \theta_2, \dots, \theta_d \right]^T \in \mathbb{R}^d
with probability density function f(x; \boldsymbol{\theta}) which satisfies the two regularity conditions below. The Fisher information matrix is a d \times d matrix with element I_{m,k} defined as
: I_{m,k} = \operatorname{E} \left[ \frac{\partial}{\partial\theta_m} \log f\left(x; \boldsymbol{\theta}\right) \frac{\partial}{\partial\theta_k} \log f\left(x; \boldsymbol{\theta}\right) \right] = -\operatorname{E} \left[ \frac{\partial^2}{\partial\theta_m \, \partial\theta_k} \log f\left(x; \boldsymbol{\theta}\right) \right]
Let \boldsymbol{T}(X) be an estimator of any vector function of parameters, \boldsymbol{T}(X) = (T_1(X), \ldots, T_d(X))^T, and denote its expectation vector \operatorname{E}[\boldsymbol{T}(X)] by \boldsymbol{\psi}(\boldsymbol{\theta}). The Cramér–Rao bound then states that the covariance matrix of \boldsymbol{T}(X) satisfies
: I\left(\boldsymbol{\theta}\right) \geq \phi(\theta)^T \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)^{-1} \phi(\theta),
: \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \phi(\theta) I\left(\boldsymbol{\theta}\right)^{-1} \phi(\theta)^T
where
* The matrix inequality A \ge B is understood to mean that the matrix A-B is positive semidefinite, and
* \phi(\theta) := \partial \boldsymbol{\psi}(\boldsymbol{\theta})/\partial \boldsymbol{\theta} is the Jacobian matrix whose ij element is given by \partial \psi_i(\boldsymbol{\theta})/\partial \theta_j.
If \boldsymbol{T}(X) is an unbiased estimator of \boldsymbol{\theta} (i.e., \boldsymbol{\psi}\left(\boldsymbol{\theta}\right) = \boldsymbol{\theta}), then the Cramér–Rao bound reduces to
: \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq I\left(\boldsymbol{\theta}\right)^{-1}.
If it is inconvenient to compute the inverse of the Fisher information matrix, then one can simply take the reciprocal of the corresponding diagonal element to find a (possibly loose) lower bound:
: \operatorname{var}_{\boldsymbol{\theta}}(T_m(X)) = \left[\operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)\right]_{mm} \geq \left[I\left(\boldsymbol{\theta}\right)^{-1}\right]_{mm} \geq \left(\left[I\left(\boldsymbol{\theta}\right)\right]_{mm}\right)^{-1}.
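The gap between the two diagonal bounds can be seen in a two-parameter example. The sketch below assumes a straight-line model y_i = \alpha + \beta t_i + noise with independent Gaussian noise of known variance \sigma^2, for which the Fisher information matrix for (\alpha, \beta) is \sigma^{-2} \sum_i [1, t_i]^T [1, t_i] (a standard result for Gaussian linear models). The model, the sample times and the function names are assumptions made for illustration.

```python
def fisher_matrix_line_fit(t, sigma2):
    """2x2 Fisher information matrix for (intercept, slope) in y = a + b*t + noise."""
    n = len(t)
    s1 = sum(t)
    s2 = sum(ti * ti for ti in t)
    return [[n / sigma2, s1 / sigma2],
            [s1 / sigma2, s2 / sigma2]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]

t = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical sample times
sigma2 = 1.0                     # assumed known noise variance

info = fisher_matrix_line_fit(t, sigma2)
crb = inverse_2x2(info)

tight = [crb[0][0], crb[1][1]]                 # [I^{-1}]_mm
loose = [1.0 / info[0][0], 1.0 / info[1][1]]   # 1 / [I]_mm

for name, tg, ls in zip(("intercept", "slope"), tight, loose):
    print(f"{name}: [I^-1]_mm = {tg:.4f} >= 1/[I]_mm = {ls:.4f}")
```

The reciprocal-diagonal bound ignores the coupling between the parameters, which is why it is looser here; the two coincide when the Fisher information matrix is diagonal.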


Regularity conditions

The bound relies on two weak regularity conditions on the probability density function, f(x; \theta), and the estimator T(X):
* The Fisher information is always defined; equivalently, for all x such that f(x; \theta) > 0,
:: \frac{\partial}{\partial\theta} \log f(x;\theta)
: exists, and is finite.
* The operations of integration with respect to x and differentiation with respect to \theta can be interchanged in the expectation of T; that is,
:: \frac{\partial}{\partial\theta} \left[ \int T(x) f(x;\theta) \,dx \right] = \int T(x) \left[ \frac{\partial}{\partial\theta} f(x;\theta) \right] \,dx
: whenever the right-hand side is finite.
: This condition can often be confirmed by using the fact that integration and differentiation can be swapped when either of the following cases holds:
:# The function f(x;\theta) has bounded support in x, and the bounds do not depend on \theta;
:# The function f(x;\theta) has infinite support, is continuously differentiable, and the integral converges uniformly for all \theta.
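A standard case where the second condition fails is the uniform distribution on (0, \theta), whose support depends on \theta. The following Monte Carlo sketch is an assumed illustration, not part of the article: it simulates the unbiased estimator \frac{n+1}{n} \max_i X_i and compares its variance, which is \theta^2 / (n(n+2)) in closed form, with the value \theta^2 / n obtained by formally plugging \ell(x;\theta) = -\log\theta into the single-sample Fisher information formula. The seed, sample sizes and function name are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

def simulate_scaled_max(theta, n, trials, seed=0):
    """Draw `trials` values of (n+1)/n * max(X_1..X_n), X_i ~ Uniform(0, theta)."""
    rng = random.Random(seed)
    scale = (n + 1) / n
    return [scale * max(rng.uniform(0.0, theta) for _ in range(n))
            for _ in range(trials)]

theta, n, trials = 2.0, 5, 20000
estimates = simulate_scaled_max(theta, n, trials)

mc_mean = fmean(estimates)             # should be close to theta (unbiased)
mc_var = pvariance(estimates)          # Monte Carlo variance of the estimator
exact_var = theta**2 / (n * (n + 2))   # closed-form variance
naive_bound = theta**2 / n             # formal "bound"; not valid in this model

print("mean:", mc_mean)
print("variance (MC):", mc_var, " exact:", exact_var)
print("formal bound that does not apply:", naive_bound)
```

The simulated variance falls well below the formal value, which is consistent with the bound not being applicable when the regularity conditions fail.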


Proof


Proof for the general case based on the Chapman–Robbins bound

Proof based on.


A standalone proof for the general scalar case

Assume that T=t(X) is an estimator with expectation \psi(\theta) (based on the observations X), i.e. that \operatorname{E}(T) = \psi (\theta). The goal is to prove that, for all \theta,
: \operatorname{var}(t(X)) \geq \frac{[\psi^\prime(\theta)]^2}{I(\theta)}.
Let X be a random variable with probability density function f(x; \theta). Here T = t(X) is a statistic, which is used as an estimator for \psi (\theta). Define V as the score:
: V = \frac{\partial}{\partial\theta} \ln f(X;\theta) = \frac{1}{f(X;\theta)}\frac{\partial}{\partial\theta}f(X;\theta)
where the chain rule is used in the final equality above. Then the expectation of V, written \operatorname{E}(V), is zero. This is because:
: \operatorname{E}(V) = \int f(x;\theta)\left[\frac{1}{f(x;\theta)}\frac{\partial}{\partial\theta} f(x;\theta)\right]\, dx = \frac{\partial}{\partial\theta}\int f(x;\theta) \, dx = 0
where the integral and partial derivative have been interchanged (justified by the second regularity condition).

If we consider the covariance \operatorname{cov}(V, T) of V and T, we have \operatorname{cov}(V, T) = \operatorname{E}(V T), because \operatorname{E}(V) = 0. Expanding this expression we have
: \operatorname{cov}(V,T) = \operatorname{E} \left( T \cdot\left[\frac{1}{f(X;\theta)}\frac{\partial}{\partial\theta}f(X;\theta) \right]\right)
: = \int t(x) \left[\frac{1}{f(x;\theta)} \frac{\partial}{\partial\theta} f(x;\theta) \right] f(x;\theta)\, dx
: = \frac{\partial}{\partial\theta} \left[ \int t(x) f(x;\theta)\,dx \right] = \frac{\partial}{\partial\theta} \operatorname{E}(T) = \psi^\prime(\theta)
again because the integration and differentiation operations commute (second condition).

The Cauchy–Schwarz inequality shows that
: \sqrt{ \operatorname{var} (T) \operatorname{var} (V)} \geq \left| \operatorname{cov}(V,T) \right| = \left| \psi^\prime (\theta) \right|,
and since \operatorname{E}(V) = 0, the variance of the score is \operatorname{var}(V) = \operatorname{E}(V^2) = I(\theta). Therefore
: \operatorname{var} (T) \geq \frac{[\psi^\prime(\theta)]^2}{\operatorname{var} (V)} = \frac{[\psi^\prime(\theta)]^2}{I(\theta)}
which proves the proposition.
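Each step of the argument can be checked numerically in a discrete model where expectations are finite sums. The sketch below is an assumed example: the observation X is a single Binomial(n, p) count, so the score is V = X/p - (n-X)/(1-p), and the estimator is the hypothetical biased rule t(X) = (X+1)/(n+2), whose expectation is \psi(p) = (np+1)/(n+2). It evaluates \operatorname{E}(V), \operatorname{cov}(V,T) and both sides of the Cauchy–Schwarz step.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def expect(g, n, p):
    """Exact expectation E[g(X)] as a finite sum over k = 0..n."""
    return sum(binom_pmf(k, n, p) * g(k) for k in range(n + 1))

n, p = 12, 0.35

def score(k):
    """Score V = d/dp log f(k; p) for the binomial model."""
    return k / p - (n - k) / (1.0 - p)

def t(k):
    """Hypothetical biased estimator T = t(X) = (X + 1) / (n + 2)."""
    return (k + 1) / (n + 2)

mean_V = expect(score, n, p)
mean_T = expect(t, n, p)
var_V = expect(lambda k: (score(k) - mean_V) ** 2, n, p)
var_T = expect(lambda k: (t(k) - mean_T) ** 2, n, p)
cov_VT = expect(lambda k: (score(k) - mean_V) * (t(k) - mean_T), n, p)

psi_prime = n / (n + 2)          # derivative of psi(p) = (n*p + 1) / (n + 2)
fisher = n / (p * (1.0 - p))     # Fisher information of the binomial count

print("E(V) =", mean_V)
print("cov(V, T) =", cov_VT, " psi'(p) =", psi_prime)
print("var(V) =", var_V, " I(p) =", fisher)
print("var(T) =", var_T, " bound =", psi_prime ** 2 / fisher)
```

Because this t(X) is an affine function of the score, the Cauchy–Schwarz inequality holds with equality and the variance matches the bound up to rounding.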


Examples


Multivariate normal distribution

For the case of a ''d''-variate normal distribution
: \boldsymbol{x} \sim \mathcal{N}_d \left( \boldsymbol{\mu}( \boldsymbol{\theta}) , \boldsymbol{C}( \boldsymbol{\theta}) \right)
the Fisher information matrix has elements
: I_{m,k} = \frac{\partial \boldsymbol{\mu}^T}{\partial \theta_m} \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{\mu}}{\partial \theta_k} + \frac{1}{2} \operatorname{tr} \left( \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{C}}{\partial \theta_m} \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{C}}{\partial \theta_k} \right)
where "tr" is the trace.

For example, let w[j] be a sample of N independent observations with unknown mean \theta and known variance \sigma^2:
: w[j] \sim \mathcal{N}_N \left(\theta \boldsymbol{1}, \sigma^2 \boldsymbol{I} \right).
Then the Fisher information is a scalar given by
: I(\theta) = \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right)^T \boldsymbol{C}^{-1} \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum^N_{i=1}\frac{1}{\sigma^2} = \frac{N}{\sigma^2},
and so the Cramér–Rao bound is
: \operatorname{var}\left(\hat \theta\right) \geq \frac{\sigma^2}{N}.
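The scalar example above can be checked by simulation. The following sketch assumes the sample mean as the estimator of \theta (which attains the bound \sigma^2/N in this model) and uses arbitrary values for \theta, \sigma, N, the number of trials and the random seed; none of these choices come from the article.

```python
import random
from statistics import fmean, pvariance

def sample_mean_estimates(theta, sigma, n_obs, trials, seed=0):
    """Return `trials` sample means, each over n_obs draws from N(theta, sigma^2)."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(theta, sigma) for _ in range(n_obs))
            for _ in range(trials)]

theta, sigma, n_obs, trials = 1.0, 2.0, 10, 20000
estimates = sample_mean_estimates(theta, sigma, n_obs, trials)

fisher = n_obs / sigma**2      # I(theta) = N / sigma^2
crb = 1.0 / fisher             # Cramer-Rao bound sigma^2 / N
mc_var = pvariance(estimates)  # Monte Carlo variance of the sample mean

print("bound sigma^2/N:", crb)
print("simulated variance of the sample mean:", mc_var)
```

The two printed values should agree up to Monte Carlo error, since the sample mean is an efficient estimator of the mean of a normal distribution with known variance.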


Normal variance with known mean

Suppose ''X'' is a normally distributed random variable with known mean \mu and unknown variance \sigma^2. Consider the following statistic:
: T=\frac{\sum_{i=1}^n (X_i-\mu)^2}{n}.
Then ''T'' is unbiased for \sigma^2, as \operatorname{E}(T)=\sigma^2. What is the variance of ''T''?
: \operatorname{var}(T) = \operatorname{var}\left(\frac{\sum_{i=1}^n (X_i-\mu)^2}{n}\right)=\frac{\operatorname{var}\left((X-\mu)^2\right)}{n}=\frac{1}{n} \left[ \operatorname{E}\left\{(X-\mu)^4\right\}-\left(\operatorname{E}\left\{(X-\mu)^2\right\}\right)^2 \right]
(the second equality uses the independence of the observations; the last follows directly from the definition of variance). The first term is the fourth moment about the mean and has value 3(\sigma^2)^2; the second is the square of the variance, or (\sigma^2)^2. Thus
: \operatorname{var}(T)=\frac{2(\sigma^2)^2}{n}.
Now, what is the Fisher information in the sample? Recall that the score V is defined as
: V=\frac{\partial}{\partial\sigma^2}\log\left[ L(\sigma^2,X)\right]
where L is the likelihood function. Thus in this case,
: \log\left[L(\sigma^2,X)\right]=\log\left[\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(X-\mu)^2/(2\sigma^2)}\right]=-\log(\sqrt{2\pi\sigma^2})-\frac{(X-\mu)^2}{2\sigma^2}
: V=\frac{\partial}{\partial\sigma^2}\log \left[ L(\sigma^2,X) \right]=\frac{\partial}{\partial\sigma^2}\left[-\log(\sqrt{2\pi\sigma^2})-\frac{(X-\mu)^2}{2\sigma^2}\right]=-\frac{1}{2\sigma^2}+\frac{(X-\mu)^2}{2(\sigma^2)^2}
where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of V, or
: I =-\operatorname{E}\left(\frac{\partial V}{\partial\sigma^2}\right) =-\operatorname{E}\left(-\frac{(X-\mu)^2}{(\sigma^2)^3}+\frac{1}{2(\sigma^2)^2}\right) =\frac{\sigma^2}{(\sigma^2)^3}-\frac{1}{2(\sigma^2)^2} =\frac{1}{2(\sigma^2)^2}.
Thus the information in a sample of n independent observations is just n times this, or \frac{n}{2(\sigma^2)^2}. The Cramér–Rao bound states that
: \operatorname{var}(T)\geq\frac{2(\sigma^2)^2}{n}.
In this case, the inequality is saturated (equality is achieved), showing that the estimator is efficient.

However, we can achieve a lower mean squared error using a biased estimator. The estimator
: T=\frac{\sum_{i=1}^n (X_i-\mu)^2}{n+2}
obviously has a smaller variance, which is in fact
: \operatorname{var}(T)=\frac{2n(\sigma^2)^2}{(n+2)^2}.
Its bias is
: \left(\frac{n}{n+2}-1\right)\sigma^2=-\frac{2\sigma^2}{n+2}
so its mean squared error is
: \operatorname{MSE}(T)=\left(\frac{2n}{(n+2)^2}+\frac{4}{(n+2)^2}\right)(\sigma^2)^2 =\frac{2(\sigma^2)^2}{n+2}
which is clearly less than what unbiased estimators can achieve according to the Cramér–Rao bound.

When the mean is not known, the minimum mean squared error estimate of the variance of a sample from a Gaussian distribution is achieved by dividing by n+1, rather than n-1 or n+2.
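The comparison between the unbiased and the biased variance estimator with known mean can be reproduced by simulation. The sketch below is a Monte Carlo illustration under assumed values of \mu, \sigma^2, n, the number of trials and the seed; it estimates the mean squared error of \sum_i (X_i-\mu)^2 / n and of \sum_i (X_i-\mu)^2 / (n+2), and prints them next to the closed-form values 2(\sigma^2)^2/n and 2(\sigma^2)^2/(n+2).

```python
import random
from statistics import fmean

def simulate_variance_estimators(mu, sigma2, n, trials, seed=0):
    """Simulate both estimators of sigma^2 (known mean) on the same samples."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    unbiased, biased = [], []
    for _ in range(trials):
        s = sum((rng.gauss(mu, sigma) - mu) ** 2 for _ in range(n))
        unbiased.append(s / n)        # divides by n: unbiased, attains the bound
        biased.append(s / (n + 2))    # divides by n + 2: biased, lower MSE
    return unbiased, biased

mu, sigma2, n, trials = 1.0, 4.0, 10, 20000
unbiased, biased = simulate_variance_estimators(mu, sigma2, n, trials)

mse_unbiased = fmean((t - sigma2) ** 2 for t in unbiased)
mse_biased = fmean((t - sigma2) ** 2 for t in biased)

crb = 2.0 * sigma2 ** 2 / n                     # unbiased Cramer-Rao bound
mse_biased_exact = 2.0 * sigma2 ** 2 / (n + 2)  # closed-form MSE, n + 2 divisor

print("unbiased: MSE (MC) =", mse_unbiased, " bound =", crb)
print("biased:   MSE (MC) =", mse_biased, " exact =", mse_biased_exact)
```

Both estimators are computed from the same simulated samples, so the difference between the two Monte Carlo values is less noisy than either value on its own.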


See also

* Chapman–Robbins bound
* Kullback's inequality
* Brascamp–Lieb inequality




External links


* FandPLimitTool, a GUI-based software tool for calculating the Fisher information and Cramér–Rao lower bound, with application to single-molecule microscopy.