In estimation theory and statistics, the Cramér–Rao bound (CRB) relates to estimation of a deterministic (fixed, though unknown) parameter. The result is named in honor of Harald Cramér and Calyampudi Radhakrishna Rao, but has also been derived independently by Maurice Fréchet, Georges Darmois, and by Alexander Aitken and Harold Silverstone. It is also known as the Fréchet–Cramér–Rao or Fréchet–Darmois–Cramér–Rao lower bound. It states that the precision of any unbiased estimator is at most the Fisher information; or (equivalently) the reciprocal of the Fisher information is a lower bound on its variance.

An unbiased estimator that achieves this bound is said to be (fully) efficient. Such a solution achieves the lowest possible mean squared error among all unbiased methods, and is therefore the minimum variance unbiased (MVU) estimator. However, in some cases, no unbiased technique exists which achieves the bound. This may occur either if for any unbiased estimator there exists another with a strictly smaller variance, or if an MVU estimator exists but its variance is strictly greater than the inverse of the Fisher information.

The Cramér–Rao bound can also be used to bound the variance of estimators of given bias. In some cases, a biased approach can result in both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias. Anil Kumar Bhattacharyya proposed a significant refinement of the Cramér–Rao lower bound in a series of works; the result is known as the Bhattacharyya bound.


Statement

The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain regularity conditions, which hold for most well-behaved distributions. These conditions are listed later in this section.


Scalar unbiased case

Suppose \theta is an unknown deterministic parameter that is to be estimated from n independent observations (measurements) of x, each from a distribution according to some probability density function f(x;\theta). The variance of any unbiased estimator \hat{\theta} of \theta is then bounded by the reciprocal of the Fisher information I(\theta):
: \operatorname{var}(\hat{\theta}) \geq \frac{1}{I(\theta)}
where the Fisher information I(\theta) is defined by
: I(\theta) = n \operatorname{E}_{X;\theta}\left[\left(\frac{\partial \ell(X;\theta)}{\partial\theta}\right)^2\right]
and \ell(x;\theta) = \log(f(x;\theta)) is the natural logarithm of the likelihood function for a single sample x, and \operatorname{E}_{X;\theta} denotes the expected value with respect to the density f(x;\theta) of X. If not indicated, in what follows, the expectation is taken with respect to X.

If \ell(x;\theta) is twice differentiable and certain regularity conditions hold, then the Fisher information can also be defined as follows:
: I(\theta) = -n \operatorname{E}_{X;\theta}\left[\frac{\partial^2 \ell(X;\theta)}{\partial\theta^2}\right]

The efficiency of an unbiased estimator \hat{\theta} measures how close this estimator's variance comes to this lower bound; estimator efficiency is defined as
: e(\hat{\theta}) = \frac{I(\theta)^{-1}}{\operatorname{var}(\hat{\theta})}
or the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao lower bound thus gives
: e(\hat{\theta}) \le 1.
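As a concrete check of the two equivalent definitions of I(\theta) and of the efficiency, the sketch below assumes a discrete Bernoulli(p) model (so each expectation is an exact sum over x \in \{0, 1\} instead of an integral) and uses the sample mean as the unbiased estimator. The function names are illustrative only.

```python
"""Fisher information of a Bernoulli(p) model, computed both ways, and the
efficiency of the sample mean. Illustrative sketch: the Bernoulli model is an
assumption chosen so that every expectation is an exact finite sum."""


def pmf(x: int, p: float) -> float:
    """f(x; p) for x in {0, 1}."""
    return p if x == 1 else 1.0 - p


def score(x: int, p: float) -> float:
    """d/dp log f(x; p), where log f = x log p + (1 - x) log(1 - p)."""
    return x / p - (1 - x) / (1.0 - p)


def score_derivative(x: int, p: float) -> float:
    """d^2/dp^2 log f(x; p)."""
    return -x / p**2 - (1 - x) / (1.0 - p) ** 2


def fisher_info_squared_score(p: float, n: int) -> float:
    """I(p) = n E[(d log f / dp)^2] for n independent observations."""
    return n * sum(pmf(x, p) * score(x, p) ** 2 for x in (0, 1))


def fisher_info_second_derivative(p: float, n: int) -> float:
    """I(p) = -n E[d^2 log f / dp^2], valid under the regularity conditions."""
    return -n * sum(pmf(x, p) * score_derivative(x, p) for x in (0, 1))


def efficiency_of_sample_mean(p: float, n: int) -> float:
    """e = (1 / I(p)) / var(sample mean); the sample mean has variance p(1 - p) / n."""
    variance = p * (1.0 - p) / n
    return (1.0 / fisher_info_squared_score(p, n)) / variance


if __name__ == "__main__":
    p, n = 0.3, 50
    print(fisher_info_squared_score(p, n))      # n / (p (1 - p)), about 238.1
    print(fisher_info_second_derivative(p, n))  # same value, second definition
    print(efficiency_of_sample_mean(p, n))      # 1 up to rounding: efficient
```

Both definitions reduce to n/(p(1-p)), and the efficiency is 1, so in this model the sample mean attains the bound.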


General scalar case

A more general form of the bound can be obtained by considering a biased estimator T(X), whose expectation is not \theta but a function of this parameter, say, \psi(\theta). Hence \operatorname{E}\{T(X)\} - \theta = \psi(\theta) - \theta is not generally equal to 0. In this case, the bound is given by
: \operatorname{var}(T) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}
where \psi'(\theta) is the derivative of \psi(\theta) with respect to \theta, and I(\theta) is the Fisher information defined above.
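A minimal worked instance, assuming n independent observations from a normal distribution with unknown mean \theta and known variance \sigma^2 (for which I(\theta) = n/\sigma^2, as computed in the examples below) and a fixed scale factor c chosen for illustration:

```latex
T = c\,\bar{X}, \qquad
\psi(\theta) = \operatorname{E}(T) = c\,\theta, \qquad
\psi'(\theta) = c, \qquad
I(\theta) = \frac{n}{\sigma^2}

\operatorname{var}(T) = c^2\,\operatorname{var}(\bar{X}) = \frac{c^2\sigma^2}{n}
  \;\geq\; \frac{[\psi'(\theta)]^2}{I(\theta)} = \frac{c^2\sigma^2}{n}
```

Here the bound holds with equality for every c, so a scaled sample mean is efficient as an estimator of its own expectation c\theta.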


Bound on the variance of biased estimators

Apart from being a bound on estimators of functions of the parameter, this approach can be used to derive a bound on the variance of biased estimators with a given bias, as follows. Consider an estimator \hat{\theta} with bias b(\theta) = \operatorname{E}\{\hat{\theta}\} - \theta, and let \psi(\theta) = b(\theta) + \theta. By the result above, any estimator whose expectation is \psi(\theta) has variance greater than or equal to (\psi'(\theta))^2/I(\theta). Thus, any estimator \hat{\theta} whose bias is given by a function b(\theta) satisfies
: \operatorname{var}\left(\hat{\theta}\right) \geq \frac{[1+b'(\theta)]^2}{I(\theta)}.
The unbiased version of the bound is a special case of this result, with b(\theta)=0.

It is trivial to have a small variance: an "estimator" that is constant has a variance of zero. But from the above equation, we find that the mean squared error of a biased estimator is bounded by
: \operatorname{E}\left((\hat{\theta}-\theta)^2\right) \geq \frac{[1+b'(\theta)]^2}{I(\theta)} + b(\theta)^2,
using the standard decomposition of the MSE into variance plus squared bias. Note, however, that if 1+b'(\theta) < 1 this bound might be less than the unbiased Cramér–Rao bound 1/I(\theta). For instance, in the example of estimating variance below, 1+b'(\theta) = \frac{n}{n+2} < 1.
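The MSE bound above can be evaluated directly. The sketch below assumes the setting of the normal-variance example later in this article (known mean, parameter \theta = \sigma^2, estimator dividing by n+2), where I(\sigma^2) = n/(2(\sigma^2)^2), b = -2\sigma^2/(n+2) and b' = -2/(n+2); the helper names are hypothetical.

```python
"""Evaluate the biased-estimator MSE bound (1 + b')^2 / I + b^2.

Assumed setting (illustrative): X_i ~ N(mu, sigma2) with mu known, theta = sigma2,
and the biased estimator sum((X_i - mu)^2) / (n + 2)."""


def biased_mse_bound(bias: float, bias_derivative: float, fisher_info: float) -> float:
    """Lower bound on E[(theta_hat - theta)^2] for an estimator with bias b(theta)."""
    return (1.0 + bias_derivative) ** 2 / fisher_info + bias**2


def variance_example(n: int, sigma2: float) -> tuple[float, float, float]:
    """Return (biased MSE bound, unbiased CRB, closed-form MSE of the n + 2 estimator)."""
    fisher_info = n / (2.0 * sigma2**2)     # information about sigma2 in n samples
    bias = -2.0 * sigma2 / (n + 2)          # E[T] - sigma2 = n sigma2/(n+2) - sigma2
    bias_derivative = -2.0 / (n + 2)        # db / d(sigma2)
    bound = biased_mse_bound(bias, bias_derivative, fisher_info)
    unbiased_crb = 1.0 / fisher_info        # 2 sigma2^2 / n
    closed_form_mse = 2.0 * sigma2**2 / (n + 2)
    return bound, unbiased_crb, closed_form_mse


if __name__ == "__main__":
    bound, crb, mse = variance_example(n=10, sigma2=1.5)
    print(bound, mse)  # equal: the n + 2 estimator attains the biased bound
    print(crb)         # larger: the unbiased bound 2 sigma2^2 / n
```

In this setting the biased bound simplifies to 2(\sigma^2)^2/(n+2), which is exactly the MSE of the n+2 estimator and strictly below the unbiased bound 2(\sigma^2)^2/n.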


Multivariate case

Extending the Cramér–Rao bound to multiple parameters, define a parameter column vector
: \boldsymbol{\theta} = \left[\theta_1, \theta_2, \dots, \theta_d\right]^T \in \mathbb{R}^d
with probability density function f(x; \boldsymbol{\theta}) which satisfies the two regularity conditions below.

The Fisher information matrix is a d \times d matrix with element I_{m,k} defined as
: I_{m,k} = \operatorname{E}\left[\frac{\partial}{\partial\theta_m}\log f\left(x;\boldsymbol{\theta}\right)\frac{\partial}{\partial\theta_k}\log f\left(x;\boldsymbol{\theta}\right)\right] = -\operatorname{E}\left[\frac{\partial^2}{\partial\theta_m\,\partial\theta_k}\log f\left(x;\boldsymbol{\theta}\right)\right]

Let \boldsymbol{T}(X) be an estimator of any vector function of parameters, \boldsymbol{T}(X) = (T_1(X), \ldots, T_d(X))^T, and denote its expectation vector \operatorname{E}[\boldsymbol{T}(X)] by \boldsymbol{\psi}(\boldsymbol{\theta}). The Cramér–Rao bound then states that the covariance matrix of \boldsymbol{T}(X) satisfies
: I\left(\boldsymbol{\theta}\right) \geq \phi(\boldsymbol{\theta})^T \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)^{-1}\phi(\boldsymbol{\theta}),
: \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \phi(\boldsymbol{\theta})\, I\left(\boldsymbol{\theta}\right)^{-1}\phi(\boldsymbol{\theta})^T
where
* The matrix inequality A \ge B is understood to mean that the matrix A-B is positive semidefinite, and
* \phi(\boldsymbol{\theta}) := \partial\boldsymbol{\psi}(\boldsymbol{\theta})/\partial\boldsymbol{\theta} is the Jacobian matrix whose ij element is given by \partial\psi_i(\boldsymbol{\theta})/\partial\theta_j.

If \boldsymbol{T}(X) is an unbiased estimator of \boldsymbol{\theta} (i.e., \boldsymbol{\psi}(\boldsymbol{\theta}) = \boldsymbol{\theta}), then the Cramér–Rao bound reduces to
: \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq I\left(\boldsymbol{\theta}\right)^{-1}.
If it is inconvenient to compute the inverse of the Fisher information matrix, then one can simply take the reciprocal of the corresponding diagonal element to find a (possibly loose) lower bound:
: \operatorname{var}_{\boldsymbol{\theta}}(T_m(X)) = \left[\operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)\right]_{mm} \geq \left[I\left(\boldsymbol{\theta}\right)^{-1}\right]_{mm} \geq \frac{1}{\left[I\left(\boldsymbol{\theta}\right)\right]_{mm}}.
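To see that the diagonal shortcut can be loose, the sketch below inverts a hypothetical 3 \times 3 Fisher information matrix (its entries are made up for illustration and chosen only to be symmetric positive definite) and compares [I^{-1}]_{mm} with 1/I_{mm} for each parameter. It uses plain Gauss–Jordan elimination so that it needs no external packages.

```python
"""Compare the exact bound [I^{-1}]_mm with the cheaper 1 / I_mm.
The Fisher information matrix below is hypothetical, for illustration only."""


def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def diagonal_bounds(fisher: list[list[float]]) -> list[tuple[float, float]]:
    """Return ([I^{-1}]_mm, 1 / I_mm) for each parameter m."""
    inv = invert(fisher)
    return [(inv[m][m], 1.0 / fisher[m][m]) for m in range(len(fisher))]


if __name__ == "__main__":
    fisher = [[4.0, 2.0, 0.5],
              [2.0, 3.0, 1.0],
              [0.5, 1.0, 2.0]]  # hypothetical, symmetric positive definite
    for m, (exact, loose) in enumerate(diagonal_bounds(fisher)):
        print(f"theta_{m + 1}: exact bound {exact:.4f} >= loose bound {loose:.4f}")
```

The two values coincide for parameter m only when the off-diagonal entries in row m of I(\boldsymbol{\theta}) are zero; otherwise 1/I_{mm} is strictly smaller, i.e. a weaker lower bound.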


Regularity conditions

The bound relies on two weak regularity conditions on the probability density function, f(x; \theta), and the estimator T(X):
* The Fisher information is always defined; equivalently, for all x such that f(x; \theta) > 0, \frac{\partial}{\partial\theta}\log f(x;\theta) exists and is finite.
* The operations of integration with respect to x and differentiation with respect to \theta can be interchanged in the expectation of T; that is, \frac{\partial}{\partial\theta}\left[\int T(x) f(x;\theta)\,dx\right] = \int T(x)\left[\frac{\partial}{\partial\theta}f(x;\theta)\right]dx whenever the right-hand side is finite. This condition can often be confirmed by using the fact that integration and differentiation can be swapped when either of the following cases holds:
  1. The function f(x;\theta) has bounded support in x, and the bounds do not depend on \theta;
  2. The function f(x;\theta) has infinite support, is continuously differentiable, and the integral converges uniformly for all \theta.
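A standard illustration of what can go wrong when these conditions fail is the uniform distribution on (0, \theta): its support depends on \theta, so integration and differentiation cannot be interchanged (the second condition fails, and in particular the formal score -1/\theta does not have mean zero). The simulation sketch below (function names hypothetical, seed and sample sizes arbitrary) shows the unbiased estimator \frac{n+1}{n}\max_i X_i beating the formal "bound" \theta^2/n.

```python
"""Uniform(0, theta): the support depends on theta, so the regularity
conditions fail and the formal Cramér–Rao value theta^2 / n is violated."""
import random
import statistics


def formal_crb(theta: float, n: int) -> float:
    """1 / I using the formal score d/dtheta log(1/theta) = -1/theta, so I = n / theta^2."""
    return theta**2 / n


def exact_variance(theta: float, n: int) -> float:
    """Closed-form variance of the unbiased estimator (n + 1) / n * max(X_i)."""
    return theta**2 / (n * (n + 2))


def simulated_variance(theta: float, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo variance of (n + 1) / n * max(X_i) over repeated samples."""
    rng = random.Random(seed)
    estimates = [
        (n + 1) / n * max(rng.uniform(0.0, theta) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pvariance(estimates)


if __name__ == "__main__":
    theta, n = 2.0, 10
    print(formal_crb(theta, n))                  # 0.4
    print(exact_variance(theta, n))              # about 0.0333, far below 0.4
    print(simulated_variance(theta, n, 20_000))  # close to the exact value
```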


Proof


Proof for the general case based on the Chapman–Robbins bound

Proof based on.


A standalone proof for the general scalar case

For the general scalar case: Assume that T = t(X) is an estimator with expectation \psi(\theta) (based on the observations X), i.e. that \operatorname{E}(T) = \psi(\theta). The goal is to prove that, for all \theta,
: \operatorname{var}(t(X)) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}.
Let X be a random variable with probability density function f(x;\theta). Here T = t(X) is a statistic, which is used as an estimator for \psi(\theta). Define V as the score:
: V = \frac{\partial}{\partial\theta}\ln f(X;\theta) = \frac{1}{f(X;\theta)}\frac{\partial}{\partial\theta}f(X;\theta)
where the chain rule is used in the final equality above. Then the expectation of V, written \operatorname{E}(V), is zero. This is because:
: \operatorname{E}(V) = \int f(x;\theta)\left[\frac{1}{f(x;\theta)}\frac{\partial}{\partial\theta}f(x;\theta)\right]dx = \frac{\partial}{\partial\theta}\int f(x;\theta)\,dx = \frac{\partial}{\partial\theta}1 = 0
where the integral and partial derivative have been interchanged (justified by the second regularity condition).

If we consider the covariance \operatorname{cov}(V,T) of V and T, we have \operatorname{cov}(V,T) = \operatorname{E}(VT), because \operatorname{E}(V) = 0. Expanding this expression we have
: \begin{aligned} \operatorname{cov}(V,T) &= \operatorname{E}\left(T\cdot\left[\frac{1}{f(X;\theta)}\frac{\partial}{\partial\theta}f(X;\theta)\right]\right) \\ &= \int t(x)\left[\frac{1}{f(x;\theta)}\frac{\partial}{\partial\theta}f(x;\theta)\right]f(x;\theta)\,dx \\ &= \frac{\partial}{\partial\theta}\left[\int t(x)f(x;\theta)\,dx\right] = \frac{\partial}{\partial\theta}\operatorname{E}(T) = \psi'(\theta) \end{aligned}
again because the integration and differentiation operations commute (second condition).

The Cauchy–Schwarz inequality shows that
: \sqrt{\operatorname{var}(T)\operatorname{var}(V)} \geq \left|\operatorname{cov}(V,T)\right| = \left|\psi'(\theta)\right|,
therefore, since \operatorname{var}(V) = \operatorname{E}(V^2) is the Fisher information contained in X,
: \operatorname{var}(T) \geq \frac{[\psi'(\theta)]^2}{\operatorname{var}(V)} = \frac{[\psi'(\theta)]^2}{I(\theta)}
which proves the proposition.
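Each step of the proof can be checked exactly in a small discrete case. The sketch below assumes a single observation X ~ Bernoulli(p), so every integral becomes a sum over x \in \{0, 1\}, and an arbitrary statistic with t(0) = a and t(1) = b (values chosen for illustration); the function name is hypothetical.

```python
"""Check the steps of the proof exactly for one observation X ~ Bernoulli(p)
and a statistic t with t(0) = a, t(1) = b (illustrative discrete model, so
each integral in the proof becomes a sum over x in {0, 1})."""


def proof_quantities(p: float, a: float, b: float) -> dict[str, float]:
    support = (0, 1)
    f = {0: 1.0 - p, 1: p}                        # f(x; p)
    df = {0: -1.0, 1: 1.0}                        # d/dp f(x; p)
    t = {0: a, 1: b}                              # statistic T = t(X)
    v = {x: df[x] / f[x] for x in support}        # score V = (1/f) df/dp

    mean_v = sum(f[x] * v[x] for x in support)    # E(V), should be 0
    mean_t = sum(f[x] * t[x] for x in support)    # psi(p) = a(1 - p) + b p
    cov_vt = sum(f[x] * v[x] * t[x] for x in support) - mean_v * mean_t
    var_v = sum(f[x] * v[x] ** 2 for x in support) - mean_v**2  # = I(p)
    var_t = sum(f[x] * t[x] ** 2 for x in support) - mean_t**2
    psi_prime = b - a                             # derivative of psi(p)
    return {
        "E(V)": mean_v,
        "cov(V,T)": cov_vt,
        "psi_prime": psi_prime,
        "var(T)": var_t,
        "bound": psi_prime**2 / var_v,
    }


if __name__ == "__main__":
    for key, value in proof_quantities(p=0.25, a=-1.0, b=3.0).items():
        print(f"{key:>9}: {value:.6f}")
```

Here \operatorname{E}(V) = 0 and \operatorname{cov}(V,T) = \psi'(p), as in the proof. The Cauchy–Schwarz step holds with equality because both V = (X-p)/(p(1-p)) and T = a + (b-a)X are affine functions of X.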


Examples


Multivariate normal distribution

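For the case of a d-variate normal distribution
: \boldsymbol{x} \sim \mathcal{N}_d\left(\boldsymbol{\mu}(\boldsymbol{\theta}), \boldsymbol{C}(\boldsymbol{\theta})\right)
the Fisher information matrix has elements
: I_{m,k} = \frac{\partial\boldsymbol{\mu}^T}{\partial\theta_m}\boldsymbol{C}^{-1}\frac{\partial\boldsymbol{\mu}}{\partial\theta_k} + \frac{1}{2}\operatorname{tr}\left(\boldsymbol{C}^{-1}\frac{\partial\boldsymbol{C}}{\partial\theta_m}\boldsymbol{C}^{-1}\frac{\partial\boldsymbol{C}}{\partial\theta_k}\right)
where "tr" is the trace. For example, let \boldsymbol{w} = (w[1], \ldots, w[n])^T be a sample of n independent observations with unknown mean \theta and known variance \sigma^2:
: \boldsymbol{w} \sim \mathcal{N}_n\left(\theta\boldsymbol{1}, \sigma^2\boldsymbol{I}_n\right),
where \boldsymbol{1} is the all-ones vector and \boldsymbol{I}_n is the n \times n identity matrix. Since \boldsymbol{C} = \sigma^2\boldsymbol{I}_n does not depend on \theta, the trace term vanishes, and the Fisher information is a scalar given by
: I(\theta) = \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right)^T\boldsymbol{C}^{-1}\left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum_{i=1}^{n}\frac{1}{\sigma^2} = \frac{n}{\sigma^2},
and so the Cramér–Rao bound is
: \operatorname{var}(\hat{\theta}) \geq \frac{\sigma^2}{n}.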
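The bound \sigma^2/n can be checked by simulation. The sketch below assumes illustrative values (\theta = 1, \sigma^2 = 4, n = 25) and uses the sample mean as \hat{\theta}, which is unbiased here; the function names are hypothetical.

```python
"""Monte Carlo check of var(theta_hat) >= sigma^2 / n for the sample mean of
n i.i.d. N(theta, sigma^2) observations (illustrative parameter values)."""
import random
import statistics


def fisher_info_known_variance(n: int, sigma2: float) -> float:
    """(d mu/d theta)^T C^{-1} (d mu/d theta) with d mu/d theta = 1 and C = sigma2 * I_n."""
    return sum(1.0 / sigma2 for _ in range(n))


def simulated_variance_of_mean(theta: float, sigma2: float, n: int,
                               trials: int, seed: int = 0) -> float:
    """Empirical variance of the sample mean over repeated samples of size n."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    means = [
        statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    theta, sigma2, n = 1.0, 4.0, 25
    crb = 1.0 / fisher_info_known_variance(n, sigma2)  # sigma2 / n = 0.16
    print(crb, simulated_variance_of_mean(theta, sigma2, n, 20_000))
```

The simulated variance should land close to 0.16, consistent with the sample mean attaining the bound in this model.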


Normal variance with known mean

Suppose X is a normally distributed random variable with known mean \mu and unknown variance \sigma^2, and let X_1, \ldots, X_n be n independent observations of X. Consider the following statistic:
: T = \frac{\sum_{i=1}^n (X_i-\mu)^2}{n}.
Then T is unbiased for \sigma^2, as \operatorname{E}(T) = \sigma^2. What is the variance of T?
: \operatorname{var}(T) = \operatorname{var}\left(\frac{\sum_{i=1}^n (X_i-\mu)^2}{n}\right) = \frac{\sum_{i=1}^n \operatorname{var}\left[(X_i-\mu)^2\right]}{n^2} = \frac{n\operatorname{var}\left[(X-\mu)^2\right]}{n^2} = \frac{1}{n}\left[\operatorname{E}\left\{(X-\mu)^4\right\} - \left(\operatorname{E}\left\{(X-\mu)^2\right\}\right)^2\right]
(the second equality uses the independence of the observations, and the last follows directly from the definition of variance). The first term is the fourth moment about the mean and has value 3(\sigma^2)^2; the second is the square of the variance, or (\sigma^2)^2. Thus
: \operatorname{var}(T) = \frac{2(\sigma^2)^2}{n}.

Now, what is the Fisher information in the sample? Recall that the score V is defined as
: V = \frac{\partial}{\partial\sigma^2}\log\left[L(\sigma^2,X)\right]
where L is the likelihood function. Thus in this case,
: \log\left[L(\sigma^2,X)\right] = \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(X-\mu)^2/2\sigma^2}\right] = -\log\left(\sqrt{2\pi\sigma^2}\right) - \frac{(X-\mu)^2}{2\sigma^2}
: V = \frac{\partial}{\partial\sigma^2}\log\left[L(\sigma^2,X)\right] = \frac{\partial}{\partial\sigma^2}\left[-\log\left(\sqrt{2\pi\sigma^2}\right) - \frac{(X-\mu)^2}{2\sigma^2}\right] = -\frac{1}{2\sigma^2} + \frac{(X-\mu)^2}{2(\sigma^2)^2}
where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of V, or
: I = -\operatorname{E}\left(\frac{\partial V}{\partial\sigma^2}\right) = -\operatorname{E}\left(-\frac{(X-\mu)^2}{(\sigma^2)^3} + \frac{1}{2(\sigma^2)^2}\right) = \frac{\sigma^2}{(\sigma^2)^3} - \frac{1}{2(\sigma^2)^2} = \frac{1}{2(\sigma^2)^2}.
Thus the information in a sample of n independent observations is just n times this, or \frac{n}{2(\sigma^2)^2}. The Cramér–Rao bound states that
: \operatorname{var}(T) \geq \frac{1}{I} = \frac{2(\sigma^2)^2}{n}.
In this case, the inequality is saturated (equality is achieved), showing that the estimator is efficient.

However, we can achieve a lower mean squared error using a biased estimator. The estimator
: T = \frac{\sum_{i=1}^n (X_i-\mu)^2}{n+2}
obviously has a smaller variance, which is in fact
: \operatorname{var}(T) = \frac{2n(\sigma^2)^2}{(n+2)^2}.
Its bias is
: \frac{n}{n+2}\sigma^2 - \sigma^2 = -\frac{2\sigma^2}{n+2}
so its mean squared error is
: \operatorname{MSE}(T) = \left(\frac{2n}{(n+2)^2} + \frac{4}{(n+2)^2}\right)(\sigma^2)^2 = \frac{2(\sigma^2)^2}{n+2}
which is less than what unbiased estimators can achieve according to the Cramér–Rao bound. When the mean is not known, the minimum mean squared error estimate of the variance of a sample from a Gaussian distribution is achieved by dividing by n+1, rather than n-1 or n+2.
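The comparison between the two divisors can be reproduced by simulation. The sketch below assumes illustrative values (\mu = 0 known, \sigma^2 = 2, n = 5) and reports the empirical MSE of both estimators next to the closed forms 2(\sigma^2)^2/n and 2(\sigma^2)^2/(n+2) derived above; the function names are hypothetical.

```python
"""Monte Carlo comparison of sum((X_i - mu)^2) / d for d = n (unbiased,
efficient) and d = n + 2 (biased, lower MSE), with mu known. Parameter
values are illustrative."""
import random


def empirical_mse(mu: float, sigma2: float, n: int, trials: int,
                  seed: int = 0) -> dict[int, float]:
    """Mean squared error of the variance estimate for divisors n and n + 2."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    totals = {n: 0.0, n + 2: 0.0}
    for _ in range(trials):
        # Sum of squared deviations from the known mean for one sample.
        ss = sum((rng.gauss(mu, sigma) - mu) ** 2 for _ in range(n))
        for divisor in totals:
            totals[divisor] += (ss / divisor - sigma2) ** 2
    return {divisor: total / trials for divisor, total in totals.items()}


def theoretical_mse(sigma2: float, n: int) -> dict[int, float]:
    """2 sigma2^2 / n for the unbiased estimator, 2 sigma2^2 / (n + 2) for the biased one."""
    return {n: 2 * sigma2**2 / n, n + 2: 2 * sigma2**2 / (n + 2)}


if __name__ == "__main__":
    mu, sigma2, n = 0.0, 2.0, 5
    print(empirical_mse(mu, sigma2, n, 40_000))
    print(theoretical_mse(sigma2, n))
```

The divisor n+2 gives the smaller MSE, even though the divisor n is the estimator that attains the unbiased Cramér–Rao bound.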


See also

* Chapman–Robbins bound
* Kullback's inequality
* Brascamp–Lieb inequality
* Lehmann–Scheffé theorem
* Ziv–Zakai bound




External links


* FandPLimitTool: a GUI-based software tool to calculate the Fisher information and Cramér–Rao lower bound, with application to single-molecule microscopy.