In estimation theory and statistics, the Cramér–Rao bound (CRB) expresses a lower bound on the variance of unbiased estimators of a deterministic (fixed, though unknown) parameter: the variance of any such estimator is at least as high as the inverse of the Fisher information. Equivalently, it expresses an upper bound on the precision (the inverse of variance) of unbiased estimators: the precision of any such estimator is at most the Fisher information. The result is named in honor of Harald Cramér and C. R. Rao, but it has also been derived independently by Maurice Fréchet, Georges Darmois, as well as Alexander Aitken and Harold Silverstone.

An unbiased estimator that achieves this lower bound is said to be (fully) ''efficient''. Such a solution achieves the lowest possible mean squared error among all unbiased methods, and is therefore the minimum variance unbiased (MVU) estimator. However, in some cases no unbiased technique exists which achieves the bound. This may occur either if for any unbiased estimator there exists another with a strictly smaller variance, or if an MVU estimator exists but its variance is strictly greater than the inverse of the Fisher information.

The Cramér–Rao bound can also be used to bound the variance of estimators with a given bias. In some cases, a biased approach can result in both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias.


Statement

The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain regularity conditions, which hold for most well-behaved distributions. These conditions are listed later in this section.


Scalar unbiased case

Suppose \theta is an unknown deterministic parameter that is to be estimated from n independent observations (measurements) of x, each from a distribution according to some probability density function f(x;\theta). The variance of any ''unbiased'' estimator \hat\theta of \theta is then bounded by the reciprocal of the Fisher information I(\theta):

:\operatorname{var}(\hat\theta) \geq \frac{1}{I(\theta)}

where the Fisher information I(\theta) is defined by

: I(\theta) = n \operatorname{E}_\theta \left[ \left( \frac{\partial \ell(X;\theta)}{\partial\theta} \right)^2 \right]

and \ell(x;\theta)=\log f(x;\theta) is the natural logarithm of the likelihood function for a single sample x, and \operatorname{E}_\theta denotes the expected value with respect to the density f(x;\theta) of X. If \ell(x;\theta) is twice differentiable and certain regularity conditions hold, then the Fisher information can also be defined as follows:

: I(\theta) = -n \operatorname{E}_\theta\left[ \frac{\partial^2 \ell(X;\theta)}{\partial\theta^2} \right].

The efficiency of an unbiased estimator \hat\theta measures how close this estimator's variance comes to this lower bound; estimator efficiency is defined as

:e(\hat\theta) = \frac{I(\theta)^{-1}}{\operatorname{var}(\hat\theta)},

or the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao lower bound thus gives

:e(\hat\theta) \le 1.
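The following is a minimal Monte Carlo sketch of this scalar bound, not part of the original article; it assumes NumPy and uses an exponential distribution with rate \lambda chosen purely for illustration. The score of a single observation is 1/\lambda - x, so the per-observation Fisher information is 1/\lambda^2 and the bound for n samples is \lambda^2/n; the unbiased estimator (n-1)/\sum x_i has variance \lambda^2/(n-2), which exceeds (but does not attain) the bound.

# Monte Carlo sketch (illustrative assumptions, not from the article): checks the
# scalar Cramér–Rao bound for estimating the rate lambda of an exponential
# distribution, f(x; lam) = lam * exp(-lam * x).
import numpy as np

rng = np.random.default_rng(0)
lam, n, trials = 2.0, 50, 200_000

samples = rng.exponential(scale=1.0 / lam, size=(trials, n))

# Estimate the per-observation Fisher information from the score of one sample:
# d/dlam log f(x; lam) = 1/lam - x, so I_1(lam) = E[(1/lam - X)^2] = 1/lam^2.
score = 1.0 / lam - samples[:, 0]
fisher_per_obs = np.mean(score**2)
crb = 1.0 / (n * fisher_per_obs)      # Cramér–Rao bound for n i.i.d. samples

# An unbiased estimator of lam: (n - 1) / sum(x), with variance lam^2 / (n - 2).
lam_hat = (n - 1) / samples.sum(axis=1)
print("empirical variance of estimator:", lam_hat.var())
print("Cramér–Rao bound lam^2 / n:     ", crb)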


General scalar case

A more general form of the bound can be obtained by considering a biased estimator T(X), whose expectation is not \theta but a function of this parameter, say, \psi(\theta). Hence \operatorname{E}\{T(X)\} - \theta = \psi(\theta) - \theta is not generally equal to 0. In this case, the bound is given by

: \operatorname{var}(T) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}

where \psi'(\theta) is the derivative of \psi(\theta) (with respect to \theta), and I(\theta) is the Fisher information defined above.


Bound on the variance of biased estimators

Apart from being a bound on estimators of functions of the parameter, this approach can be used to derive a bound on the variance of biased estimators with a given bias, as follows. Consider an estimator \hat\theta with bias b(\theta) = E\{\hat\theta\} - \theta, and let \psi(\theta) = b(\theta) + \theta. By the result above, any unbiased estimator whose expectation is \psi(\theta) has variance greater than or equal to [\psi'(\theta)]^2/I(\theta). Thus, any estimator \hat\theta whose bias is given by a function b(\theta) satisfies

: \operatorname{var}\left(\hat\theta\right) \geq \frac{[1+b'(\theta)]^2}{I(\theta)}.

The unbiased version of the bound is a special case of this result, with b(\theta)=0. It is trivial to have a small variance: an "estimator" that is constant has a variance of zero. But from the above equation we find that the mean squared error of a biased estimator is bounded by

:\operatorname{E}\left((\hat\theta-\theta)^2\right)\geq\frac{[1+b'(\theta)]^2}{I(\theta)}+b(\theta)^2,

using the standard decomposition of the MSE into variance plus squared bias. Note, however, that if 1+b'(\theta)<1 this bound might be less than the unbiased Cramér–Rao bound 1/I(\theta). For instance, in the example of estimating variance below, 1+b'(\theta)= \frac{n}{n+2} <1.
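As a separate numerical illustration of this biased-estimator bound (an assumed setup, not from the article), consider shrinking the sample mean of a normal N(\theta, \sigma^2) with known \sigma^2 by a constant c<1: here b(\theta)=(c-1)\theta, so 1+b'(\theta)=c, and the bound on the variance is c^2\sigma^2/n, which the estimator attains. The sketch below assumes NumPy and arbitrary parameter values.

# Illustrative sketch: biased Cramér–Rao bound for the shrunken mean c * xbar.
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, c, trials = 1.5, 2.0, 25, 0.9, 200_000

x = rng.normal(theta, sigma, size=(trials, n))
theta_hat = c * x.mean(axis=1)

fisher = n / sigma**2                          # Fisher information of the sample
var_bound = c**2 / fisher                      # biased bound on the variance
mse_bound = var_bound + ((c - 1) * theta)**2   # bound plus squared bias

print("empirical variance:", theta_hat.var(), " bound:", var_bound)
print("empirical MSE:     ", np.mean((theta_hat - theta)**2), " bound:", mse_bound)
print("unbiased CRB 1/I:  ", 1 / fisher)       # var_bound < 1/I because c < 1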


Multivariate case

Extending the Cramér–Rao bound to multiple parameters, define a parameter column vector

:\boldsymbol{\theta} = \left[\theta_1, \theta_2, \dots, \theta_d\right]^T \in \mathbb{R}^d

with probability density function f(x; \boldsymbol{\theta}) which satisfies the two regularity conditions below. The Fisher information matrix is a d \times d matrix with element I_{m,k} defined as

: I_{m,k} = \operatorname{E}\left[ \frac{\partial}{\partial\theta_m} \log f\left(x; \boldsymbol{\theta}\right) \frac{\partial}{\partial\theta_k} \log f\left(x; \boldsymbol{\theta}\right) \right] = -\operatorname{E}\left[ \frac{\partial^2}{\partial\theta_m \, \partial\theta_k} \log f\left(x; \boldsymbol{\theta}\right) \right].

Let \boldsymbol{T}(X) be an estimator of any vector function of parameters, \boldsymbol{T}(X) = (T_1(X), \ldots, T_d(X))^T, and denote its expectation vector \operatorname{E}[\boldsymbol{T}(X)] by \boldsymbol{\psi}(\boldsymbol{\theta}). The Cramér–Rao bound then states that the covariance matrix of \boldsymbol{T}(X) satisfies

: I\left(\boldsymbol{\theta}\right) \geq \phi(\theta)^T \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)^{-1}\phi(\theta),
: \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \phi(\theta)\, I\left(\boldsymbol{\theta}\right)^{-1}\, \phi(\theta)^T

where
* The matrix inequality A \ge B is understood to mean that the matrix A-B is positive semidefinite, and
* \phi(\theta) := \partial \boldsymbol{\psi}(\boldsymbol{\theta})/\partial \boldsymbol{\theta} is the Jacobian matrix whose ij element is given by \partial \psi_i(\boldsymbol{\theta})/\partial \theta_j.

If \boldsymbol{T}(X) is an unbiased estimator of \boldsymbol{\theta} (i.e., \boldsymbol{\psi}\left(\boldsymbol{\theta}\right) = \boldsymbol{\theta}), then the Cramér–Rao bound reduces to

: \operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq I\left(\boldsymbol{\theta}\right)^{-1}.

If it is inconvenient to compute the inverse of the Fisher information matrix, then one can simply take the reciprocal of the corresponding diagonal element to find a (possibly loose) lower bound:

: \operatorname{var}_{\boldsymbol{\theta}}(T_m(X)) = \left[\operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)\right]_{mm} \geq \left[I\left(\boldsymbol{\theta}\right)^{-1}\right]_{mm} \geq \left(\left[I\left(\boldsymbol{\theta}\right)\right]_{mm}\right)^{-1}.
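A small sketch of the multivariate bound follows; it is not from the article and assumes NumPy and an arbitrary normal model with \boldsymbol{\theta} = (\mu, \sigma^2). It estimates the 2×2 Fisher information matrix by averaging outer products of the score and then reads off the per-parameter bounds from the diagonal of I(\boldsymbol{\theta})^{-1}.

# Hedged sketch: Monte Carlo estimate of the Fisher information matrix for
# N(mu, sigma^2) with theta = (mu, sigma^2), and the resulting CRB diagonal.
import numpy as np

rng = np.random.default_rng(2)
mu, var = 0.5, 2.0
x = rng.normal(mu, np.sqrt(var), size=500_000)

# Score of a single observation with respect to (mu, sigma^2).
score = np.stack([(x - mu) / var,
                  -0.5 / var + (x - mu) ** 2 / (2 * var**2)], axis=1)

fim = score.T @ score / len(x)     # estimate of I(theta) per observation
print("estimated I(theta):\n", fim)
print("analytic  I(theta):\n", np.diag([1 / var, 1 / (2 * var**2)]))

# Per-parameter lower bounds for n unbiased observations: diag of I^{-1} / n.
n = 100
print("CRB for (mu_hat, var_hat), n =", n, ":", np.diag(np.linalg.inv(fim)) / n)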


Regularity conditions

The bound relies on two weak regularity conditions on the probability density function, f(x; \theta), and the estimator T(X):
* The Fisher information is always defined; equivalently, for all x such that f(x; \theta) > 0,
:: \frac{\partial}{\partial\theta} \log f(x;\theta)
:exists, and is finite.
* The operations of integration with respect to x and differentiation with respect to \theta can be interchanged in the expectation of T; that is,
:: \frac{\partial}{\partial\theta} \left[ \int T(x) f(x;\theta) \,dx \right] = \int T(x) \left[ \frac{\partial}{\partial\theta} f(x;\theta) \right] \,dx
:whenever the right-hand side is finite.
:This condition can often be confirmed by using the fact that integration and differentiation can be swapped when either of the following cases hold:
:# The function f(x;\theta) has bounded support in x, and the bounds do not depend on \theta;
:# The function f(x;\theta) has infinite support, is continuously differentiable, and the integral converges uniformly for all \theta.


Proof


Proof for the general case based on the Chapman–Robbins bound



A standalone proof for the general scalar case

Assume that T=t(X) is an estimator with expectation \psi(\theta) (based on the observations X), i.e. that \operatorname{E}(T) = \psi(\theta). The goal is to prove that, for all \theta,

:\operatorname{var}(t(X)) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}.

Let X be a random variable with probability density function f(x; \theta). Here T = t(X) is a statistic, which is used as an estimator for \psi(\theta). Define V as the score:

:V = \frac{\partial}{\partial\theta} \ln f(X;\theta) = \frac{1}{f(X;\theta)}\frac{\partial}{\partial\theta}f(X;\theta)

where the chain rule is used in the final equality above. Then the expectation of V, written \operatorname{E}(V), is zero. This is because:

: \operatorname{E}(V) = \int f(x;\theta)\left[\frac{1}{f(x;\theta)}\frac{\partial}{\partial\theta} f(x;\theta)\right]\, dx = \frac{\partial}{\partial\theta}\int f(x;\theta) \, dx = 0

where the integral and partial derivative have been interchanged (justified by the second regularity condition).

If we consider the covariance \operatorname{cov}(V, T) of V and T, we have \operatorname{cov}(V, T) = \operatorname{E}(V T), because \operatorname{E}(V) = 0. Expanding this expression we have

: \begin{align} \operatorname{cov}(V,T) & = \operatorname{E} \left( T \cdot\left[\frac{1}{f(X;\theta)}\frac{\partial}{\partial\theta}f(X;\theta) \right]\right) \\ & = \int t(x) \left[\frac{1}{f(x;\theta)} \frac{\partial}{\partial\theta} f(x;\theta) \right] f(x;\theta)\, dx \\ & = \frac{\partial}{\partial\theta} \left[ \int t(x) f(x;\theta)\,dx \right] = \frac{\partial}{\partial\theta} \operatorname{E}(T) = \psi'(\theta) \end{align}

again because the integration and differentiation operations commute (second condition).

The Cauchy–Schwarz inequality shows that

: \sqrt{\operatorname{var}(T)\,\operatorname{var}(V)} \geq \left| \operatorname{cov}(V,T) \right| = \left| \psi'(\theta) \right|,

therefore

: \operatorname{var}(T) \geq \frac{[\psi'(\theta)]^2}{\operatorname{var}(V)} = \frac{[\psi'(\theta)]^2}{I(\theta)}

which proves the proposition.
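The key identities in this proof can be checked numerically. The sketch below is an illustration only (assuming NumPy and a single observation X \sim N(\theta, \sigma^2) with T = X, so \psi'(\theta)=1): it verifies that the score has mean zero, that \operatorname{cov}(V,T) \approx \psi'(\theta), and that the Cauchy–Schwarz step holds with equality in this efficient case.

# Sanity-check sketch for the proof's intermediate identities.
import numpy as np

rng = np.random.default_rng(3)
theta, sigma2 = 1.0, 4.0
x = rng.normal(theta, np.sqrt(sigma2), size=1_000_000)

v = (x - theta) / sigma2          # score of one observation
t = x                             # estimator with E[T] = theta, psi'(theta) = 1

print("E[V]        ≈", v.mean())                    # ~ 0
print("cov(V, T)   ≈", np.cov(v, t)[0, 1])          # ~ 1 = psi'(theta)
print("var(T)var(V)≈", t.var() * v.var(),
      ">= cov^2   ≈", np.cov(v, t)[0, 1] ** 2)      # Cauchy–Schwarz step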


Examples


Multivariate normal distribution

For the case of a ''d''-variate normal distribution

: \boldsymbol{x} \sim N_d \left( \boldsymbol{\mu}( \boldsymbol{\theta}) , \boldsymbol{C}( \boldsymbol{\theta}) \right)

the Fisher information matrix has elements

: I_{m,k} = \frac{\partial \boldsymbol{\mu}^T}{\partial \theta_m} \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{\mu}}{\partial \theta_k} + \frac{1}{2} \operatorname{tr} \left( \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{C}}{\partial \theta_m} \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{C}}{\partial \theta_k} \right)

where "tr" is the trace.

For example, let w[n] be a sample of N independent observations with unknown mean \theta and known variance \sigma^2:

:w[n] \sim \mathbb{N}_N \left(\theta \boldsymbol{1}, \sigma^2 \boldsymbol{I} \right).

Then the Fisher information is a scalar given by

: I(\theta) = \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right)^T \boldsymbol{C}^{-1} \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum^N_{i=1}\frac{1}{\sigma^2} = \frac{N}{\sigma^2},

and so the Cramér–Rao bound is

: \operatorname{var}\left(\hat \theta\right) \geq \frac{\sigma^2}{N}.
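A brief numerical companion (assumed parameter values, not from the article): the sample mean of N such observations is unbiased and its variance matches the bound \sigma^2/N, i.e. it is efficient in this model.

# Quick illustration: the sample mean attains the bound sigma^2 / N.
import numpy as np

rng = np.random.default_rng(4)
theta, sigma, N, trials = 3.0, 1.5, 40, 200_000

w = rng.normal(theta, sigma, size=(trials, N))
theta_hat = w.mean(axis=1)

print("empirical var of sample mean:", theta_hat.var())
print("Cramér–Rao bound sigma^2/N:  ", sigma**2 / N)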


Normal variance with known mean

Suppose ''X'' is a normally distributed random variable with known mean \mu and unknown variance \sigma^2. Consider the following statistic:

: T=\frac{\sum_{i=1}^n (X_i-\mu)^2}{n}.

Then ''T'' is unbiased for \sigma^2, as E(T)=\sigma^2. What is the variance of ''T''?

: \operatorname{var}(T) = \operatorname{var}\left(\frac{\sum_{i=1}^n (X_i-\mu)^2}{n}\right)=\frac{\operatorname{var}\left[(X-\mu)^2\right]}{n}=\frac{1}{n} \left[ \operatorname{E}\left\{(X-\mu)^4\right\}-\left(\operatorname{E}\left\{(X-\mu)^2\right\}\right)^2 \right]

(the second equality follows directly from the definition of variance). The first term is the fourth moment about the mean and has value 3(\sigma^2)^2; the second is the square of the variance, or (\sigma^2)^2. Thus

:\operatorname{var}(T)=\frac{2(\sigma^2)^2}{n}.

Now, what is the Fisher information in the sample? Recall that the score V is defined as

: V=\frac{\partial}{\partial\sigma^2}\log\left[L(\sigma^2,X)\right]

where L is the likelihood function. Thus in this case,

: \log\left[L(\sigma^2,X)\right]=\log\left[\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(X-\mu)^2/2\sigma^2}\right]=-\log(\sqrt{2\pi\sigma^2})-\frac{(X-\mu)^2}{2\sigma^2}

: V=\frac{\partial}{\partial\sigma^2}\log \left[L(\sigma^2,X)\right]=\frac{\partial}{\partial\sigma^2}\left[-\log(\sqrt{2\pi\sigma^2})-\frac{(X-\mu)^2}{2\sigma^2}\right]=-\frac{1}{2\sigma^2}+\frac{(X-\mu)^2}{2(\sigma^2)^2}

where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of V, or

: I =-\operatorname{E}\left(\frac{\partial V}{\partial\sigma^2}\right) =-\operatorname{E}\left(-\frac{(X-\mu)^2}{(\sigma^2)^3}+\frac{1}{2(\sigma^2)^2}\right) =\frac{\sigma^2}{(\sigma^2)^3}-\frac{1}{2(\sigma^2)^2} =\frac{1}{2(\sigma^2)^2}.

Thus the information in a sample of n independent observations is just n times this, or \frac{n}{2(\sigma^2)^2}. The Cramér–Rao bound states that

: \operatorname{var}(T)\geq\frac{2(\sigma^2)^2}{n}.

In this case, the inequality is saturated (equality is achieved), showing that the estimator is efficient. However, we can achieve a lower mean squared error using a biased estimator. The estimator

: T=\frac{\sum_{i=1}^n (X_i-\mu)^2}{n+2}

obviously has a smaller variance, which is in fact

:\operatorname{var}(T)=\frac{2n(\sigma^2)^2}{(n+2)^2}.

Its bias is

:\frac{n}{n+2}\sigma^2-\sigma^2=-\frac{2\sigma^2}{n+2}

so its mean squared error is

:\operatorname{MSE}(T)=\left(\frac{2n}{(n+2)^2}+\frac{4}{(n+2)^2}\right)(\sigma^2)^2 =\frac{2(\sigma^2)^2}{n+2}

which is clearly less than what unbiased estimators can achieve according to the Cramér–Rao bound. When the mean is not known, the minimum mean squared error estimate of the variance of a sample from a Gaussian distribution is achieved by dividing by n+1, rather than n-1 or n+2.
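A short numerical companion to this example (illustrative only, with NumPy and arbitrary parameter values assumed): it confirms that dividing by n saturates the bound 2\sigma^4/n while dividing by n+2 yields the smaller mean squared error 2\sigma^4/(n+2).

# Sketch: unbiased vs. biased variance estimators with known mean mu.
import numpy as np

rng = np.random.default_rng(5)
mu, sigma2, n, trials = 0.0, 3.0, 20, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
ss = ((x - mu) ** 2).sum(axis=1)

t_unbiased = ss / n          # saturates the Cramér–Rao bound 2*sigma2^2 / n
t_biased = ss / (n + 2)      # smaller MSE, equal to 2*sigma2^2 / (n + 2)

print("var(unbiased):", t_unbiased.var(), "  CRB:", 2 * sigma2**2 / n)
print("MSE(unbiased):", np.mean((t_unbiased - sigma2) ** 2))
print("MSE(biased):  ", np.mean((t_biased - sigma2) ** 2),
      "  predicted:", 2 * sigma2**2 / (n + 2))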


See also

* Chapman–Robbins bound
* Kullback's inequality
* Brascamp–Lieb inequality


References and notes




External links


FandPLimitTool
a GUI-based software tool to calculate the Fisher information and Cramér–Rao lower bound, with application to single-molecule microscopy.