Observed Information Matrix

In statistics, the observed information, or observed Fisher information, is the negative of the second derivative (the Hessian matrix) of the "log-likelihood" (the logarithm of the likelihood function). It is a sample-based version of the Fisher information.


Definition

Suppose we observe random variables X_1,\ldots,X_n, independent and identically distributed with density ''f''(''X''; θ), where θ is a (possibly unknown) vector. Then the log-likelihood of the parameters \theta given the data X_1,\ldots,X_n is

:\ell(\theta \mid X_1,\ldots,X_n) = \sum_{i=1}^n \log f(X_i \mid \theta) .

We define the observed information matrix at \theta^{*} as

:\mathcal{J}(\theta^*) = - \left. \nabla \nabla^{\mathsf{T}} \ell(\theta) \right|_{\theta=\theta^*}

::= - \left. \begin{pmatrix} \tfrac{\partial^2}{\partial \theta_1^2} & \tfrac{\partial^2}{\partial \theta_1 \partial \theta_2} & \cdots & \tfrac{\partial^2}{\partial \theta_1 \partial \theta_p} \\ \tfrac{\partial^2}{\partial \theta_2 \partial \theta_1} & \tfrac{\partial^2}{\partial \theta_2^2} & \cdots & \tfrac{\partial^2}{\partial \theta_2 \partial \theta_p} \\ \vdots & \vdots & \ddots & \vdots \\ \tfrac{\partial^2}{\partial \theta_p \partial \theta_1} & \tfrac{\partial^2}{\partial \theta_p \partial \theta_2} & \cdots & \tfrac{\partial^2}{\partial \theta_p^2} \end{pmatrix} \ell(\theta) \right|_{\theta = \theta^*}

Since the inverse of the information matrix is the asymptotic covariance matrix of the corresponding maximum-likelihood estimator, the observed information is often evaluated at the maximum-likelihood estimate for the purpose of significance testing or confidence-interval construction. The invariance property of maximum-likelihood estimators allows the observed information matrix to be evaluated before being inverted.
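As an illustration of the definition (not part of the original article), the following Python sketch computes the observed information matrix for an i.i.d. normal sample by numerically differentiating the log-likelihood; the normal model, the (μ, log σ) parameterisation, and the helper names are assumptions made for this example.

```python
import numpy as np

def log_likelihood(theta, x):
    """Log-likelihood of an i.i.d. normal sample; theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                  - 0.5 * ((x - mu) / sigma) ** 2)

def observed_information(loglik, theta, x, h=1e-5):
    """Observed information J(theta): minus the Hessian of the log-likelihood,
    approximated here by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    hess = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            e_i = np.eye(p)[i] * h
            e_j = np.eye(p)[j] * h
            hess[i, j] = (loglik(theta + e_i + e_j, x)
                          - loglik(theta + e_i - e_j, x)
                          - loglik(theta - e_i + e_j, x)
                          + loglik(theta - e_i - e_j, x)) / (4 * h ** 2)
    return -hess

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)

# For the normal model the MLE of (mu, log sigma) is available in closed form.
theta_hat = np.array([x.mean(), np.log(x.std())])

J = observed_information(log_likelihood, theta_hat, x)
print(J)                 # observed information at the MLE
print(np.linalg.inv(J))  # approximate covariance matrix of the MLE
```

Inverting the matrix evaluated at the maximum-likelihood estimate, as in the last line, gives the approximate covariance matrix typically used for Wald-type tests and confidence intervals.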


Alternative definition

Andrew Gelman, David Dunson and Donald Rubin define observed information instead in terms of the parameters' posterior probability, p(\theta \mid y):

:I(\theta) = - \frac{d^2}{d\theta^2} \log p(\theta \mid y)
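As a concrete, assumed example of this posterior-based definition (not from the original article), suppose the posterior of a single parameter is a Beta(a, b) distribution, as arises from a Bernoulli likelihood with a Beta prior. The sketch below differentiates the log posterior numerically and compares the result with the closed-form curvature; the values of a and b are illustrative.

```python
import numpy as np

a, b = 5.0, 3.0  # assumed Beta(a, b) posterior

def log_posterior(theta):
    # Unnormalised log density of Beta(a, b); the normalising constant
    # does not affect the second derivative.
    return (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)

def observed_info_posterior(theta, h=1e-5):
    # I(theta) = -d^2/dtheta^2 log p(theta | y), via central differences.
    return -(log_posterior(theta + h) - 2 * log_posterior(theta)
             + log_posterior(theta - h)) / h ** 2

theta = 0.6
print(observed_info_posterior(theta))                  # numerical curvature
print((a - 1) / theta**2 + (b - 1) / (1 - theta)**2)   # closed form
```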


Fisher information

The Fisher information \mathcal{I}(\theta) is the expected value of the observed information given a single observation X distributed according to the hypothetical model with parameter \theta:

:\mathcal{I}(\theta) = \mathrm{E}(\mathcal{J}(\theta)).
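A quick sanity check of this relationship, using an assumed Bernoulli(θ) model: for a single observation X, the observed information is X/θ² + (1 − X)/(1 − θ)², and its expectation is the Fisher information 1/(θ(1 − θ)). The Python sketch below verifies this by Monte Carlo.

```python
import numpy as np

theta = 0.3
rng = np.random.default_rng(1)
x = rng.binomial(1, theta, size=1_000_000)  # many single Bernoulli draws

# Observed information from each single observation X.
observed = x / theta**2 + (1 - x) / (1 - theta) ** 2

print(observed.mean())            # Monte Carlo estimate of E[J(theta)]
print(1 / (theta * (1 - theta)))  # Fisher information I(theta)
```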


Comparison with the expected information

The comparison between the observed information and the expected information remains an active and ongoing area of research and debate. Efron and Hinkley provided a frequentist justification for preferring the observed information to the expected information when employing normal approximations to the distribution of the maximum-likelihood estimator in one-parameter families, in the presence of an ancillary statistic that affects the precision of the MLE. Lindsay and Li showed that the observed information matrix gives the minimum mean squared error as an approximation of the true information if an asymptotically negligible error term is ignored. In Lindsay and Li's case, the expected information matrix still requires evaluation at the obtained ML estimates, introducing randomness. However, when the construction of confidence intervals is the primary focus, there are reported findings that the expected information outperforms the observed counterpart. Yuan and Spall showed that the expected information outperforms the observed counterpart for confidence-interval construction of scalar parameters in the mean-squared-error sense. This finding was later generalized to multiparameter cases, although the claim was weakened to the expected information matrix performing at least as well as the observed information matrix.


See also

* Fisher information matrix
* Fisher information metric

