Observed Information Matrix

In statistics, the observed information, or observed Fisher information, is the negative of the second derivative (the Hessian matrix) of the "log-likelihood" (the logarithm of the likelihood function). It is a sample-based version of the Fisher information.


Definition

Suppose we observe random variables X_1,\ldots,X_n, independent and identically distributed with density ''f''(''X''; θ), where θ is a (possibly unknown) vector. Then the log-likelihood of the parameters \theta given the data X_1,\ldots,X_n is

:\ell(\theta \mid X_1,\ldots,X_n) = \sum_{i=1}^n \log f(X_i \mid \theta) .

We define the observed information matrix at \theta^{*} as

:\mathcal{J}(\theta^*) = - \left. \nabla \nabla^{\mathsf{T}} \ell(\theta) \right|_{\theta=\theta^*}

::= - \left. \begin{pmatrix} \tfrac{\partial^2}{\partial \theta_1^2} & \tfrac{\partial^2}{\partial \theta_1 \partial \theta_2} & \cdots & \tfrac{\partial^2}{\partial \theta_1 \partial \theta_p} \\ \tfrac{\partial^2}{\partial \theta_2 \partial \theta_1} & \tfrac{\partial^2}{\partial \theta_2^2} & \cdots & \tfrac{\partial^2}{\partial \theta_2 \partial \theta_p} \\ \vdots & \vdots & \ddots & \vdots \\ \tfrac{\partial^2}{\partial \theta_p \partial \theta_1} & \tfrac{\partial^2}{\partial \theta_p \partial \theta_2} & \cdots & \tfrac{\partial^2}{\partial \theta_p^2} \end{pmatrix} \ell(\theta) \right|_{\theta = \theta^*}

Since the inverse of the information matrix is the asymptotic covariance matrix of the corresponding maximum-likelihood estimator, the observed information is often evaluated at the maximum-likelihood estimate for the purpose of significance testing or confidence-interval construction. The invariance property of maximum-likelihood estimators allows the observed information matrix to be evaluated before being inverted.
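As an illustration of the definition (not part of the original article), the following Python sketch computes the observed information matrix for an i.i.d. normal sample by numerically differentiating the log-likelihood; the normal model, the (μ, log σ) parameterisation, and the helper names are assumptions made for this example.

```python
import numpy as np

def log_likelihood(theta, x):
    """Log-likelihood of an i.i.d. normal sample; theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                  - 0.5 * ((x - mu) / sigma) ** 2)

def observed_information(loglik, theta, x, h=1e-5):
    """Observed information J(theta): minus the Hessian of the log-likelihood,
    approximated here by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    hess = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            e_i = np.eye(p)[i] * h
            e_j = np.eye(p)[j] * h
            hess[i, j] = (loglik(theta + e_i + e_j, x)
                          - loglik(theta + e_i - e_j, x)
                          - loglik(theta - e_i + e_j, x)
                          + loglik(theta - e_i - e_j, x)) / (4 * h ** 2)
    return -hess

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)

# For the normal model the MLE of (mu, log sigma) is available in closed form.
theta_hat = np.array([x.mean(), np.log(x.std())])

J = observed_information(log_likelihood, theta_hat, x)
print(J)                 # observed information at the MLE
print(np.linalg.inv(J))  # approximate covariance matrix of the MLE
```

Inverting the matrix evaluated at the maximum-likelihood estimate, as in the last line, gives the approximate covariance matrix typically used for Wald-type tests and confidence intervals.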


Alternative definition

Andrew Gelman, David Dunson and Donald Rubin define observed information instead in terms of the parameters' posterior probability, p(\theta \mid y):

:I(\theta) = - \frac{d^2}{d\theta^2} \log p(\theta \mid y)
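As a concrete, assumed example of this posterior-based definition (not from the original article), suppose the posterior of a single parameter is a Beta(a, b) distribution, as arises from a Bernoulli likelihood with a Beta prior. The sketch below differentiates the log posterior numerically and compares the result with the closed-form curvature; the values of a and b are illustrative.

```python
import numpy as np

a, b = 5.0, 3.0  # assumed Beta(a, b) posterior

def log_posterior(theta):
    # Unnormalised log density of Beta(a, b); the normalising constant
    # does not affect the second derivative.
    return (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)

def observed_info_posterior(theta, h=1e-5):
    # I(theta) = -d^2/dtheta^2 log p(theta | y), via central differences.
    return -(log_posterior(theta + h) - 2 * log_posterior(theta)
             + log_posterior(theta - h)) / h ** 2

theta = 0.6
print(observed_info_posterior(theta))                  # numerical curvature
print((a - 1) / theta**2 + (b - 1) / (1 - theta)**2)   # closed form
```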


Fisher information

The Fisher information \mathcal{I}(\theta) is the expected value of the observed information given a single observation X distributed according to the hypothetical model with parameter \theta:

:\mathcal{I}(\theta) = \mathrm{E}(\mathcal{J}(\theta)).
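A quick sanity check of this relationship, using an assumed Bernoulli(θ) model: for a single observation X, the observed information is X/θ² + (1 − X)/(1 − θ)², and its expectation is the Fisher information 1/(θ(1 − θ)). The Python sketch below verifies this by Monte Carlo.

```python
import numpy as np

theta = 0.3
rng = np.random.default_rng(1)
x = rng.binomial(1, theta, size=1_000_000)  # many single Bernoulli draws

# Observed information from each single observation X.
observed = x / theta**2 + (1 - x) / (1 - theta) ** 2

print(observed.mean())            # Monte Carlo estimate of E[J(theta)]
print(1 / (theta * (1 - theta)))  # Fisher information I(theta)
```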


Comparison with the expected information

The comparison between the observed information and the expected information remains an active and ongoing area of research and debate. Efron and Hinkley provided a frequentist justification for preferring the observed information to the expected information when employing normal approximations to the distribution of the maximum-likelihood estimator in one-parameter families, in the presence of an ancillary statistic that affects the precision of the MLE. Lindsay and Li showed that the observed information matrix gives the minimum mean squared error as an approximation of the true information if an asymptotically negligible error term is ignored. In Lindsay and Li's case, the expected information matrix still requires evaluation at the obtained ML estimates, introducing randomness. However, when the construction of confidence intervals is the primary focus, there are reported findings that the expected information outperforms the observed counterpart. Yuan and Spall showed that the expected information outperforms the observed counterpart for confidence-interval construction of scalar parameters in the mean-squared-error sense. This finding was later generalized to multiparameter cases, although the claim was weakened to the expected information matrix performing at least as well as the observed information matrix.


See also

* Fisher information matrix
* Fisher information metric

