Scaled Inverse Chi-squared Distribution
The scaled inverse chi-squared distribution \psi\,\mbox{inv-}\chi^2(\nu), where \psi is the scale parameter, equals the univariate inverse Wishart distribution \mathcal{W}^{-1}(\psi,\nu) with \nu degrees of freedom.

This family of scaled inverse chi-squared distributions is linked to the inverse-chi-squared distribution and to the chi-squared distribution: if X \sim \psi\,\mbox{inv-}\chi^2(\nu) then X/\psi \sim \mbox{inv-}\chi^2(\nu), as well as \psi/X \sim \chi^2(\nu) and 1/X \sim \psi^{-1}\chi^2(\nu).

Instead of \psi, the scaled inverse chi-squared distribution is most frequently parametrized by the scale parameter \tau^2 = \psi/\nu; the distribution \nu\tau^2\,\mbox{inv-}\chi^2(\nu) is then denoted \mbox{Scale-inv-}\chi^2(\nu, \tau^2). In terms of \tau^2 the above relations can be written as follows: if X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then \frac{X}{\nu\tau^2} \sim \mbox{inv-}\chi^2(\nu), as well as \frac{\nu\tau^2}{X} \sim \chi^2(\nu) and 1/X \sim \frac{1}{\nu\tau^2}\chi^2(\nu).

This family of scaled inverse chi-squared distributions is a reparametrization of the inverse-gamma distribution. Specifically, if

:X \sim \psi\,\mbox{inv-}\chi^2(\nu) = \mbox{Scale-inv-}\chi^2(\nu, \tau^2)   then   X \sim \mbox{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\psi}{2}\right) = \mbox{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right).

Either form may be used to represent the maximum entropy distribution for a fixed first inverse moment E(1/X) and first logarithmic moment E(\ln(X)).

The scaled inverse chi-squared distribution also has a particular use in Bayesian statistics: it can serve as a conjugate prior for the variance parameter of a normal distribution. The same prior in an alternative parametrization is given by the inverse-gamma distribution.
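The inverse-gamma correspondence is also the practical route to working with this distribution numerically, since SciPy exposes invgamma but no dedicated scaled-inverse-chi-squared object. A minimal sketch, assuming NumPy and SciPy are available; the parameter values are illustrative:

```python
# Check the reparametrization: Scale-inv-chi2(nu, tau2) = Inv-Gamma(nu/2, nu*tau2/2).
import numpy as np
from scipy import stats

nu, tau2 = 5.0, 2.0

# Sample X ~ Scale-inv-chi2(nu, tau2) by inverting a chi-squared draw:
# if V ~ chi2(nu), then nu*tau2 / V ~ Scale-inv-chi2(nu, tau2).
rng = np.random.default_rng(0)
x = nu * tau2 / rng.chisquare(nu, size=100_000)

# Compare the sample mean with the inverse-gamma mean beta/(alpha - 1),
# which is valid for nu > 2 and equals nu*tau2/(nu - 2).
alpha, beta = nu / 2, nu * tau2 / 2
print(x.mean(), beta / (alpha - 1))
print(stats.invgamma(alpha, scale=beta).mean())  # same value from SciPy
```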


Characterization

The probability density function of the scaled inverse chi-squared distribution extends over the domain x>0 and is

: f(x; \nu, \tau^2) = \frac{(\tau^2\nu/2)^{\nu/2}}{\Gamma(\nu/2)}~ \frac{\exp\left[\dfrac{-\nu\tau^2}{2x}\right]}{x^{1+\nu/2}}

where \nu is the degrees of freedom parameter and \tau^2 is the scale parameter. The cumulative distribution function is

:F(x; \nu, \tau^2) = \Gamma\left(\frac{\nu}{2}, \frac{\tau^2\nu}{2x}\right) \left/ \Gamma\left(\frac{\nu}{2}\right)\right. = Q\left(\frac{\nu}{2}, \frac{\tau^2\nu}{2x}\right)

where \Gamma(a,x) is the upper incomplete gamma function, \Gamma(x) is the gamma function and Q(a,x) is the regularized upper incomplete gamma function. The characteristic function is

:\varphi(t;\nu,\tau^2) = \frac{2}{\Gamma(\nu/2)} \left(\frac{-i\tau^2\nu t}{2}\right)^{\!\nu/4} K_{\nu/2}\left(\sqrt{-2i\tau^2\nu t}\right),

where K_{\nu/2}(z) is the modified Bessel function of the second kind.
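The density and CDF above can be evaluated directly; a sketch, assuming SciPy (whose gammaincc is the regularized upper incomplete gamma Q(a, z)), cross-checked against the invgamma reparametrization:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma, gammaincc

def scale_inv_chi2_pdf(x, nu, tau2):
    # f(x; nu, tau2) = (tau2*nu/2)^(nu/2) / Gamma(nu/2) * exp(-nu*tau2/(2x)) / x^(1+nu/2)
    a = nu / 2
    return (tau2 * a) ** a / gamma(a) * np.exp(-nu * tau2 / (2 * x)) / x ** (1 + a)

def scale_inv_chi2_cdf(x, nu, tau2):
    # F(x; nu, tau2) = Q(nu/2, nu*tau2/(2x))
    return gammaincc(nu / 2, nu * tau2 / (2 * x))

x = np.linspace(0.1, 10, 5)
nu, tau2 = 4.0, 1.5
ref = stats.invgamma(nu / 2, scale=nu * tau2 / 2)
print(np.allclose(scale_inv_chi2_pdf(x, nu, tau2), ref.pdf(x)))  # True
print(np.allclose(scale_inv_chi2_cdf(x, nu, tau2), ref.cdf(x)))  # True
```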


Parameter estimation

The maximum likelihood estimate of \tau^2 is

:\tau^2 = n\left/\sum_{i=1}^n \frac{1}{x_i}\right. .

The maximum likelihood estimate of \frac{\nu}{2} can be found using Newton's method on:

:\ln\left(\frac{\nu}{2}\right) - \psi\left(\frac{\nu}{2}\right) = \frac{1}{n}\sum_{i=1}^n \ln\left(x_i\right) - \ln\left(\tau^2\right),

where \psi(x) is the digamma function. An initial estimate can be found by taking the formula for the mean and solving it for \nu. Let \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i be the sample mean. Then an initial estimate for \nu is given by:

:\frac{\nu}{2} = \frac{\bar{x}}{\bar{x} - \tau^2}.
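This recipe fits in a few lines of Python; a sketch assuming SciPy, where fit_scale_inv_chi2 is a hypothetical helper name and the Newton iteration is deliberately plain:

```python
import numpy as np
from scipy.special import digamma, polygamma

def fit_scale_inv_chi2(x, iters=100, tol=1e-10):
    n = len(x)
    tau2 = n / np.sum(1.0 / x)                       # closed-form MLE of tau^2
    rhs = np.mean(np.log(x)) - np.log(tau2)          # right-hand side of the equation
    a = max(np.mean(x) / (np.mean(x) - tau2), 1.1)   # initial guess for nu/2 from the mean
    for _ in range(iters):
        f = np.log(a) - digamma(a) - rhs             # find the root of this in a = nu/2
        fprime = 1.0 / a - polygamma(1, a)           # derivative: 1/a - trigamma(a)
        step = f / fprime
        a -= step                                    # Newton step
        if abs(step) < tol:
            break
    return 2 * a, tau2                               # (nu_hat, tau2_hat)

rng = np.random.default_rng(1)
nu, tau2 = 6.0, 2.0
x = nu * tau2 / rng.chisquare(nu, size=50_000)
print(fit_scale_inv_chi2(x))                         # roughly (6.0, 2.0)
```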


Bayesian estimation of the variance of a normal distribution

The scaled inverse chi-squared distribution has a second important application, in the Bayesian estimation of the variance of a normal distribution. According to Bayes' theorem, the posterior probability distribution for quantities of interest is proportional to the product of a prior distribution for the quantities and a likelihood function:

:p(\sigma^2 \mid D, I) \propto p(\sigma^2 \mid I) \; p(D \mid \sigma^2)

where ''D'' represents the data and ''I'' represents any initial information about σ2 that we may already have.

The simplest scenario arises if the mean μ is already known; or, alternatively, if it is the conditional distribution of σ2 that is sought, for a particular assumed value of μ. Then the likelihood term ''L''(σ2 | ''D'') = ''p''(''D'' | σ2) has the familiar form

:\mathcal{L}(\sigma^2 \mid D, \mu) = \frac{1}{\left(\sqrt{2\pi}\sigma\right)^n} \; \exp\left[-\frac{\sum_i (x_i - \mu)^2}{2\sigma^2}\right].

Combining this with the rescaling-invariant prior p(σ2 | ''I'') = 1/σ2, which can be argued (e.g. following Jeffreys) to be the least informative possible prior for σ2 in this problem, gives a combined posterior probability

:p(\sigma^2 \mid D, I, \mu) \propto \frac{1}{\sigma^{n+2}} \; \exp\left[-\frac{\sum_i (x_i - \mu)^2}{2\sigma^2}\right].

This form can be recognised as that of a scaled inverse chi-squared distribution, with parameters ν = ''n'' and τ2 = ''s''2 = (1/''n'') Σ (xi-μ)2.

Gelman and co-authors remark that the re-appearance of this distribution, previously seen in a sampling context, may seem remarkable; but given the choice of prior "this result is not surprising." In particular, the choice of a rescaling-invariant prior for σ2 has the result that the probability for the ratio σ2 / ''s''2 has the same form (independent of the conditioning variable) when conditioned on ''s''2 as when conditioned on σ2:

:p(\tfrac{\sigma^2}{s^2} \mid s^2) = p(\tfrac{\sigma^2}{s^2} \mid \sigma^2).

In the sampling-theory case, conditioned on σ2, the probability distribution for 1/''s''2 is a scaled inverse chi-squared distribution; and so the probability distribution for σ2 conditioned on ''s''2, given a scale-agnostic prior, is also a scaled inverse chi-squared distribution.
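Since the posterior Scale-inv-χ2(''n'', ''s''2) equals Inv-Gamma(''n''/2, ''n s''2/2), it can be summarized with SciPy directly; a sketch on simulated data (all values illustrative):

```python
# Posterior for sigma^2 under the 1/sigma^2 prior with known mean mu:
# Scale-inv-chi2(n, s^2), i.e. Inv-Gamma(n/2, n*s2/2) in SciPy terms.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma2_true, n = 0.0, 4.0, 30
data = rng.normal(mu, np.sqrt(sigma2_true), size=n)

s2 = np.mean((data - mu) ** 2)                  # s^2 = (1/n) sum (x_i - mu)^2
posterior = stats.invgamma(n / 2, scale=n * s2 / 2)
print(posterior.mean())                         # posterior mean of sigma^2
print(posterior.interval(0.95))                 # central 95% credible interval
```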


Use as an informative prior

If more is known about the possible values of σ2, a distribution from the scaled inverse chi-squared family, such as Scale-inv-χ2(''n''0, ''s''02), can be a convenient form to represent a more informative prior for σ2, as if from the result of ''n''0 previous observations (though ''n''0 need not necessarily be a whole number):

:p(\sigma^2 \mid I^\prime, \mu) \propto \frac{1}{\sigma^{n_0+2}} \; \exp\left[-\frac{n_0 s_0^2}{2\sigma^2}\right].

Such a prior would lead to the posterior distribution

:p(\sigma^2 \mid D, I^\prime, \mu) \propto \frac{1}{\sigma^{n+n_0+2}} \; \exp\left[-\frac{n s^2 + n_0 s_0^2}{2\sigma^2}\right],

which is itself a scaled inverse chi-squared distribution, \mbox{Scale-inv-}\chi^2\left(n+n_0, \frac{n s^2 + n_0 s_0^2}{n+n_0}\right). The scaled inverse chi-squared distributions are thus a convenient conjugate prior family for σ2 estimation.
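The conjugate update amounts to pooling pseudo-observations with data; a sketch, where the function name and (ν, τ2) return convention are my own:

```python
import numpy as np

def update_scale_inv_chi2(n0, s02, data, mu):
    """Prior Scale-inv-chi2(n0, s0^2) + normal data with known mean mu
    -> posterior Scale-inv-chi2(n0 + n, (n0*s02 + n*s2) / (n0 + n))."""
    data = np.asarray(data)
    n = data.size
    s2 = np.mean((data - mu) ** 2)
    return n0 + n, (n0 * s02 + n * s2) / (n0 + n)

# e.g. a prior worth ~5 pseudo-observations at scale 1.0:
print(update_scale_inv_chi2(5, 1.0, [0.4, -1.2, 2.0, 0.3], mu=0.0))
```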


Estimation of variance when mean is unknown

If the mean is not known, the most uninformative prior that can be taken for it is arguably the translation-invariant prior ''p''(μ | ''I'') ∝ const., which gives the following joint posterior distribution for μ and σ2:

: \begin{align} p(\mu, \sigma^2 \mid D, I) & \propto \frac{1}{\sigma^{n+2}} \exp\left[-\frac{\sum_i (x_i-\mu)^2}{2\sigma^2}\right] \\ & = \frac{1}{\sigma^{n+2}} \exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right] \exp\left[-\frac{n(\mu - \bar{x})^2}{2\sigma^2}\right] \end{align}

where now ''s''2 = (1/(''n''−1)) Σ (xi − x̄)2. The marginal posterior distribution for σ2 is obtained from the joint posterior distribution by integrating out over μ,

:\begin{align} p(\sigma^2 \mid D, I) \; \propto \; & \frac{1}{\sigma^{n+2}} \; \exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right] \; \int_{-\infty}^{\infty} \exp\left[-\frac{n(\mu - \bar{x})^2}{2\sigma^2}\right] d\mu \\ = \; & \frac{1}{\sigma^{n+2}} \; \exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right] \; \sqrt{\frac{2\pi\sigma^2}{n}} \\ \propto \; & (\sigma^2)^{-(n+1)/2} \; \exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right]. \end{align}

This is again a scaled inverse chi-squared distribution, with parameters \nu = n-1 and \tau^2 = s^2.
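A sketch of the unknown-mean result, again via the inverse-gamma correspondence (illustrative data, SciPy assumed):

```python
# Marginal posterior for sigma^2 with mu integrated out: Scale-inv-chi2(n-1, s^2),
# where s^2 is the usual unbiased sample variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.normal(5.0, 2.0, size=40)

n = len(data)
s2 = data.var(ddof=1)                  # s^2 = sum (x_i - xbar)^2 / (n - 1)
marginal = stats.invgamma((n - 1) / 2, scale=(n - 1) * s2 / 2)
print(marginal.mean(), marginal.interval(0.95))
```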


Related distributions

* If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then k X \sim \mbox{Scale-inv-}\chi^2(\nu, k \tau^2) (a Monte Carlo check of this relation appears below)
* If X \sim \mbox{inv-}\chi^2(\nu) (inverse-chi-squared distribution) then X \sim \mbox{Scale-inv-}\chi^2(\nu, 1/\nu)
* If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then \frac{X}{\tau^2 \nu} \sim \mbox{inv-}\chi^2(\nu) (inverse-chi-squared distribution)
* If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then X \sim \mbox{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right) (inverse-gamma distribution)
* The scaled inverse chi-squared distribution is a special case of the type 5 Pearson distribution
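A quick Monte Carlo check of the first relation, a sketch assuming SciPy (values illustrative):

```python
# If X ~ Scale-inv-chi2(nu, tau2), then k*X ~ Scale-inv-chi2(nu, k*tau2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
nu, tau2, k = 5.0, 1.0, 3.0
x = nu * tau2 / rng.chisquare(nu, size=200_000)

# Kolmogorov-Smirnov test of k*x against Inv-Gamma(nu/2, nu*(k*tau2)/2);
# a large p-value is consistent with the claimed distribution.
print(stats.kstest(k * x, stats.invgamma(nu / 2, scale=nu * k * tau2 / 2).cdf))
```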

