An ancillary statistic is a measure of a sample whose distribution (or whose pmf or pdf) does not depend on the parameters of the model. An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals. This concept was introduced by Ronald Fisher in the 1920s.

Examples

Suppose ''X''₁, ..., ''X''_''n'' are independent and identically distributed, and are normally distributed with unknown expected value ''μ'' and known variance 1. Let

\overline{X}_n = \frac{X_1 + \cdots + X_n}{n}

be the sample mean. The following statistical measures of dispersion of the sample

* Range: max(''X''₁, ..., ''X''_''n'') − min(''X''₁, ..., ''X''_''n'')
* Interquartile range: ''Q''₃ − ''Q''₁
* Sample variance:

\hat{\sigma}^2 := \frac{\sum_{i=1}^{n} \left(X_i - \overline{X}_n\right)^2}{n}

are all ''ancillary statistics'', because their sampling distributions do not change as ''μ'' changes. Computationally, this is because the ''μ'' terms cancel in the formulas: adding a constant to a distribution (and hence to every sample value) shifts the sample maximum and minimum by the same amount, so it does not change their difference, and likewise for the other measures. These measures of dispersion do not depend on location.

Conversely, given i.i.d. normal variables with known mean 1 and unknown variance ''σ''², the sample mean

\overline{X}_n

is ''not'' an ancillary statistic of the variance, as the sampling distribution of the sample mean is ''N''(1, ''σ''²/''n''), which does depend on ''σ''²: this measure of location (specifically, its standard error) depends on dispersion.
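The cancellation argument can be checked numerically. The following is a minimal Python sketch, not part of the original article: it draws one set of standard normal noise values, shifts them by two arbitrary values of ''μ'', and compares the three dispersion measures. The helper name `dispersion_summaries`, the seed, the sample size and the quartile convention (the default of `statistics.quantiles`) are all assumptions made for illustration.

```python
import random
import statistics

def dispersion_summaries(sample):
    """Return (range, interquartile range, sample variance with divisor n)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartile cut points
    return (max(sample) - min(sample), q3 - q1, statistics.pvariance(sample))

rng = random.Random(0)  # arbitrary seed, used only for reproducibility
noise = [rng.gauss(0.0, 1.0) for _ in range(25)]  # draws from N(0, 1)

# Adding mu to every noise value gives a sample from N(mu, 1).
summaries = {}
means = {}
for mu in (0.0, 5.0):
    sample = [mu + z for z in noise]
    summaries[mu] = dispersion_summaries(sample)
    means[mu] = statistics.fmean(sample)

# The dispersion measures agree up to floating-point rounding,
# while the sample mean moves with mu.
for a, b in zip(summaries[0.0], summaries[5.0]):
    print(abs(a - b) < 1e-9)
print(means[5.0] - means[0.0])
```

Because the same noise is reused for both values of ''μ'', the agreement is exact up to rounding. This illustrates why the sampling distributions cannot depend on ''μ''; it is not a proof of ancillarity.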

In location-scale families

In a location family of distributions,

(X_1 - X_n, X_2 - X_n, \dots, X_{n-1} - X_n)

is an ancillary statistic. In a scale family of distributions,

\left(\frac{X_1}{X_n}, \frac{X_2}{X_n}, \dots, \frac{X_{n-1}}{X_n}\right)

is an ancillary statistic. In a location-scale family of distributions,

\left(\frac{X_1 - X_n}{S}, \frac{X_2 - X_n}{S}, \dots, \frac{X_{n-1} - X_n}{S}\right)

where ''S''² is the sample variance, is an ancillary statistic.
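As a hedged illustration, the three statistics can be written as small Python functions and checked for invariance under the transformations that define each family. The function names, the test data and the choice of divisor ''n'' in ''S'' are assumptions of this sketch, not taken from the article.

```python
import math
import random
import statistics

def location_ancillary(x):
    """(X_1 - X_n, ..., X_{n-1} - X_n): unchanged by a common shift."""
    return [xi - x[-1] for xi in x[:-1]]

def scale_ancillary(x):
    """(X_1 / X_n, ..., X_{n-1} / X_n): unchanged by a common positive rescaling."""
    return [xi / x[-1] for xi in x[:-1]]

def location_scale_ancillary(x):
    """((X_i - X_n) / S): unchanged by a shift combined with a positive rescaling."""
    s = statistics.pstdev(x)  # assumption: divisor n; divisor n - 1 works equally well
    return [(xi - x[-1]) / s for xi in x[:-1]]

def nearly_equal(u, v):
    """Compare two vectors elementwise, allowing for floating-point rounding."""
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(u, v))

rng = random.Random(0)  # arbitrary seed
z = [rng.gauss(0.0, 1.0) for _ in range(6)]
shifted = [zi + 7.0 for zi in z]      # location change only
scaled = [3.0 * zi for zi in z]       # scale change only
both = [2.0 + 3.0 * zi for zi in z]   # location and scale change

shift_ok = nearly_equal(location_ancillary(z), location_ancillary(shifted))
scale_ok = nearly_equal(scale_ancillary(z), scale_ancillary(scaled))
both_ok = nearly_equal(location_scale_ancillary(z), location_scale_ancillary(both))
print(shift_ok, scale_ok, both_ok)
```

Each statistic is a function of the data alone and takes the same value before and after the corresponding transformation, which is why its distribution cannot depend on the location or scale parameter.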

In recovery of information

It turns out that, if ''T''₁ is a non-sufficient statistic and ''T''₂ is ancillary, one can sometimes recover all the information about the unknown parameter contained in the entire data by reporting ''T''₁ while conditioning on the observed value of ''T''₂. This is known as ''conditional inference''.

For example, suppose that ''X''₁ and ''X''₂ are independent and follow the ''N''(''θ'', 1) distribution, where ''θ'' is unknown. Even though ''X''₁ is not sufficient for ''θ'' (its Fisher information is 1, whereas the Fisher information of the complete statistic, the sample mean (''X''₁ + ''X''₂)/2, is 2), additionally reporting the ancillary statistic ''X''₁ − ''X''₂ gives a joint distribution with Fisher information 2.
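The value 2 can be verified with a short calculation, sketched here in LaTeX under the assumption that the two observations are independent. The shorthand D for the difference and the notation I(θ) for Fisher information are introduced only for this illustration.

```latex
% Assumption: X_1, X_2 independent N(\theta, 1); D := X_1 - X_2 is hypothetical shorthand.
\begin{align*}
D &= X_1 - X_2 \sim N(0, 2), \qquad \operatorname{Cov}(X_1, D) = \operatorname{Var}(X_1) = 1, \\
X_1 \mid D = d &\sim N\!\left(\theta + \tfrac{d}{2},\; 1 - \tfrac{1^2}{2}\right)
  = N\!\left(\theta + \tfrac{d}{2},\; \tfrac{1}{2}\right), \\
I_{X_1 \mid D}(\theta) &= \frac{1}{1/2} = 2 = I_{\overline{X}}(\theta).
\end{align*}
```

Since the distribution of D does not involve θ, D carries no information by itself. The full information of 2 appears only once the distribution of the first observation is taken conditionally on D.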

Ancillary complement

Given a statistic ''T'' that is not sufficient, an ancillary complement is a statistic ''U'' that is ancillary and such that (''T'', ''U'') is sufficient (M. Ghosh, N. Reid and D.A.S. Fraser, ''Ancillary Statistics: A Review''). Intuitively, an ancillary complement "adds the missing information" (without duplicating any).

The statistic is particularly useful if one takes ''T'' to be a maximum likelihood estimator, which in general will not be sufficient; then one can ask for an ancillary complement. In this case, Fisher argues that one must condition on an ancillary complement to determine information content: the Fisher information content of ''T'' should be taken from the conditional distribution of ''T'' given ''U'', not from the marginal distribution of ''T''. The question is how much information ''T'' ''adds''. This is not possible in general, as no ancillary complement need exist, and if one exists, it need not be unique, nor need a maximum ancillary complement exist.

Example

In baseball, suppose a scout observes a batter in ''N'' at-bats. Suppose (unrealistically) that the number ''N'' is chosen by some random process that is independent of the batter's ability: say a coin is tossed after each at-bat and the result determines whether the scout will stay to watch the batter's next at-bat. The eventual data are the number ''N'' of at-bats and the number ''X'' of hits: the data (''X'', ''N'') are a sufficient statistic. The observed batting average ''X''/''N'' fails to convey all of the information available in the data because it fails to report the number ''N'' of at-bats (e.g., a batting average of 0.400, which is very high, based on only five at-bats does not inspire anywhere near as much confidence in the player's ability as a 0.400 average based on 100 at-bats).

The number ''N'' of at-bats is an ancillary statistic because

* It is a part of the observable data (it is a ''statistic''), and
* Its probability distribution does not depend on the batter's ability, since it was chosen by a random process independent of the batter's ability.

This ancillary statistic is an ancillary complement to the observed batting average ''X''/''N'', i.e., the batting average ''X''/''N'' is not a sufficient statistic, in that it conveys less than all of the relevant information in the data, but conjoined with ''N'', it becomes sufficient.
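The scouting scheme can be simulated with a short Python sketch. Everything named here (`scout_session`, `at_bat_counts`, `standard_error`, the fair coin, the seeds and the number of sessions) is a hypothetical choice for illustration. The standard error is the usual plug-in binomial approximation, which the example above does not itself use.

```python
import math
import random

def scout_session(ability, coin_rng, hit_rng):
    """Simulate one scouting session and return (hits X, at-bats N)."""
    hits = at_bats = 0
    while True:
        at_bats += 1
        hits += hit_rng.random() < ability  # a hit occurs with probability `ability`
        if coin_rng.random() < 0.5:         # assumed fair coin: the scout leaves
            return hits, at_bats

def at_bat_counts(ability, sessions=1000):
    """Return N for many sessions, with the coin stream kept separate from the hits."""
    coin_rng = random.Random(1)  # same coin tosses whatever the ability
    hit_rng = random.Random(2)
    return [scout_session(ability, coin_rng, hit_rng)[1] for _ in range(sessions)]

def standard_error(average, at_bats):
    """Plug-in binomial standard error of a batting average X/N, given N."""
    return math.sqrt(average * (1 - average) / at_bats)

# N is generated without reference to the batter's ability,
# so the simulated counts coincide for two different abilities.
same_counts = at_bat_counts(0.2) == at_bat_counts(0.4)
print(same_counts)

# The same 0.400 average is far less precise after 5 at-bats than after 100.
print(standard_error(0.4, 5), standard_error(0.4, 100))
```

The two standard errors come out at roughly 0.22 and 0.05, which is the sense in which ''N'' supplies information that the batting average alone omits.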
