The ratio estimator is a statistical parameter and is defined to be the ratio of means of two random variables. Ratio estimates are biased, and corrections must be made when they are used in experimental or survey work. The distribution of ratio estimates is asymmetrical, so symmetrical tests such as the t test should not be used to generate confidence intervals.
The bias is of the order ''O''(1/''n'') (see big O notation), so as the sample size (''n'') increases, the bias asymptotically approaches 0. Therefore, the estimator is approximately unbiased for large sample sizes.
Definition
Assume there are two characteristics – ''x'' and ''y'' – that can be observed for each sampled element in the data set. The ratio ''R'' is
:<math>R = \mu_y / \mu_x</math>
where ''μ''<sub>''x''</sub> and ''μ''<sub>''y''</sub> are the means of the ''x'' and ''y'' variates. The ratio estimate of a value of the ''y'' variate (''θ''<sub>''y''</sub>) is
:<math>\theta_y = R \theta_x</math>
where ''θ''<sub>''x''</sub> is the corresponding value of the ''x'' variate. ''θ''<sub>''y''</sub> is known to be asymptotically normally distributed. [Scott AJ, Wu CFJ (1981) On the asymptotic distribution of ratio and regression estimators. JASA 76: 98–102]
Statistical properties
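The sample ratio (''r'') is estimated from the sample
:<math>r = \frac{m_y}{m_x} = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}</math>
where ''m''<sub>''x''</sub> and ''m''<sub>''y''</sub> are the sample means of the ''x'' and ''y'' variates.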
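The sample ratio and the resulting ratio estimate can be sketched in Python. This is a minimal illustration, assuming that ''r'' is computed as the ratio of the sample means (equivalently, of the sample totals) of paired observations; the function names and the data are hypothetical.

```python
from statistics import fmean


def sample_ratio(x, y):
    """Sample ratio r = mean(y) / mean(x) for paired observations."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    m_x = fmean(x)
    if m_x == 0:
        raise ZeroDivisionError("mean of x is zero; the ratio is undefined")
    return fmean(y) / m_x


def ratio_estimate(x, y, theta_x):
    """Estimate the y value that corresponds to a known x value theta_x."""
    return sample_ratio(x, y) * theta_x


# Hypothetical paired sample (illustrative values only)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 7.0, 8.0]

r = sample_ratio(x, y)                         # 5.0 / 2.5 = 2.0
theta_y = ratio_estimate(x, y, theta_x=10.0)   # 2.0 * 10.0 = 20.0
```

Because the means share the same divisor ''n'', dividing the totals gives the same value of ''r''.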
That the ratio is biased can be shown with Jensen's inequality as follows (assuming independence between ''x'' and ''y'', and that both variates are positive so that 1/''x'' is convex):
:<math>E\left(\frac{y}{x}\right) = E\left(y \cdot \frac{1}{x}\right) = E(y)\,E\left(\frac{1}{x}\right) \ge E(y)\,\frac{1}{E(x)} = \frac{E(y)}{E(x)}</math>
Under simple random sampling the bias is of the order ''O''(''n''<sup>−1</sup>). An upper bound on the relative bias of the estimate is provided by the coefficient of variation (the ratio of the standard deviation to the mean). [Cochran WG (1977) Sampling techniques. New York: John Wiley & Sons] Under simple random sampling the relative bias is ''O''(''n''<sup>−1/2</sup>).
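The bias under simple random sampling can be checked exactly for a very small population by enumerating every possible sample. The sketch below is an illustration under stated assumptions: sampling is without replacement, ''r'' is the ratio of sample totals, and the three-element population and the function name `exact_bias` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def exact_bias(x, y, n):
    """Exact bias E[r] - R of r = sum(y) / sum(x) over all samples of size n
    drawn without replacement from the paired population (x, y)."""
    population_ratio = Fraction(sum(y), sum(x))
    ratios = [
        Fraction(sum(y[i] for i in idx), sum(x[i] for i in idx))
        for idx in combinations(range(len(x)), n)
    ]
    return sum(ratios) / len(ratios) - population_ratio


# Hypothetical population of three paired values; R = 12 / 6 = 2
x = [1, 2, 3]
y = [2, 3, 7]

biases = {n: exact_bias(x, y, n) for n in (1, 2, 3)}
# biases == {1: Fraction(-1, 18), 2: Fraction(-1, 36), 3: Fraction(0, 1)}
```

In this toy population the bias shrinks as ''n'' grows and vanishes when the whole population is sampled. The sign of the bias depends on the population; here ''x'' and ''y'' are correlated, so the independence assumption of the Jensen argument does not apply.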
Correction of the mean's bias
The correction methods differ in their efficiency depending on the distributions of the ''x'' and ''y'' variates, which makes it difficult to recommend an overall best method. Because the estimates of ''r'' are biased, a corrected version should be used in all subsequent calculations.
A correction of the bias accurate to the first order is
:
where ''m''<sub>''x''</sub> is the mean of the variate ''x'' and ''s''<sub>''xy''</sub> is the covariance between ''x'' and ''y''.
To simplify the notation ''s''<sub>''xy''</sub> will be used subsequently to denote the covariance between the variates ''x'' and ''y''.
Another estimator based on the Taylor expansion is
:
where ''n'' is the sample size, ''N'' is the population size, ''m''<sub>''x''</sub> is the mean of the ''x'' variate and ''s''<sub>''x''</sub><sup>2</sup> and ''s''<sub>''y''</sub><sup>2</sup> are the sample variances of the ''x'' and ''y'' variates respectively.
A computationally simpler but slightly less accurate version of this estimator is
:
where ''N'' is the population size, ''n'' is the sample size, ''m''<sub>''x''</sub> is the mean of the ''x'' variate and ''s''<sub>''x''</sub><sup>2</sup> and ''s''<sub>''y''</sub><sup>2</sup> are the sample variances of the ''x'' and ''y'' variates respectively. These versions differ only in the factor in the denominator (''N'' − 1). For a large ''N'' the difference is negligible.
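A first-order (Taylor-series) bias correction can be sketched as follows. The sketch uses the form commonly given in sampling textbooks, ''r'' − ((''N'' − ''n'')/(''Nn''))(''r s''<sub>''x''</sub><sup>2</sup> − ''s''<sub>''xy''</sub>)/''m''<sub>''x''</sub><sup>2</sup>; that this matches the exact expression intended here is an assumption, and the function names and data are hypothetical.

```python
from statistics import fmean, variance


def sample_covariance(x, y):
    """Sample covariance of paired observations, with divisor n - 1."""
    m_x, m_y = fmean(x), fmean(y)
    return sum((a - m_x) * (b - m_y) for a, b in zip(x, y)) / (len(x) - 1)


def taylor_corrected_ratio(x, y, N):
    """First-order bias-corrected ratio (assumed textbook form, see lead-in).

    x, y : paired sample observations
    N    : population size
    """
    n = len(x)
    m_x, m_y = fmean(x), fmean(y)
    r = m_y / m_x                       # uncorrected sample ratio
    s_xx = variance(x)                  # sample variance of x
    s_xy = sample_covariance(x, y)      # sample covariance of x and y
    fpc = (N - n) / (N * n)             # (1/n) * (1 - n/N)
    return r - fpc * (r * s_xx - s_xy) / m_x ** 2


# Hypothetical sample of n = 4 pairs from a population of N = 100
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 7.0, 8.0]
r_corr = taylor_corrected_ratio(x, y, N=100)   # approximately 2.0128
```

Here the uncorrected ratio is 2.0, so the correction moves the estimate by about 0.0128. As ''n'' approaches ''N'' the factor (''N'' − ''n'')/(''Nn'') goes to zero and the corrected and uncorrected ratios coincide.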
If ''x'' and ''y'' are unitless counts with a Poisson distribution, a second-order correction is [Ogliore RC, Huss GR, Nagashima K (2011) Ratio estimation in SIMS analysis. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 269 (17) 1910–1918]
:
Other methods of bias correction have also been proposed. To simplify the notation the following variables will be used
:
:
:
Pascual's estimator:
[Pascual JN (1961) Unbiased ratio estimators in stratified sampling. JASA 56(293):70–87]
:
Beale's estimator:
[Beale EML (1962) Some use of computers in operational research. Industrielle Organization 31: 27–28]
:
Tin's estimator:
[Tin M (1965) Comparison of some ratio estimators. JASA 60: 294–307]
:
Sahoo's estimator:
[Sahoo LN (1983) On a method of bias reduction in ratio estimation. J Statist Res 17: 1–6]
:
Sahoo has also proposed a number of additional estimators:
[Sahoo LN (1987) On a class of almost unbiased estimators for population ratio. Statistics 18: 119–121]
:
:
:
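Two of the estimators named above, Beale's and Tin's, can be sketched in Python. The sketch uses the forms in which they are commonly stated in the sampling literature and assumes the shorthand ''θ'' = 1/''n'' − 1/''N'', ''c''<sub>''xx''</sub> = ''s''<sub>''x''</sub><sup>2</sup>/''m''<sub>''x''</sub><sup>2</sup> and ''c''<sub>''xy''</sub> = ''s''<sub>''xy''</sub>/(''m''<sub>''x''</sub>''m''<sub>''y''</sub>). These definitions, the function names and the data are assumptions made for illustration, not a restatement of the cited papers.

```python
from statistics import fmean, variance


def _ratio_terms(x, y, N):
    """Return (r, theta, c_xx, c_xy) under the assumed shorthand."""
    n = len(x)
    m_x, m_y = fmean(x), fmean(y)
    s_xy = sum((a - m_x) * (b - m_y) for a, b in zip(x, y)) / (n - 1)
    theta = 1.0 / n - 1.0 / N           # assumed: theta = 1/n - 1/N
    c_xx = variance(x) / m_x ** 2       # assumed: s_x^2 / m_x^2
    c_xy = s_xy / (m_x * m_y)           # assumed: s_xy / (m_x * m_y)
    return m_y / m_x, theta, c_xx, c_xy


def beale_ratio(x, y, N):
    """Beale-type estimator: r * (1 + theta*c_xy) / (1 + theta*c_xx)."""
    r, theta, c_xx, c_xy = _ratio_terms(x, y, N)
    return r * (1.0 + theta * c_xy) / (1.0 + theta * c_xx)


def tin_ratio(x, y, N):
    """Tin-type estimator: r * (1 + theta * (c_xy - c_xx))."""
    r, theta, c_xx, c_xy = _ratio_terms(x, y, N)
    return r * (1.0 + theta * (c_xy - c_xx))


# Hypothetical sample of n = 4 pairs from a population of N = 100
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 7.0, 8.0]
r_beale = beale_ratio(x, y, N=100)   # approximately 2.01203
r_tin = tin_ratio(x, y, N=100)       # approximately 2.0128
```

Under these assumed definitions the two estimators agree to first order in ''θ'', which is why their values differ only in the fourth significant figure for this sample.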
If ''x'' and ''y'' are unitless counts with a Poisson distribution and ''m''<sub>''x''</sub> and ''m''<sub>''y''</sub> are both greater than 10, then the following approximation is correct to order ''O''(''n''<sup>−3</sup>).
: