The ratio estimator is a statistical estimator for the ratio of the means of two random variables. Ratio estimates are biased, so corrections must be made when they are used in experimental or survey work. Ratio estimates are also asymmetrically distributed, so symmetrical tests such as the t test should not be used to generate confidence intervals.
The bias is of the order ''O''(1/''n'') (see big O notation), so as the sample size (''n'') increases, the bias asymptotically approaches 0. The estimator is therefore approximately unbiased for large sample sizes.
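The shrinking bias can be illustrated with a small Monte Carlo sketch. The toy model here is an assumption made only for illustration: ''x'' and ''y'' are independent Uniform(1, 3) variates, so the true ratio of means is 1. The function names, replicate count and seed are hypothetical.

```python
import random

def sample_ratio(xs, ys):
    """Ratio of sample means; the common 1/n factors cancel."""
    return sum(ys) / sum(xs)

def simulated_bias(n, replicates=20000, seed=0):
    """Monte Carlo estimate of E[r] - R for samples of size n.

    Assumed toy model: x and y are independent Uniform(1, 3) variates,
    so both population means equal 2 and the true ratio R is 1.
    """
    rng = random.Random(seed)
    true_ratio = 1.0
    total = 0.0
    for _ in range(replicates):
        xs = [rng.uniform(1.0, 3.0) for _ in range(n)]
        ys = [rng.uniform(1.0, 3.0) for _ in range(n)]
        total += sample_ratio(xs, ys)
    return total / replicates - true_ratio

# The average error should be positive and shrink as n grows.
bias_small_n = simulated_bias(5)
bias_large_n = simulated_bias(40)
print(bias_small_n, bias_large_n)
```

Under this model the simulated bias at ''n'' = 5 should be noticeably larger than at ''n'' = 40, in line with the ''O''(1/''n'') behaviour, although any single run is subject to Monte Carlo noise.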
Definition
Assume there are two characteristics – ''x'' and ''y'' – that can be observed for each sampled element in the data set. The ratio ''R'' is
: <math>R = \frac{\mu_y}{\mu_x}</math>
where <math>\mu_x</math> and <math>\mu_y</math> are the means of the ''x'' and ''y'' variates. The ratio estimate of a value of the ''y'' variate (''θ''<sub>''y''</sub>) is
: <math>\theta_y = R \theta_x</math>
where ''θ''<sub>''x''</sub> is the corresponding value of the ''x'' variate. ''θ''<sub>''y''</sub> is known to be asymptotically normally distributed.[Scott AJ, Wu CFJ (1981) On the asymptotic distribution of ratio and regression estimators. JASA 76: 98–102]
Statistical properties
The sample ratio (''r'') is estimated from the sample
: <math>r = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}</math>
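As a minimal sketch, the sample ratio can be computed as the ratio of the sample sums (equivalently, of the sample means) and then scaled by a known ''x'' value to give a ratio estimate for ''y''. The function names, the toy data and the value 100 used for ''θ''<sub>''x''</sub> are illustrative assumptions.

```python
def sample_ratio(xs, ys):
    """Sample ratio r = sum(y) / sum(x), which equals mean(y) / mean(x)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum(ys) / sum(xs)

def ratio_estimate(xs, ys, theta_x):
    """Ratio estimate of a y value from the corresponding known x value."""
    return sample_ratio(xs, ys) * theta_x

# Toy paired observations (hypothetical data).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 7.0]

r = sample_ratio(xs, ys)                               # 12 / 6
theta_y_hat = ratio_estimate(xs, ys, theta_x=100.0)    # r * 100
print(r, theta_y_hat)
```

Note that this uses the uncorrected sample ratio, so the resulting estimate inherits its bias.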
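That the ratio is biased can be shown with Jensen's inequality as follows (assuming independence between the sample means <math>\bar{x}</math> and <math>\bar{y}</math>, and that both are positive):
: <math>E\left[\frac{\bar{y}}{\bar{x}}\right] = E[\bar{y}]\,E\left[\frac{1}{\bar{x}}\right] \geq E[\bar{y}]\,\frac{1}{E[\bar{x}]} = \frac{\mu_y}{\mu_x}</math>
where <math>\mu_x</math> is the mean of the variate <math>x</math> and <math>\mu_y</math> is the mean of the variate <math>y</math>.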
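The inequality can be checked exactly on a tiny discrete example. The sketch below assumes, purely for illustration, that the sample mean of ''x'' takes the values 1 and 3 with equal probability, that the sample mean of ''y'' takes the values 2 and 4 with equal probability, and that the two are independent.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, equally likely values of the two sample means.
xbar_values = [Fraction(1), Fraction(3)]
ybar_values = [Fraction(2), Fraction(4)]

def expectation(values):
    """Mean of equally likely outcomes, computed exactly."""
    return sum(values, Fraction(0)) / len(values)

# Independence: every (xbar, ybar) pair is equally likely.
expected_ratio = expectation(
    [y / x for x, y in product(xbar_values, ybar_values)]
)
ratio_of_expectations = expectation(ybar_values) / expectation(xbar_values)

print(expected_ratio, ratio_of_expectations)
```

Here the expected ratio is 2 while the ratio of the expectations is 3/2, so the expected sample ratio overshoots the ratio of means, as Jensen's inequality predicts for the convex function 1/''x'' on positive values.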
Under simple random sampling the bias is of the order ''O''(''n''<sup>−1</sup>). An upper bound on the relative bias of the estimate is provided by the coefficient of variation (the ratio of the standard deviation to the mean).[Cochran WG (1977) Sampling techniques. New York: John Wiley & Sons] Under simple random sampling the relative bias is ''O''(''n''<sup>−1/2</sup>).
Correction of the mean's bias
The correction methods differ in their efficiency depending on the distributions of the ''x'' and ''y'' variates, which makes it difficult to recommend an overall best method. Because the estimates of ''r'' are biased, a corrected version should be used in all subsequent calculations.
A correction of the bias accurate to the first order is
:
where ''m''<sub>''x''</sub> is the mean of the variate ''x'' and ''s''<sub>''xy''</sub> is the covariance between ''x'' and ''y''.
To simplify the notation, ''s''<sub>''xy''</sub> will be used subsequently to denote the covariance between the variates ''x'' and ''y''.
Another estimator based on the Taylor expansion is
: <math>r_\mathrm{corr} = r - \frac{1}{n}\left(1 - \frac{n-1}{N-1}\right)\frac{r s_x^2 - \rho s_x s_y}{m_x^2}</math>
where ''n'' is the sample size, ''N'' is the population size, ''m''<sub>''x''</sub> is the mean of the ''x'' variate, ''s''<sub>''x''</sub><sup>2</sup> and ''s''<sub>''y''</sub><sup>2</sup> are the sample variances of the ''x'' and ''y'' variates respectively, and ''ρ'' is the sample correlation between ''x'' and ''y'' (so that ''ρ'' ''s''<sub>''x''</sub> ''s''<sub>''y''</sub> equals the sample covariance ''s''<sub>''xy''</sub>).
A computationally simpler but slightly less accurate version of this estimator is
: <math>r_\mathrm{corr} = r - \frac{N-n}{N}\,\frac{r s_x^2 - \rho s_x s_y}{n m_x^2}</math>
where ''N'' is the population size, ''n'' is the sample size, ''m''<sub>''x''</sub> is the mean of the ''x'' variate, ''s''<sub>''x''</sub><sup>2</sup> and ''s''<sub>''y''</sub><sup>2</sup> are the sample variances of the ''x'' and ''y'' variates respectively, and ''ρ'' is the sample correlation between ''x'' and ''y''. These versions differ only in the factor in the denominator (''N'' − 1 in place of ''N''). For a large ''N'' the difference is negligible.
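A minimal Python sketch of the two finite-population corrections described above. It assumes the standard first-order form ''r'' − ''c'' (''r'' ''s''<sub>''x''</sub><sup>2</sup> − ''s''<sub>''xy''</sub>) / (''n'' ''m''<sub>''x''</sub><sup>2</sup>), with ''c'' = (''N'' − ''n'')/(''N'' − 1) for the first version and ''c'' = (''N'' − ''n'')/''N'' for the simpler one. The function name, the `exact_fpc` flag and the toy data are illustrative, not taken from the cited sources.

```python
def taylor_corrected_ratio(xs, ys, population_size, exact_fpc=True):
    """First-order (Taylor expansion) bias-corrected ratio of means.

    Assumed form: r - c * (r * s_x^2 - s_xy) / (n * m_x^2), where
    c = (N - n) / (N - 1) if exact_fpc else (N - n) / N.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    r = m_y / m_x
    # Sample variance of x and sample covariance of x and y (n - 1 divisor).
    s_xx = sum((x - m_x) ** 2 for x in xs) / (n - 1)
    s_xy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys)) / (n - 1)
    big_n = population_size
    denominator = big_n - 1 if exact_fpc else big_n
    correction_factor = (big_n - n) / denominator
    return r - correction_factor * (r * s_xx - s_xy) / (n * m_x ** 2)

# Toy sample of 3 pairs from a hypothetical population of 30 units.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 7.0]
r_exact = taylor_corrected_ratio(xs, ys, population_size=30)
r_simple = taylor_corrected_ratio(xs, ys, population_size=30, exact_fpc=False)
print(r_exact, r_simple)
```

For this toy sample the uncorrected ratio is 2, and both corrections move it upward by roughly 0.04. The two results differ only slightly, and the gap narrows further as the population size grows.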
If ''x'' and ''y'' are unitless counts with a Poisson distribution, a second-order correction is[Ogliore RC, Huss GR, Nagashima K (2011) Ratio estimation in SIMS analysis. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 269 (17) 1910–1918]
:
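Other methods of bias correction have also been proposed. To simplify the notation the following variables will be used
: <math>\theta = \frac{1}{n} - \frac{1}{N}</math>
: <math>c_x^2 = \frac{s_x^2}{m_x^2}</math>
: <math>c_{xy} = \frac{s_{xy}}{m_x m_y}</math>
Pascual's estimator:[Pascual JN (1961) Unbiased ratio estimators in stratified sampling. JASA 56(293): 70–87]
:
Beale's estimator:[Beale EML (1962) Some use of computers in operational research. Industrielle Organization 31: 27–28]
: <math>r_\mathrm{corr} = r\,\frac{1 + \theta c_{xy}}{1 + \theta c_x^2}</math>
Tin's estimator:[Tin M (1965) Comparison of some ratio estimators. JASA 60: 294–307]
: <math>r_\mathrm{corr} = r\left(1 + \theta\left(c_{xy} - c_x^2\right)\right)</math>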
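The Beale and Tin estimators can be sketched in a few lines of Python. The sketch assumes the commonly cited forms ''r''(1 + ''θ'' ''c''<sub>''xy''</sub>)/(1 + ''θ'' ''c''<sub>''x''</sub><sup>2</sup>) for Beale and ''r''(1 + ''θ''(''c''<sub>''xy''</sub> − ''c''<sub>''x''</sub><sup>2</sup>)) for Tin, with ''θ'' = 1/''n'' − 1/''N'', ''c''<sub>''x''</sub><sup>2</sup> = ''s''<sub>''x''</sub><sup>2</sup>/''m''<sub>''x''</sub><sup>2</sup> and ''c''<sub>''xy''</sub> = ''s''<sub>''xy''</sub>/(''m''<sub>''x''</sub> ''m''<sub>''y''</sub>). The helper and function names and the toy data are hypothetical.

```python
def ratio_components(xs, ys, population_size):
    """Return (r, theta, c_xx, c_xy) for paired samples xs and ys.

    theta = 1/n - 1/N, c_xx = s_x^2 / m_x^2, c_xy = s_xy / (m_x * m_y),
    with sample variance and covariance using the n - 1 divisor.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    s_xx = sum((x - m_x) ** 2 for x in xs) / (n - 1)
    s_xy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys)) / (n - 1)
    theta = 1.0 / n - 1.0 / population_size
    return m_y / m_x, theta, s_xx / m_x ** 2, s_xy / (m_x * m_y)

def beale_ratio(xs, ys, population_size):
    """Beale's estimator (assumed form): r (1 + theta c_xy) / (1 + theta c_xx)."""
    r, theta, c_xx, c_xy = ratio_components(xs, ys, population_size)
    return r * (1.0 + theta * c_xy) / (1.0 + theta * c_xx)

def tin_ratio(xs, ys, population_size):
    """Tin's estimator (assumed form): r (1 + theta (c_xy - c_xx))."""
    r, theta, c_xx, c_xy = ratio_components(xs, ys, population_size)
    return r * (1.0 + theta * (c_xy - c_xx))

# Toy sample of 3 pairs from a hypothetical population of 30 units.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 7.0]
beale = beale_ratio(xs, ys, population_size=30)
tin = tin_ratio(xs, ys, population_size=30)
print(beale, tin)
```

Both estimators adjust the raw ratio in the same direction and agree to first order in ''θ''; they differ only in how higher-order terms are handled.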
Sahoo's estimator:
[Sahoo LN (1983) On a method of bias reduction in ratio estimation. J Statist Res 17: 1–6]
:
Sahoo has also proposed a number of additional estimators:
[Sahoo LN (1987) On a class of almost unbiased estimators for population ratio. Statistics 18: 119–121]
:
:
:
If ''x'' and ''y'' are unitless counts with a Poisson distribution and ''m''<sub>''x''</sub> and ''m''<sub>''y''</sub> are both greater than 10, then the following approximation is correct to order ''O''(''n''<sup>−3</sup>).
: