The ratio estimator is a statistical parameter and is defined to be the ratio of the means of two random variables. Ratio estimates are biased, and corrections must be made when they are used in experimental or survey work. Ratio estimates are also asymmetrically distributed, so symmetrical tests such as the t test should not be used to generate confidence intervals. The bias is of the order ''O''(1/''n'') (see big O notation), so as the sample size (''n'') increases the bias asymptotically approaches 0. The estimator is therefore approximately unbiased for large sample sizes.

Definition

Assume there are two characteristics – ''x'' and ''y'' – that can be observed for each sampled element in the data set. The ratio ''R'' is :

R = \bar{\mu}_y / \bar{\mu}_x

The ratio estimate of a value of the ''y'' variate (''θ''_''y'') is :

\theta_y = R \theta_x

where ''θ''_''x'' is the corresponding value of the ''x'' variate. ''θ''_''y'' is known to be asymptotically normally distributed. [Scott AJ, Wu CFJ (1981) On the asymptotic distribution of ratio and regression estimators. JASA 76: 98–102]

Statistical properties

The sample ratio (''r'') is estimated from the sample :

r = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}

That the ratio is biased can be shown with Jensen's inequality as follows (assuming that ''x'' and ''y'' are independent and positive) :

E\left( \frac{y}{x} \right) = E\left( y \frac{1}{x} \right) = E( y ) E\left( \frac{1}{x} \right) \ge E(y) \frac{1}{E(x)} = \frac{E(y)}{E(x)}
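A small exact calculation can make the inequality concrete. The sketch below uses hypothetical discrete distributions for ''x'' and ''y'' (the values are chosen only for illustration and are assumed independent, positive and equally likely) and computes both sides with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, equally likely values; x and y are independent and positive.
x_values = [1, 2, 3]
y_values = [2, 4]


def mean(values):
    """Exact mean of a list of equally likely values."""
    return sum(Fraction(v) for v in values) / len(values)


# E(y/x): average of y/x over all equally likely (x, y) pairs.
pairs = list(product(x_values, y_values))
e_ratio = sum(Fraction(y, x) for x, y in pairs) / len(pairs)

# E(y)/E(x): the ratio of the means.
ratio_of_means = mean(y_values) / mean(x_values)

print("E(y/x)    =", e_ratio)
print("E(y)/E(x) =", ratio_of_means)
```

For these values the mean of the ratios exceeds the ratio of the means, which is the direction the inequality predicts.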

Under simple random sampling the bias is of the order ''O''( ''n''⁻¹ ). An upper bound on the relative bias of the estimate is provided by the coefficient of variation (the ratio of the standard deviation to the mean). [Cochran WG (1977) Sampling techniques. New York: John Wiley & Sons] Under simple random sampling the relative bias is ''O''( ''n''^−1/2 ).

Correction of the mean's bias

The correction methods, depending on the distributions of the ''x'' and ''y'' variates, differ in their efficiency making it difficult to recommend an overall best method. Because the estimates of ''r'' are biased a corrected version should be used in all subsequent calculations. A correction of the bias accurate to the first order is :

r_\mathrm = r - \frac

where ''m''_''x'' is the mean of the variate ''x'' and ''s''_''xy'' is the covariance between ''x'' and ''y''; ''s''_''xy'' denotes this covariance throughout what follows. Another estimator based on the Taylor expansion is :

r_\mathrm{corr} = r - \left( 1 - \frac{n - 1}{N - 1} \right) \frac{r s_x^2 - \rho s_x s_y}{n m_x^2}

where ''n'' is the sample size, ''N'' is the population size, ''m''_''x'' is the mean of the ''x'' variate, ''s''_''x''² and ''s''_''y''² are the sample variances of the ''x'' and ''y'' variates respectively, and ''ρ'' is the sample correlation between ''x'' and ''y'' (so that ''ρ'' ''s''_''x'' ''s''_''y'' = ''s''_''xy''). A computationally simpler but slightly less accurate version of this estimator is :

r_\mathrm{corr} = r - \frac{N - n}{N} \frac{r s_x^2 - \rho s_x s_y}{n m_x^2}

where ''N'' is the population size, ''n'' is the sample size, ''m''_''x'' is the mean of the ''x'' variate, ''s''_''x''² and ''s''_''y''² are the sample variances of the ''x'' and ''y'' variates respectively, and ''ρ'' is the sample correlation between ''x'' and ''y''. The two versions differ only in the denominator of the first factor (''N'' − 1 versus ''N''). For a large ''N'' the difference is negligible. If ''x'' and ''y'' are unitless counts with a Poisson distribution, a second-order correction is [Ogliore RC, Huss GR, Nagashima K (2011) Ratio estimation in SIMS analysis. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 269 (17): 1910–1918] :

+ \frac \right) \right]

Other methods of bias correction have also been proposed. To simplify the notation the following variables will be used :

\theta = \frac{1}{n} - \frac{1}{N}

c_x^2 = \frac{s_x^2}{m_x^2}

c_{xy} = \frac{s_{xy}}{m_x m_y}

Pascual's estimator [Pascual JN (1961) Unbiased ratio estimators in stratified sampling. JASA 56(293): 70–87] :

r_\mathrm = r + \frac \frac

Beale's estimator [Beale EML (1962) Some use of computers in operational research. Industrielle Organization 31: 27–28] :

r_\mathrm{B} = r \frac{1 + \theta c_{xy}}{1 + \theta c_x^2}

Tin's estimator [Tin M (1965) Comparison of some ratio estimators. JASA 60: 294–307] :

r_\mathrm{T} = r \left( 1 + \theta \left( c_{xy} - c_x^2 \right) \right)

Sahoo's estimator [Sahoo LN (1983) On a method of bias reduction in ratio estimation. J Statist Res 17: 1–6] :

r_\mathrm{S} = \frac{r}{1 + \theta ( c_x^2 - c_{xy} )}

Sahoo has also proposed a number of additional estimators [Sahoo LN (1987) On a class of almost unbiased estimators for population ratio. Statistics 18: 119–121] :

r_\mathrm = r ( 1 + \theta c_{xy} ) ( 1 - \theta c_x^2 )

r_\mathrm = \frac

r_\mathrm = \frac
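As an illustration of how corrections of this kind are computed in practice, the following sketch evaluates the uncorrected ratio together with Beale's and Tin's estimators. It assumes the commonly quoted definitions ''θ'' = 1/''n'' − 1/''N'', ''c''_''x''² = ''s''_''x''²/''m''_''x''² and ''c''_''xy'' = ''s''_''xy''/(''m''_''x'' ''m''_''y''), and Beale's estimator in the form ''r''(1 + ''θ c''_''xy'')/(1 + ''θ c''_''x''²). The function name `corrected_ratios` and the data are hypothetical.

```python
from statistics import mean, variance


def corrected_ratios(x, y, N):
    """Return (r, r_beale, r_tin) for paired samples x, y from a population of size N.

    Assumed definitions (standard forms, stated here as assumptions):
    theta = 1/n - 1/N, c_x2 = s_x^2 / m_x^2, c_xy = s_xy / (m_x * m_y).
    """
    n = len(x)
    m_x, m_y = mean(x), mean(y)
    s_x2 = variance(x)  # sample variance with divisor n - 1
    s_xy = sum((a - m_x) * (b - m_y) for a, b in zip(x, y)) / (n - 1)
    r = m_y / m_x  # uncorrected sample ratio
    theta = 1 / n - 1 / N
    c_x2 = s_x2 / m_x ** 2
    c_xy = s_xy / (m_x * m_y)
    r_beale = r * (1 + theta * c_xy) / (1 + theta * c_x2)
    r_tin = r * (1 + theta * (c_xy - c_x2))
    return r, r_beale, r_tin


# Hypothetical sample of five (x, y) pairs from a population of 100 units.
x = [10.0, 12.0, 9.0, 14.0, 11.0]
y = [21.0, 25.0, 17.0, 30.0, 22.0]
r, r_beale, r_tin = corrected_ratios(x, y, N=100)
print(r, r_beale, r_tin)
```

The two corrected values agree to first order in ''θ'', so for moderate sample sizes they typically differ from each other far less than either differs from the uncorrected ratio.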

If ''x'' and ''y'' are unitless counts with Poisson distribution and ''m''_''x'' and ''m''_''y'' are both greater than 10, then the following approximation is correct to order O( ''n''⁻³ ). :

r_\mathrm = r \left 1 - \frac \left( \frac - \frac \right) \left( 1 + \frac + \frac \right) \right /math>

An asymptotically correct estimator is [van Kempen GMP, van Vliet LJ (2000) Mean and variance of ratio estimators used in fluorescence ratio imaging. Cytometry 39: 300–305] :

r_\mathrm = r + c_x^2 \frac - \frac

Jackknife estimation

A jackknife estimate of the ratio is less biased than the naive form. A jackknife estimator of the ratio is :

r_\mathrm{J} = n r - \frac{n - 1}{n} \sum_{i=1}^n r_i

where ''n'' is the size of the sample and the ''r''_i are estimated with the omission of one pair of variates at a time. [Choquet D, L'ecuyer P, Léger C (1999) Bootstrap confidence intervals for ratios of expectations. ACM Transactions on Modeling and Computer Simulation - TOMACS 9 (4): 326–348] An alternative method is to divide the sample into ''g'' groups each of size ''p'' with ''n'' = ''pg''. [Durbin J (1959) A note on the application of Quenouille's method of bias reduction to estimation of ratios. Biometrika 46: 477–480] Let ''r''_i be the estimate of the ''i''^th group. Then the estimator :

r_\mathrm{J} = g r - \frac{g - 1}{g} \sum_{i=1}^g r_i = g \left( r - \bar{r} \right) + \bar{r}

where \bar{r} is the mean of the ratios ''r''_''i'' of the ''g'' groups, has a bias of at most ''O''( ''n''⁻² ). Other estimators based on the division of the sample into ''g'' groups are [Mickey MR (1959) Some finite population unbiased ratio and regression estimators. JASA 54: 596–612] :

r_\mathrm = \frac r - \frac \sum_^g r_i

r_\mathrm = \bar +\frac \frac

r_\mathrm = \bar + \frac

where \bar{r} is the mean of the ratios ''r''_''i'' of the ''g'' groups and :

\bar{r}_g = \sum_{i=1}^g \frac{r_i'}{g}

where ''r''_''i''^' is the value of the sample ratio with the ''i''^th group omitted.
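The leave-one-out form of the jackknife described at the start of this section can be sketched as follows. The code assumes the usual jackknife combination ''n r'' − ((''n'' − 1)/''n'') Σ ''r''_i, with each ''r''_i recomputed after dropping one (''x'', ''y'') pair. The function names and the data are hypothetical.

```python
def ratio(x, y):
    """Sample ratio r = sum(y) / sum(x)."""
    return sum(y) / sum(x)


def jackknife_ratio(x, y):
    """Leave-one-out jackknife estimate of the ratio.

    Returns (r_j, loo) where loo[i] is the ratio with the i-th pair omitted
    and r_j = n * r - (n - 1) / n * sum(loo).
    """
    n = len(x)
    r = ratio(x, y)
    sum_x, sum_y = sum(x), sum(y)
    # Recompute the ratio n times, each time leaving one pair out.
    loo = [(sum_y - yi) / (sum_x - xi) for xi, yi in zip(x, y)]
    r_j = n * r - (n - 1) / n * sum(loo)
    return r_j, loo


# Hypothetical data: three (x, y) pairs.
x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 7.0]
r_j, loo = jackknife_ratio(x, y)
print(ratio(x, y), r_j)
```

When ''y'' is exactly proportional to ''x'' every leave-one-out ratio equals ''r'', and the jackknife estimate reduces to ''r'' itself.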

Other methods of estimation

Other methods of estimating a ratio estimator include maximum likelihood and bootstrapping.

Estimate of total

The estimated total of the ''y'' variate ( ''τ''_''y'' ) is :

\tau_y = r \tau_x

where ''τ''_''x'' is the total of the ''x'' variate, which is assumed to be known.
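A minimal numerical sketch of this calculation is given below. The sample values and the population total `tau_x` are hypothetical and serve only to show the order of operations.

```python
# Hypothetical survey: y is the variable of interest, x an auxiliary
# variable whose population total tau_x is assumed known from another source.
x_sample = [12, 15, 9, 20]
y_sample = [30, 36, 21, 52]
tau_x = 5600  # assumed known population total of x

r = sum(y_sample) / sum(x_sample)  # sample ratio
tau_y = r * tau_x                  # ratio estimate of the total of y
print(r, tau_y)
```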

Variance estimates

The variance of the sample ratio is approximately: :

\operatorname( r ) = \frac \left ( s_y^2 - s_ ) - ( s_ )^2 +2 m_y s_ - \frac( m_y - s_^2) \right

where ''s''_''x''² and ''s''_''y''² are the variances of the ''x'' and ''y'' variates respectively, ''m''_''x'' and ''m''_''y'' are the means of the ''x'' and ''y'' variates respectively and ''s''_''xy'' is the covariance of ''x'' and ''y''. Although the approximate variance estimator of the ratio given below is biased, if the sample size is large, the bias in this estimator is negligible. :

\operatorname{var}( r ) = \frac{N - n}{N} \frac{1}{n} \frac{1}{m_x^2} \frac{\sum_{i=1}^n ( y_i - r x_i )^2}{n - 1}

where ''N'' is the population size, ''n'' is the sample size and ''m''_''x'' is the mean of the ''x'' variate. Another estimator of the variance based on the Taylor expansion is :

\operatorname{var}( r ) = \frac{1}{n} \left( 1 - \frac{n - 1}{N - 1} \right) \frac{r^2 s_x^2 + s_y^2 - 2 r s_{xy}}{m_x^2}

where ''n'' is the sample size, ''N'' is the population size and ''s''_''xy'' is the covariance of ''x'' and ''y''. An estimate accurate to O( ''n''⁻² ) is :

\operatorname{var}( r ) = \frac{1}{n} \left[ \frac{s_x^2 m_y^2}{m_x^4} + \frac{s_y^2}{m_x^2} - \frac{2 m_y s_{xy}}{m_x^3} \right]

If the probability distribution is Poissonian, an estimator accurate to O( ''n''⁻³ ) is :

+ \frac - \frac \right) \right]

A jackknife estimator of the variance is :

\operatorname{var}( r ) = \frac{n - 1}{n} \sum_{i=1}^n ( r_i - r_J )^2

where ''r''_i is the ratio with the ''i''^th pair of variates omitted and ''r''_J is the jackknife estimate of the ratio.
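Two of the variance estimates in this section can be sketched in a few lines. The code assumes the textbook finite-population form ((''N'' − ''n'')/''N'') · (1/''n'') · (1/''m''_''x''²) · Σ(''y''_i − ''r x''_i)²/(''n'' − 1) for the first, and ((''n'' − 1)/''n'') Σ(''r''_i − ''r''_J)² for the jackknife version. The function names and the data are hypothetical.

```python
def ratio_variance(x, y, N):
    """Linearization estimate of var(r) with a finite-population correction."""
    n = len(x)
    m_x = sum(x) / n
    r = sum(y) / sum(x)
    # Residual sum of squares about the line y = r * x through the origin.
    resid_ss = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y))
    return (N - n) / N / n / m_x ** 2 * resid_ss / (n - 1)


def jackknife_variance(x, y):
    """Jackknife estimate of var(r) from leave-one-out ratios r_i."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    r = sum_y / sum_x
    loo = [(sum_y - yi) / (sum_x - xi) for xi, yi in zip(x, y)]
    r_j = n * r - (n - 1) / n * sum(loo)  # jackknife estimate of the ratio
    return (n - 1) / n * sum((ri - r_j) ** 2 for ri in loo)


# Hypothetical sample of three pairs from a population of 30 units.
x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 7.0]
print(ratio_variance(x, y, N=30), jackknife_variance(x, y))
```

With so few observations the two estimates can differ noticeably; both are exactly zero when ''y'' is exactly proportional to ''x''.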

Variance of total

The variance of the estimated total is :

\operatorname{var}( \tau_y ) = \tau_x^2 \operatorname{var}( r )

Variance of mean

The variance of the estimated mean of the ''y'' variate is :

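\operatorname{var}( \bar{y} ) = m_x^2 \operatorname{var}( r ) = \frac{N - n}{N} \frac{\sum_{i=1}^n ( y_i - r x_i )^2}{n ( n - 1 )} = \frac{N - n}{N} \frac{s_y^2 + r^2 s_x^2 - 2 r s_{xy}}{n}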

where ''m''_''x'' is the mean of the ''x'' variate, ''s''_''x''² and ''s''_''y''² are the sample variances of the ''x'' and ''y'' variates respectively and ''s''_''xy'' is the covariance of ''x'' and ''y''.

Skewness

The skewness and the kurtosis of the ratio depend on the distributions of the ''x'' and ''y'' variates. Estimates have been made of these parameters for normally distributed ''x'' and ''y'' variates but for other distributions no expressions have yet been derived. It has been found that in general ratio variables are skewed to the right and leptokurtic, and that their nonnormality increases as the magnitude of the denominator's coefficient of variation increases. For normally distributed ''x'' and ''y'' variates the skewness of the ratio is approximately :

\right)

where :

\omega = 1 - m_x \operatorname( x, y )

Effect on confidence intervals

Because the ratio estimate is generally skewed, confidence intervals created with the variance and symmetrical tests such as the t test are incorrect. These confidence intervals tend to overestimate the size of the left confidence interval and underestimate the size of the right. If the ratio estimator is unimodal (which is frequently the case) then a conservative estimate of the 95% confidence intervals can be made with the Vysochanskiï–Petunin inequality.
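The inequality states that for a unimodal variable with mean ''μ'' and standard deviation ''σ'', P(|''X'' − ''μ''| ≥ ''λσ'') ≤ 4/(9''λ''²) whenever ''λ'' > √(8/3). Setting the bound equal to 0.05 gives ''λ'' ≈ 2.98, compared with 1.96 under normality. The sketch below applies this multiplier. The function name `vp_interval` and the inputs are hypothetical, and centring the interval on the estimate rather than on the true mean is a simplifying assumption.

```python
from math import sqrt


def vp_interval(estimate, variance, alpha=0.05):
    """Conservative (1 - alpha) interval from the Vysochanskij-Petunin bound.

    Uses P(|X - mu| >= lam * sigma) <= 4 / (9 * lam**2), which holds for
    unimodal distributions when lam > sqrt(8/3).
    """
    lam = sqrt(4.0 / (9.0 * alpha))
    if lam <= sqrt(8.0 / 3.0):
        raise ValueError("alpha is too large for this form of the bound")
    half_width = lam * sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical ratio estimate 2.0 with estimated variance 0.01.
low, high = vp_interval(2.0, 0.01)
print(low, high)
```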

Alternative methods of bias reduction

An alternative method of reducing or eliminating the bias in the ratio estimator is to alter the method of sampling. The variance of the ratio using these methods differs from the estimates given previously. Note that while many applications, such as those discussed in Lohr, are intended to be restricted to positive ''integers'' only, such as sizes of sample groups, the Midzuno-Sen method works for any sequence of positive numbers, integral or not. Lahiri's method, by contrast, returns a biased result.

Lahiri's method

The first of these sampling schemes is a double use of a sampling method introduced by Lahiri in 1951. [Lahiri DB (1951) A method of sample selection providing unbiased ratio estimates. Bull Int Stat Inst 33: 133–140] The algorithm here is based upon the description by Lohr. [Lohr S (2010) ''Sampling - Design and Analysis'' (2nd edition)]

# Choose a number ''M'' = max( ''x''₁, ..., ''x''_N) where ''N'' is the population size.
# Choose ''i'' at random from a uniform distribution on [1, ''N''].
# Choose ''k'' at random from a uniform distribution on [1, ''M''].
# If ''k'' ≤ ''x''_i, then ''x''_i is retained in the sample. If not then it is rejected.
# Repeat this process from step 2 until the desired sample size is obtained.

The same procedure for the same desired sample size is carried out with the ''y'' variate. Lahiri's scheme as described by Lohr is ''biased high'' and so is of historical interest only. The Midzuno-Sen technique described below is recommended instead.
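The accept/reject steps above can be sketched as follows for positive integer sizes. The function name `lahiri_sample` is hypothetical, indices are 0-based rather than 1-based, and the sketch assumes that a unit may be drawn more than once, which the description does not settle.

```python
import random


def lahiri_sample(x, n, rng=None):
    """Draw n unit indices with probability proportional to x by accept/reject.

    x must contain positive integers. Repeated indices are allowed (assumption).
    """
    rng = rng or random.Random()
    N = len(x)
    M = max(x)                       # step 1: largest size in the population
    chosen = []
    while len(chosen) < n:           # step 5: repeat until n units are kept
        i = rng.randrange(N)         # step 2: uniform unit index (0-based)
        k = rng.randint(1, M)        # step 3: uniform integer on [1, M]
        if k <= x[i]:                # step 4: keep unit i, otherwise reject
            chosen.append(i)
    return chosen


# Hypothetical population of three units with sizes 1, 1 and 10.
picks = lahiri_sample([1, 1, 10], 3000, random.Random(0))
print(picks.count(0), picks.count(1), picks.count(2))
```

Each candidate is accepted with probability ''x''_i/''M'', so larger units are kept proportionally more often.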

Midzuno-Sen's method

In 1952 Midzuno and Sen independently described a sampling scheme that provides an unbiased estimator of the ratio. [Midzuno H (1952) On the sampling system with probability proportional to the sum of the sizes. Ann Inst Stat Math 3: 99-107; Sen AR (1952) Present status of probability sampling and its use in the estimation of a characteristic. Econometrika 20-103] The first sample unit is chosen with probability proportional to the size of the ''x'' variate. The remaining ''n'' - 1 units are chosen at random without replacement from the remaining ''N'' - 1 members of the population. The probability of selection under this scheme is :

P = \frac{\sum_{i=1}^n x_i}{\binom{N - 1}{n - 1} X}

where ''X'' is the sum of the ''N'' ''x'' variates and the ''x''_i are the ''n'' members of the sample. Then the ratio of the sum of the ''y'' variates and the sum of the ''x'' variates chosen in this fashion is an unbiased estimate of the population ratio. In symbols we have :

r = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}

where ''x''_i and ''y''_i are chosen according to the scheme described above. The ratio estimator given by this scheme is unbiased. Särndal, Swensson, and Wretman credit Lahiri, Midzuno and Sen for the insights leading to this method [Särndal C-E, Swensson B, Wretman J (1992) Model assisted survey sampling. Springer, §7.3.1 (iii)], but Lahiri's technique is biased high.
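The scheme and its unbiasedness can be checked on a small population by enumerating every possible sample. The sketch below assumes the selection probability Σ''x''_i / (C(''N'' − 1, ''n'' − 1) ''X'') for an unordered sample. The function names and the population values are hypothetical, and exact fractions are used so that the comparison is not blurred by rounding.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb


def midzuno_sen_sample(x, n, rng=None):
    """Draw one sample of n unit indices under the Midzuno-Sen scheme."""
    rng = rng or random.Random()
    N = len(x)
    # First unit: probability proportional to its x value.
    first = rng.choices(range(N), weights=x, k=1)[0]
    # Remaining n - 1 units: simple random sample from the other N - 1 units.
    rest = [i for i in range(N) if i != first]
    return [first] + rng.sample(rest, n - 1)


def selection_probability(x, sample):
    """Probability that the scheme yields this unordered sample."""
    N, n = len(x), len(sample)
    return Fraction(sum(x[i] for i in sample), comb(N - 1, n - 1) * sum(x))


def expected_ratio(x, y, n):
    """Exact expectation of sum(y)/sum(x) over all samples of size n."""
    total = Fraction(0)
    for s in combinations(range(len(x)), n):
        r = Fraction(sum(y[i] for i in s), sum(x[i] for i in s))
        total += selection_probability(x, s) * r
    return total


# Hypothetical population of four units with integer x and y values.
x = [2, 3, 5, 7]
y = [4, 5, 11, 13]
print(expected_ratio(x, y, 2), Fraction(sum(y), sum(x)))
print(midzuno_sen_sample(x, 2, random.Random(1)))
```

The expectation matches the population ratio because the factor Σ''x''_i in the selection probability cancels the denominator of each sample ratio.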

Other ratio estimators

Tin (1965) described and compared ratio estimators proposed by Beale (1962) and Quenouille (1956) and proposed a modified approach (now referred to as Tin's method). These ratio estimators are commonly used to calculate pollutant loads from sampling of waterways, particularly where flow is measured more frequently than water quality. For example, see Quilbé et al. (2006). [Quilbé R, Rousseau AN, Duchemin M, Poulin A, Gangbazo G, Villeneuve JP (2006) Selecting a calculation method to estimate sediment and nutrient loads in streams: Application to the Beaurivage River (Québec, Canada). Journal of Hydrology 326(1–4): 295–310. https://doi.org/10.1016/j.jhydrol.2005.11.008]

Ordinary least squares regression

If a linear relationship between the ''x'' and ''y'' variates exists and the regression equation passes through the origin then the estimated variance of the regression equation is always less than that of the ratio estimator. The precise relationship between the variances depends on the linearity of the relationship between the ''x'' and ''y'' variates: when the relationship is other than linear the ratio estimate may have a lower variance than that estimated by regression.

Uses

Although the ratio estimator may be of use in a number of settings it is of particular use in two cases:
* when the variates ''x'' and ''y'' are highly correlated through the origin.
* in survey methodology, when estimating a weighted average in which the denominator indicates the sum of weights that reflect the total population size, but the total population size is unknown.

History

The first known use of the ratio estimator was by John Graunt in England, who in 1662 was the first to estimate the ratio ''y''/''x'' where ''y'' represented the total population and ''x'' the known total number of registered births in the same areas during the preceding year. Later Messance (~1765) and Moheau (1778) published very carefully prepared estimates for France based on enumeration of population in certain districts and on the count of births, deaths and marriages as reported for the whole country. The districts from which the ratio of inhabitants to birth was determined only constituted a sample.

In 1802, Laplace wished to estimate the population of France. No population census had been carried out and Laplace lacked the resources to count every individual. Instead he sampled 30 parishes whose total number of inhabitants was 2,037,615. The parish baptismal registrations were considered to be reliable estimates of the number of live births so he used the total number of births over a three-year period. The sample estimate was 71,866.333 baptisms per year over this period, giving a ratio of one registered baptism for every 28.35 persons. The total number of baptismal registrations for France was also available to him and he assumed that the ratio of live births to population was constant. He then used the ratio from his sample to estimate the population of France.

Karl Pearson said in 1897 that the ratio estimates are biased and cautioned against their use. [Pearson K (1897) On a form of spurious correlation that may arise when indices are used for the measurement of organs. Proc Roy Soc Lond 60: 498]
