Slope correction
The regression slope and other regression coefficients can be disattenuated as follows.
The case of a fixed ''x'' variable
The case that ''x'' is fixed, but measured with noise, is known as the ''functional model'' or ''functional relationship''. It can be handled using total least squares and, more generally, errors-in-variables models.
The case of a randomly distributed ''x'' variable
The case that the ''x'' variable arises randomly is known as the ''structural model'' or ''structural relationship''. For example, in a medical study patients are recruited as a sample from a population, and their characteristics such as blood pressure may be viewed as arising from a random sample. Under certain assumptions (typically, normal distribution assumptions) there is a known ratio between the true slope and the expected estimated slope.
Multiple ''x'' variables
The case of multiple predictor variables subject to variability (possibly correlated) has been well studied for linear regression, and for some non-linear regression models. Other non-linear models, such as proportional hazards models for survival analysis, have been considered only with a single predictor subject to variability.
Correlation correction
In 1904 Charles Spearman developed a procedure for correcting correlations for regression dilution, i.e., to "rid a correlation coefficient from the weakening effect of measurement error". In measurement and statistics, the procedure is also called correlation disattenuation or the disattenuation of correlation.
Formulation
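Let $\beta$ and $\theta$ be the true values of two attributes of some person or statistical unit. These values are variables by virtue of the assumption that they differ for different statistical units in the population. Let $\hat{\beta}$ and $\hat{\theta}$ be estimates of $\beta$ and $\theta$ derived either directly by observation-with-error or from application of a measurement model, such as the Rasch model. Also, let
$$\hat{\beta} = \beta + \epsilon_{\beta}, \qquad \hat{\theta} = \theta + \epsilon_{\theta},$$
where $\epsilon_{\beta}$ and $\epsilon_{\theta}$ are the measurement errors associated with the estimates $\hat{\beta}$ and $\hat{\theta}$.
The estimated correlation between two sets of estimates is
$$\operatorname{corr}(\hat{\beta},\hat{\theta}) = \frac{\operatorname{cov}(\hat{\beta},\hat{\theta})}{\sqrt{\operatorname{var}[\hat{\beta}]\,\operatorname{var}[\hat{\theta}]}} = \frac{\operatorname{cov}(\beta+\epsilon_{\beta},\,\theta+\epsilon_{\theta})}{\sqrt{\operatorname{var}[\beta+\epsilon_{\beta}]\,\operatorname{var}[\theta+\epsilon_{\theta}]}},$$
which, assuming the errors are uncorrelated with each other and with the true attribute values, gives
$$\operatorname{corr}(\hat{\beta},\hat{\theta}) = \frac{\operatorname{cov}(\beta,\theta)}{\sqrt{(\operatorname{var}[\beta]+\operatorname{var}[\epsilon_{\beta}])(\operatorname{var}[\theta]+\operatorname{var}[\epsilon_{\theta}])}}$$
$$= \frac{\operatorname{cov}(\beta,\theta)}{\sqrt{\operatorname{var}[\beta]\,\operatorname{var}[\theta]}} \cdot \frac{\sqrt{\operatorname{var}[\beta]\,\operatorname{var}[\theta]}}{\sqrt{(\operatorname{var}[\beta]+\operatorname{var}[\epsilon_{\beta}])(\operatorname{var}[\theta]+\operatorname{var}[\epsilon_{\theta}])}} = \rho\,\sqrt{R_{\beta} R_{\theta}},$$
where $\rho = \operatorname{corr}(\beta,\theta)$ is the correlation between the true values and $R_{\beta}$ is the ''separation index'' of the set of estimates of $\beta$, which is analogous to Cronbach's alpha; that is, in terms of classical test theory, $R_{\beta}$ is analogous to a reliability coefficient. Specifically, the separation index is given as follows:
$$R_{\beta} = \frac{\operatorname{var}[\beta]}{\operatorname{var}[\beta]+\operatorname{var}[\epsilon_{\beta}]} = \frac{\operatorname{var}[\hat{\beta}]-\operatorname{var}[\epsilon_{\beta}]}{\operatorname{var}[\hat{\beta}]},$$
where the mean squared standard error of person estimate gives an estimate of the variance of the errors, $\epsilon_{\beta}$. The standard errors are normally produced as a by-product of the estimation process (see Rasch model estimation).
The disattenuated estimate of the correlation between the two sets of parameter estimates is therefore
$$\rho = \frac{\operatorname{corr}(\hat{\beta},\hat{\theta})}{\sqrt{R_{\beta} R_{\theta}}}.$$
That is, the disattenuated correlation estimate is obtained by dividing the correlation between the estimates by the geometric mean of the separation indices of the two sets of estimates. Expressed in terms of classical test theory, the correlation is divided by the geometric mean of the reliability coefficients of two tests.
Given two random variables $X'$ and $Y'$ measured as $X$ and $Y$ with measured correlation $r_{xy}$ and a known reliability for each variable, $r_{xx}$ and $r_{yy}$, the estimated correlation between $X'$ and $Y'$ corrected for attenuation is
$$r_{x'y'} = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}}.$$
How well the variables are measured affects the correlation of ''X'' and ''Y''. The correction for attenuation tells one what the estimated correlation is expected to be if one could measure ''X′'' and ''Y′'' with perfect reliability.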
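The correction described above amounts to a single division. The following is a minimal Python sketch of it, not a reference implementation: the function names and all numeric values are hypothetical and chosen only for illustration, and the reliabilities (or separation indices) are assumed to be known.

```python
import math


def separation_index(var_estimates, var_errors):
    """Separation index: (var[estimates] - var[errors]) / var[estimates]."""
    if var_estimates <= 0.0:
        raise ValueError("variance of the estimates must be positive")
    return (var_estimates - var_errors) / var_estimates


def disattenuate_correlation(r_xy, r_xx, r_yy):
    """Divide the observed correlation by the geometric mean of the reliabilities."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical inputs: observed correlation 0.36, reliabilities 0.81 and 0.64.
r_corrected = disattenuate_correlation(0.36, 0.81, 0.64)
print(round(r_corrected, 3))  # approximately 0.5

# Hypothetical variances: estimates have variance 1.25, errors have variance 0.25.
print(round(separation_index(1.25, 0.25), 3))  # approximately 0.8
```

Because the reliabilities are themselves estimates, the corrected value can exceed 1 in a given sample; the sketch does not clip or otherwise adjust such results.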
Put differently, if $X$ and $Y$ are taken to be imperfect measurements of underlying variables $X'$ and $Y'$ with independent errors, then $r_{x'y'}$ estimates the true correlation between $X'$ and $Y'$.
Applicability
A correction for regression dilution is necessary in statistical inference based on regression coefficients. However, in predictive modelling applications, correction is neither necessary nor appropriate. In change detection, correction is necessary.
To understand this, consider the measurement error as follows. Let ''y'' be the outcome variable, ''x'' be the true predictor variable, and ''w'' be an approximate observation of ''x''. Frost and Thompson suggest, for example, that ''x'' may be the true, long-term blood pressure of a patient, and ''w'' may be the blood pressure observed on one particular clinic visit. Regression dilution arises if we are interested in the relationship between ''y'' and ''x'', but estimate the relationship between ''y'' and ''w''. Because ''w'' is measured with variability, the slope of a regression line of ''y'' on ''w'' is closer to zero than the slope of the regression line of ''y'' on ''x''.
Standard methods can fit a regression of ''y'' on ''w'' without bias. There is bias only if we then use the regression of ''y'' on ''w'' as an approximation to the regression of ''y'' on ''x''. In the example, assuming that blood pressure measurements are similarly variable in future patients, our regression line of ''y'' on ''w'' (observed blood pressure) gives unbiased predictions.
An example of a circumstance in which correction is desired is prediction of change. Suppose the change in ''x'' is known under some new circumstance: to estimate the likely change in an outcome variable ''y'', the slope of the regression of ''y'' on ''x'' is needed, not ''y'' on ''w''. This arises in epidemiology. To continue the example in which ''x'' denotes blood pressure, perhaps a large clinical trial has provided an estimate of the change in blood pressure under a new treatment; then the possible effect on ''y'', under the new treatment, should be estimated from the slope in the regression of ''y'' on ''x''.
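The distinction between inference and prediction can be checked numerically. The following is a minimal simulation sketch, not a definitive implementation. It assumes normal distributions throughout and treats the measurement-error variance as known; in practice that variance would have to be estimated, for example from repeated measurements. All function names and parameter values (true slope 2, error standard deviation 0.5, and so on) are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_fit(pred, out):
    """Return (intercept, slope) of the least-squares line of out on pred."""
    m_p, m_o = mean(pred), mean(out)
    s_pp = sum((p - m_p) ** 2 for p in pred)
    s_po = sum((p - m_p) * (o - m_o) for p, o in zip(pred, out))
    slope = s_po / s_pp
    return m_o - slope * m_p, slope


def draw(rng, n, beta, sd_x, sd_u):
    """Simulate n pairs (w, y); the true predictor x is never observed."""
    w, y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, sd_x)                        # true predictor
        w.append(x + rng.gauss(0.0, sd_u))              # noisy observation of x
        y.append(1.0 + beta * x + rng.gauss(0.0, 1.0))  # outcome depends on x
    return w, y


def mse(intercept, slope, pred, out):
    """Mean squared prediction error of the line intercept + slope * pred."""
    return mean([(o - (intercept + slope * p)) ** 2 for p, o in zip(pred, out)])


def simulate(n=50_000, beta=2.0, sd_x=1.0, sd_u=0.5, seed=0):
    rng = random.Random(seed)
    w, y = draw(rng, n, beta, sd_x, sd_u)

    # Regression of y on w: its slope is attenuated relative to beta.
    a_w, b_w = ols_fit(w, y)

    # Reliability ratio var(x) / var(w), using the (assumed known) error variance.
    m_w = mean(w)
    var_w = sum((v - m_w) ** 2 for v in w) / (n - 1)
    lam = (var_w - sd_u ** 2) / var_w

    # Corrected slope estimates the slope of y on x.
    b_x = b_w / lam
    a_x = mean(y) - b_x * m_w

    # Fresh data measured with the same variability, used to compare predictions.
    w_new, y_new = draw(rng, n, beta, sd_x, sd_u)
    return {
        "slope_y_on_w": b_w,
        "reliability_ratio": lam,
        "corrected_slope": b_x,
        "mse_uncorrected": mse(a_w, b_w, w_new, y_new),
        "mse_corrected": mse(a_x, b_x, w_new, y_new),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these settings the slope of ''y'' on ''w'' should come out near 1.6 rather than 2, and dividing by the reliability ratio (about 0.8) should recover a slope near 2. When the new observations are measured with the same variability, the uncorrected line should give the smaller prediction error, which is why correction is not appropriate for prediction in that setting.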
Another circumstance in which correction is desired is predictive modelling in which future observations are also variable, but not (in the phrase used above) "similarly variable". For example, the current data set may include blood pressure measured with greater precision than is common in clinical practice. One specific example of this arose when developing a regression equation based on a clinical trial, in which blood pressure was the average of six measurements, for use in clinical practice, where blood pressure is usually a single measurement.
All of these results can be shown mathematically, in the case of simple linear regression assuming normal distributions throughout (the framework of Frost & Thompson).
A poorly executed correction for regression dilution, in particular one performed without checking the underlying assumptions, may do more damage to an estimate than no correction.
Further reading
Regression dilution was first mentioned, under the name attenuation, by Spearman (1904). Those seeking a readable mathematical treatment might like to start with Frost and Thompson (2000).
See also
* Errors-in-variables models
* Quantization (signal processing) – a common source of error in the explanatory or independent variables
References