In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model that tries to find the line of best fit for a two-dimensional data set. It differs from simple linear regression in that it accounts for errors in observations on both the ''x''- and the ''y''-axis. It is a special case of total least squares, which allows for any number of predictors and a more complicated error structure.

Deming regression is equivalent to the maximum likelihood estimation of an errors-in-variables model in which the errors for the two variables are assumed to be independent and normally distributed, and the ratio of their variances, denoted ''δ'', is known. In practice, this ratio might be estimated from related data sources; however, the regression procedure takes no account of possible errors in estimating this ratio.

Deming regression is only slightly more difficult to compute than simple linear regression. Most statistical software packages used in clinical chemistry offer Deming regression.

The model was originally introduced by Adcock, who considered the case ''δ'' = 1, and then more generally by Kummell with arbitrary ''δ''. However, their ideas remained largely unnoticed for more than 50 years, until they were revived by Koopmans and later propagated even more by Deming. Deming's book became so popular in clinical chemistry and related fields that the method was even dubbed ''Deming regression'' in those fields.

Specification

Assume that the available data (''y_i'', ''x_i'') are measured observations of the "true" values (''y_i*'', ''x_i*''), which lie on the regression line:

\begin{align}
  y_i &= y^*_i + \varepsilon_i, \\
  x_i &= x^*_i + \eta_i,
\end{align}

where errors ''ε'' and ''η'' are independent and the ratio of their variances is assumed to be known:

\delta = \frac{\sigma_\varepsilon^2}{\sigma_\eta^2}.

In practice, the variances of the ''x'' and ''y'' parameters are often unknown, which complicates the estimate of ''δ''. Note that when the measurement method for ''x'' and ''y'' is the same, these variances are likely to be equal, so ''δ'' = 1 for this case.

We seek to find the line of "best fit":

y^* = \beta_0 + \beta_1 x^*,

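such that the weighted sum of squared residuals of the model is minimized:

SSR = \sum_{i=1}^n\bigg(\frac{\varepsilon_i^2}{\sigma_\varepsilon^2} + \frac{\eta_i^2}{\sigma_\eta^2}\bigg) = \frac{1}{\sigma_\varepsilon^2} \sum_{i=1}^n\Big((y_i-\beta_0-\beta_1x^*_i)^2 + \delta(x_i-x^*_i)^2\Big) \ \to\ \min_{\beta_0,\beta_1,x_1^*,\ldots,x_n^*} SSR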

See for a full derivation.
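One step of the minimization can be sketched directly from the objective above. The symbol ''r_i'', the ordinary residual of the observed point, is introduced here only for illustration. For fixed \beta_0 and \beta_1, setting the derivative of the ''i''-th summand with respect to ''x_i*'' to zero gives the optimal ''x_i*'' and the value of the summand at that optimum:

\begin{align}
  r_i &= y_i - \beta_0 - \beta_1 x_i, \\
  x^*_i &= x_i + \frac{\beta_1}{\beta_1^2+\delta}\, r_i, \\
  (y_i-\beta_0-\beta_1x^*_i)^2 + \delta(x_i-x^*_i)^2 &= \frac{\delta}{\beta_1^2+\delta}\, r_i^2.
\end{align}

The remaining minimization is therefore over \beta_0 and \beta_1 only. For ''δ'' = 1 each term reduces to r_i^2 / (1 + \beta_1^2), the squared perpendicular distance from the observed point to the line.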

Solution

The solution can be expressed in terms of the second-degree sample moments. That is, we first calculate the following quantities (all sums go from ''i'' = 1 to ''n''):

\begin{align}
\overline{x} &= \tfrac{1}{n}\sum x_i & \overline{y} &= \tfrac{1}{n}\sum y_i,\\
s_{xx} &= \tfrac{1}{n}\sum (x_i-\overline{x})^2 &&= \overline{x^2} - \overline{x}^2, \\
s_{xy} &= \tfrac{1}{n}\sum (x_i-\overline{x})(y_i-\overline{y}) &&= \overline{xy} - \overline{x} \, \overline{y}, \\
s_{yy} &= \tfrac{1}{n}\sum (y_i-\overline{y})^2 &&= \overline{y^2} - \overline{y}^2.
\end{align}

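Finally, the least-squares estimates of the model's parameters are:

\begin{align}
  & \hat\beta_1 = \frac{s_{yy}-\delta s_{xx} + \sqrt{(s_{yy}-\delta s_{xx})^2 + 4\delta s_{xy}^2}}{2s_{xy}}, \\
  & \hat\beta_0 = \overline{y} - \hat\beta_1\overline{x}, \\
  & \hat{x}_i^* = x_i + \frac{\hat\beta_1}{\hat\beta_1^2+\delta}(y_i-\hat\beta_0-\hat\beta_1x_i).
\end{align}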
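The closed-form solution can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `deming_fit` and its signature are hypothetical, the sample moments are assumed to use the 1/''n'' normalization, and ''δ'' is taken as the ratio of the ''y''-error variance to the ''x''-error variance.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Return (beta0, beta1, x_star) for a Deming fit.

    Hypothetical helper: `delta` is the assumed-known ratio of the
    y-error variance to the x-error variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    # Second-degree sample moments with 1/n normalization (an assumption).
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / n

    if s_xy == 0:
        raise ValueError("slope is undefined when s_xy is zero")

    # Closed-form slope and intercept.
    diff = s_yy - delta * s_xx
    beta1 = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    beta0 = y_bar - beta1 * x_bar

    # Estimated "true" x values: each point projected onto the fitted line.
    weight = beta1 / (beta1 ** 2 + delta)
    x_star = [xi + weight * (yi - beta0 - beta1 * xi) for xi, yi in zip(x, y)]
    return beta0, beta1, x_star


# Points lying exactly on y = 2x + 1 should give slope 2 and intercept 1.
beta0, beta1, x_star = deming_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(beta0, beta1)
```

As ''δ'' grows large, the ''x'' values are treated as nearly error-free and the slope approaches the ordinary least squares value s_xy / s_xx. Evaluating the formula directly for very large ''δ'' subtracts two nearly equal numbers, so a production implementation would likely need a more careful rearrangement.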

Orthogonal regression

For the case of equal error variances, i.e., when ''δ'' = 1, Deming regression becomes orthogonal regression: it minimizes the sum of squared perpendicular distances from the data points to the regression line. In this case, denote each observation as a point ''z_j'' = ''x_j'' + ''i y_j'' in the complex plane (i.e., the point (''x_j'', ''y_j''), where ''i'' is the imaginary unit). Denote as

S = \sum (z_j - \overline z)^2

the sum of the squared differences of the data points from the centroid

\overline z = \tfrac{1}{n} \sum z_j

(also denoted in complex coordinates), which is the point whose horizontal and vertical locations are the averages of those of the data points. Then:
*If ''S'' = 0, then every line through the centroid is a line of best orthogonal fit.
*If ''S'' ≠ 0, the orthogonal regression line goes through the centroid and is parallel to the vector from the origin to √''S''.

A trigonometric representation of the orthogonal regression line was given by Coolidge in 1913.
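The complex-plane construction above translates almost literally into code. The following is a sketch only: the function name `orthogonal_regression` and the tolerance `tol` used to decide whether ''S'' is zero are assumptions made for illustration.

```python
import cmath


def orthogonal_regression(points, tol=1e-12):
    """Return (centroid, direction) as complex numbers.

    `direction` is None when S is numerically zero, in which case every
    line through the centroid fits equally well. The name and the
    tolerance are illustrative assumptions.
    """
    z = [complex(px, py) for px, py in points]
    z_bar = sum(z) / len(z)                  # centroid in complex coordinates
    s = sum((zj - z_bar) ** 2 for zj in z)   # S: sum of squared differences
    if abs(s) <= tol:
        return z_bar, None
    return z_bar, cmath.sqrt(s)              # line is parallel to sqrt(S)


# Points lying exactly on y = 2x + 1.
centroid, direction = orthogonal_regression([(0, 1), (1, 3), (2, 5), (3, 7)])
# The slope is imag/real of the direction (undefined for a vertical line).
slope = direction.imag / direction.real
print(centroid, slope)
```

Representing the direction as a complex number rather than a slope keeps vertical lines representable, since the slope is only computed when the real part is nonzero.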

Application

In the case of three non-collinear points in the plane, the triangle with these points as its vertices has a unique Steiner inellipse that is tangent to the triangle's sides at their midpoints. The major axis of this ellipse falls on the orthogonal regression line for the three vertices.

A biological cell's intrinsic cellular noise can be quantified by applying Deming regression to the observed behavior of a two-reporter synthetic biological circuit.

When humans are asked to draw a linear regression on a scatterplot by guessing, their answers are closer to orthogonal regression than to ordinary least squares regression.
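The Steiner inellipse statement can be checked numerically for a particular triangle. The sketch below relies on Marden's theorem, which is not stated in this article: the foci of the Steiner inellipse are the roots of the derivative of the cubic whose roots are the vertices, taken as complex numbers. The function names and the example triangle are hypothetical.

```python
import cmath


def steiner_foci(a, b, c):
    """Foci of the Steiner inellipse of triangle a, b, c (complex vertices).

    Uses Marden's theorem (an outside fact, assumed here): the foci are the
    roots of 3z^2 - 2(a+b+c)z + (ab+bc+ca), the derivative of (z-a)(z-b)(z-c).
    """
    s1 = a + b + c
    s2 = a * b + b * c + c * a
    root = cmath.sqrt(s1 * s1 - 3 * s2)
    return (s1 + root) / 3, (s1 - root) / 3


def orthogonal_direction(vertices):
    """Direction sqrt(S) of the orthogonal regression line of the vertices."""
    z_bar = sum(vertices) / len(vertices)
    return cmath.sqrt(sum((z - z_bar) ** 2 for z in vertices))


# An arbitrary non-equilateral example triangle.
a, b, c = 0j, 4 + 0j, 1 + 3j
f1, f2 = steiner_foci(a, b, c)
major_axis = f1 - f2
direction = orthogonal_direction((a, b, c))

# Two complex numbers are parallel when the imaginary part of
# conj(u) * v vanishes (up to rounding).
cross = (major_axis.conjugate() * direction).imag
print(abs(cross) < 1e-12)
```

The foci are symmetric about the centroid, so the line through them passes through the centroid as well, in agreement with the orthogonal regression line.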

York regression

The York regression extends Deming regression by allowing correlated errors in ''x'' and ''y'' (York et al., 2004).

References

York, D., Evensen, N. M., Martínez, M. L., and Delgado, J. D. B.: Unified equations for the slope, intercept, and standard errors of the best straight line, Am. J. Phys., 72, 367–375, https://doi.org/10.1119/1.1632486, 2004.