In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalization of Deming regression and also of orthogonal regression, and can be applied to both linear and non-linear models.

The total least squares approximation of the data is generically equivalent to the best, in the Frobenius norm, low-rank approximation of the data matrix.
Linear model
Background
In the least squares method of data modeling, the objective function, ''S'',
:<math>S = \mathbf{r}^\mathsf{T} W \mathbf{r}</math>
is minimized, where ''r'' is the vector of residuals and ''W'' is a weighting matrix. In linear least squares the model contains equations which are linear in the parameters appearing in the parameter vector β, so the residuals are given by
:<math>\mathbf{r} = \mathbf{y} - X\boldsymbol\beta.</math>
There are ''m'' observations in y and ''n'' parameters in β with ''m'' > ''n''. ''X'' is an ''m''×''n'' matrix whose elements are either constants or functions of the independent variables, x. The weight matrix ''W'' is, ideally, the inverse of the variance-covariance matrix <math>\mathbf{M}_y</math> of the observations y. The independent variables are assumed to be error-free. The parameter estimates are found by setting the gradient equations to zero, which results in the normal equations
[An alternative form is <math>X^\mathsf{T} W X\, \Delta\boldsymbol\beta = X^\mathsf{T} W\, \Delta\mathbf{y}</math>, where <math>\Delta\boldsymbol\beta</math> is the parameter shift from some starting estimate of <math>\boldsymbol\beta</math> and <math>\Delta\mathbf{y}</math> is the difference between y and the value calculated using the starting value of <math>\boldsymbol\beta</math>.]
:<math>X^\mathsf{T} W X \boldsymbol\beta = X^\mathsf{T} W \mathbf{y}.</math>
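For concreteness, here is a minimal numerical sketch of this weighted least-squares step. It is not part of the original article; the data values, weights, and variable names are illustrative assumptions.
<syntaxhighlight lang="python">
import numpy as np

# Illustrative data: m = 5 observations, n = 2 parameters (intercept and slope).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
X = np.column_stack([np.ones_like(x), x])        # m-by-n design matrix
sigma_y = np.array([0.1, 0.1, 0.2, 0.1, 0.1])    # assumed standard deviations of y
W = np.diag(1.0 / sigma_y**2)                    # inverse variance-covariance matrix of y

# Normal equations: (X^T W X) beta = X^T W y
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
r = y - X @ beta                                 # residual vector
S = r @ W @ r                                    # value of the objective function
print(beta, S)
</syntaxhighlight>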
Allowing observation errors in all variables
Now, suppose that both x and y are observed subject to error, with variance-covariance matrices <math>\mathbf{M}_x</math> and <math>\mathbf{M}_y</math> respectively. In this case the objective function can be written as
:<math>S = \mathbf{r}_x^\mathsf{T} \mathbf{M}_x^{-1} \mathbf{r}_x + \mathbf{r}_y^\mathsf{T} \mathbf{M}_y^{-1} \mathbf{r}_y,</math>
where <math>\mathbf{r}_x</math> and <math>\mathbf{r}_y</math> are the residuals in x and y respectively. Clearly these residuals cannot be independent of each other; they must be constrained by some kind of relationship. Writing the model function as <math>f(\mathbf{r}_x, \mathbf{r}_y, \boldsymbol\beta)</math>, the constraints are expressed by ''m'' condition equations:
:<math>\mathbf{F} = \Delta\mathbf{y} - \frac{\partial f}{\partial \mathbf{r}_x}\mathbf{r}_x - \frac{\partial f}{\partial \mathbf{r}_y}\mathbf{r}_y - X\,\Delta\boldsymbol\beta = \mathbf{0}.</math>
Thus, the problem is to minimize the objective function subject to the ''m'' constraints. It is solved by the use of Lagrange multipliers. After some algebraic manipulations, the result is obtained:
:<math>X^\mathsf{T} \mathbf{M}^{-1} X\, \Delta\boldsymbol\beta = X^\mathsf{T} \mathbf{M}^{-1}\, \Delta\mathbf{y},</math>
or alternatively <math>X^\mathsf{T} \mathbf{M}^{-1} X \boldsymbol\beta = X^\mathsf{T} \mathbf{M}^{-1} \mathbf{y}</math>, where '''M''' is the variance-covariance matrix relative to both independent and dependent variables:
:<math>\mathbf{M} = \mathbf{K}_x \mathbf{M}_x \mathbf{K}_x^\mathsf{T} + \mathbf{K}_y \mathbf{M}_y \mathbf{K}_y^\mathsf{T}, \qquad \mathbf{K}_x = -\frac{\partial \mathbf{F}}{\partial \mathbf{r}_x}, \quad \mathbf{K}_y = -\frac{\partial \mathbf{F}}{\partial \mathbf{r}_y}.</math>
Example
When the data errors are uncorrelated, all matrices '''M''' and ''W'' are diagonal. Then, take the example of straight line fitting,
:<math>f(x_i, \boldsymbol\beta) = \alpha + \beta x_i;</math>
in this case
:<math>M_{ii} = \sigma^2_{y,i} + \beta^2 \sigma^2_{x,i},</math>
showing how the variance at the ''i''th point is determined by the variances of both independent and dependent variables and by the model being used to fit the data. The expression may be generalized by noting that the parameter β is the slope of the line,
:<math>M_{ii} = \sigma^2_{y,i} + \left(\frac{dy}{dx}\right)_i^2 \sigma^2_{x,i}.</math>
An expression of this type is used in fitting pH titration data where a small error on ''x'' translates to a large error on ''y'' when the slope is large.
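The variance expression above suggests a simple iterative fit: compute the weights from the current slope estimate, solve the weighted normal equations, and repeat until the parameters stop changing. The sketch below illustrates that idea for straight-line fitting; it is not a procedure taken from the article, and the data, uncertainties, and convergence tolerance are assumptions.
<syntaxhighlight lang="python">
import numpy as np

# Illustrative data with assumed standard deviations on both x and y.
x = np.array([0.0, 0.9, 2.1, 3.0, 4.2])
y = np.array([1.0, 2.1, 2.9, 4.2, 4.9])
sx = np.full_like(x, 0.1)                    # standard deviations of x
sy = np.full_like(y, 0.2)                    # standard deviations of y

X = np.column_stack([np.ones_like(x), x])    # design matrix for alpha + beta*x
beta = np.array([0.0, 1.0])                  # starting estimate (intercept, slope)

for _ in range(50):
    # Variance at each point: sigma_y^2 + slope^2 * sigma_x^2
    M_diag = sy**2 + beta[1]**2 * sx**2
    W = np.diag(1.0 / M_diag)
    beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    if np.allclose(beta_new, beta, rtol=1e-10, atol=1e-12):
        beta = beta_new
        break
    beta = beta_new

print("intercept, slope:", beta)
</syntaxhighlight>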
Algebraic point of view
As was shown in 1980 by Golub and Van Loan, the TLS problem does not have a solution in general. The following considers the simple case where a unique solution exists without making any particular assumptions.
The computation of the TLS using singular value decomposition (SVD) is described in standard texts. We can solve the equation
:<math>XB \approx Y</math>
for ''B'', where ''X'' is ''m''-by-''n'' and ''Y'' is ''m''-by-''k''.
[The notation ''XB'' ≈ ''Y'' is used here to reflect the notation used in the earlier part of the article. In the computational literature the problem has been more commonly presented as ''AX'' ≈ ''B'', i.e. with the letter ''X'' used for the ''n''-by-''k'' matrix of unknown regression coefficients.]
That is, we seek to find the ''B'' that minimizes the error matrices ''E'' and ''F'' for ''X'' and ''Y'' respectively; that is,
:<math>\underset{B,\,E,\,F}{\operatorname{argmin}} \|[E\; F]\|_F, \qquad (X+E)B = Y+F,</math>
where <math>[E\; F]</math> is the augmented matrix with ''E'' and ''F'' side by side and <math>\|\cdot\|_F</math> is the Frobenius norm.
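As a hedged illustration of this SVD-based computation, covering only the simple case in which the relevant block of the right singular vectors is nonsingular and using made-up test data, one possible numpy sketch is:
<syntaxhighlight lang="python">
import numpy as np

def tls(X, Y):
    """Total least squares solution of X B ~ Y via the SVD of the
    augmented matrix [X Y] (simple case: unique solution, V22 nonsingular)."""
    m, n = X.shape
    Y = Y.reshape(m, -1)                 # allow a 1-D right-hand side
    k = Y.shape[1]
    Z = np.hstack([X, Y])                # m-by-(n+k) augmented data matrix
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt.T
    V12 = V[:n, n:]                      # n-by-k block of the right singular vectors
    V22 = V[n:, n:]                      # k-by-k block, assumed nonsingular
    return -V12 @ np.linalg.inv(V22)     # B = -V12 V22^{-1}

# Illustrative use: both the regressor and the response carry noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 50)
X = (t + 0.05 * rng.standard_normal(50)).reshape(-1, 1)    # noisy regressor
Y = 2.0 * t + 0.05 * rng.standard_normal(50)               # noisy response
print(tls(X, Y))                                           # close to [[2.0]]
</syntaxhighlight>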