The Kabsch algorithm, also known as the Kabsch-Umeyama algorithm, named after Wolfgang Kabsch and Shinji Umeyama, is a method for calculating the optimal rotation matrix that minimizes the RMSD (root mean squared deviation) between two paired sets of points. It is useful for point-set registration in computer graphics, and in cheminformatics and bioinformatics to compare molecular and protein structures (in particular, see root-mean-square deviation (bioinformatics)). The algorithm only computes the rotation matrix, but it also requires the computation of a translation vector. When both the translation and rotation are actually performed, the algorithm is sometimes called partial Procrustes superimposition (see also orthogonal Procrustes problem).

Description

Let $P$ and $Q$ be two sets, each containing $N$ points in $\mathbb{R}^n$. We want to find the transformation from $Q$ to $P$. For simplicity, we will consider the three-dimensional case ($n = 3$). The sets $P$ and $Q$ can each be represented by $N \times 3$ matrices, with the first row containing the coordinates of the first point, the second row containing the coordinates of the second point, and so on, as shown in this matrix:

$$\begin{pmatrix}
x_1 & y_1 & z_1 \\
x_2 & y_2 & z_2 \\
\vdots & \vdots & \vdots \\
x_N & y_N & z_N
\end{pmatrix}$$

The algorithm works in three steps: a translation, the computation of a covariance matrix, and the computation of the optimal rotation matrix.

Translation

Both sets of coordinates must be translated first, so that their centroids coincide with the origin of the coordinate system. This is done by subtracting the centroid coordinates from the point coordinates.
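As a minimal sketch of this translation step, assuming NumPy and points stored one per row, the centroid can be taken as the column-wise mean and subtracted from every row. The function name `center` and the sample coordinates are hypothetical, chosen only for illustration.

```python
import numpy as np

def center(points):
    """Translate an (N, 3) array so that its centroid sits at the origin.

    Returns the centered points and the centroid that was subtracted.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)   # arithmetic mean of all rows
    return points - centroid, centroid

# Hypothetical example coordinates (illustration only).
P = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0],
              [2.0, 5.0, 2.0]])
P_centered, P_centroid = center(P)
```

Returning the centroid alongside the centered points keeps the information needed to build the translation vector later.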

Computation of the covariance matrix

The second step consists of calculating a matrix $H$. In matrix notation,

$$H = P^\mathsf{T} Q$$

or, using summation notation,

$$H_{ij} = \sum_{k=1}^N P_{ki} Q_{kj},$$

which is a cross-covariance matrix when $P$ and $Q$ are seen as data matrices.
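The two notations can be compared directly in code. The sketch below assumes NumPy and already-centered coordinate arrays; the function names and sample data are hypothetical.

```python
import numpy as np

def cross_covariance(P, Q):
    """Return H = P^T Q for two centered (N, 3) coordinate arrays."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return P.T @ Q

def cross_covariance_sum(P, Q):
    """Same matrix via the explicit sum H_ij = sum_k P_ki * Q_kj."""
    return np.einsum("ki,kj->ij", np.asarray(P, float), np.asarray(Q, float))

# Hypothetical centered point sets (each column sums to zero).
P = np.array([[ 1.0,  0.0, 0.0],
              [-1.0,  1.0, 0.0],
              [ 0.0, -1.0, 0.0]])
Q = np.array([[ 0.0,  1.0, 0.0],
              [-1.0, -1.0, 0.0],
              [ 1.0,  0.0, 0.0]])
H = cross_covariance(P, Q)
```

Whatever the number of points $N$, the result is a $3 \times 3$ matrix.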

Computation of the optimal rotation matrix

It is possible to calculate the optimal rotation based on the matrix formula

$$R = \left(H H^\mathsf{T}\right)^{\frac12} \left(H^\mathsf{T}\right)^{-1},$$

but implementing a numerical solution to this formula becomes complicated when all special cases are accounted for (for example, the case of $H$ not having an inverse). If singular value decomposition (SVD) routines are available, the optimal rotation, $R$, can be calculated using the following algorithm. First, calculate the SVD of the covariance matrix $H$,

$$H = U \Sigma V^\mathsf{T},$$

where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal. Next, record if the orthogonal matrices contain a reflection,

$$d = \det\left(U V^\mathsf{T}\right) = \det(U) \det(V).$$

Finally, calculate our optimal rotation matrix $R$ as

$$R = U \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & d
\end{pmatrix} V^\mathsf{T}.$$

This minimizes $\sum_{k=1}^N \left| R q_k - p_k \right|^2$, where $q_k$ and $p_k$ are rows in $Q$ and $P$ respectively. Alternatively, the optimal rotation matrix can also be directly evaluated as a quaternion. This alternative description has been used in the development of a rigorous method for removing rigid-body motions from molecular dynamics trajectories of flexible molecules. In 2002 a generalization for the application to probability distributions (continuous or not) was also proposed.
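The three steps (translation, covariance matrix, SVD-based rotation) can be put together in a short sketch. It assumes NumPy and the row-per-point convention used above. The names `kabsch_rotation` and `superimpose` and the test data are hypothetical. Taking the sign of the determinant, rather than the determinant itself, is an added guard against rounding error and is not part of the description above.

```python
import numpy as np

def kabsch_rotation(P, Q):
    """Rotation R (3x3) minimizing sum_k |R q_k - p_k|^2 for centered P, Q."""
    H = P.T @ Q                             # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)             # H = U @ diag(S) @ Vt
    d = np.sign(np.linalg.det(U @ Vt))      # +1: rotation, -1: reflection
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def superimpose(P, Q):
    """Fit Q onto P; return rotation R, translation t and the final RMSD."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)  # centroids
    R = kabsch_rotation(P - p0, Q - q0)
    t = p0 - R @ q0                          # translation applied after rotating
    Q_fit = Q @ R.T + t                      # row k is R q_k + t
    rmsd = np.sqrt(np.mean(np.sum((Q_fit - P) ** 2, axis=1)))
    return R, t, rmsd

# Hypothetical data: P is Q rotated by 30 degrees about z, then shifted.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
Q = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
P = Q @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t, rmsd = superimpose(P, Q)
```

For noise-free input such as this, the recovered `R` should match `R_true` up to floating-point error and the RMSD should be close to zero. When one point set is a mirror image of the other, the correction by `d` still yields a proper rotation with determinant +1.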

Generalizations

The algorithm was described for points in a three-dimensional space. The generalization to $n$ dimensions is immediate.
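One way to sketch the generalization, assuming NumPy: the $3 \times 3$ correction matrix becomes an $n \times n$ identity whose last diagonal entry is replaced by $d$. The function name and the two-dimensional test data are hypothetical.

```python
import numpy as np

def kabsch_rotation_nd(P, Q):
    """Rotation (n x n) minimizing sum_k |R q_k - p_k|^2 for centered (N, n) arrays."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    D = np.eye(P.shape[1])
    # Flip the axis of the smallest singular value if U @ Vt is a reflection.
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))
    return U @ D @ Vt

# Hypothetical 2-D example: P2 is Q2 rotated by 90 degrees counterclockwise.
Q2 = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])   # already centered
R90 = np.array([[0.0, -1.0], [1.0, 0.0]])
P2 = Q2 @ R90.T
R2 = kabsch_rotation_nd(P2, Q2)
```

`np.linalg.svd` returns singular values in descending order, so the last diagonal entry corresponds to the smallest one.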

External links

* This SVD algorithm is described in more detail at https://web.archive.org/web/20140225050055/http://cnx.org/content/m11608/latest/
* A Matlab function is available at http://www.mathworks.com/matlabcentral/fileexchange/25746-kabsch-algorithm
* A C++ implementation (and unit test) using Eigen.
* A Python script is available at https://github.com/charnley/rmsd. Another implementation can be found in SciPy.
* A free PyMol plugin easily implementing Kabsch is available. (This previously linked to CEalign, but this uses the Combinatorial Extension (CE) algorithm.) VMD (Visual Molecular Dynamics) uses the Kabsch algorithm for its alignment.
* The FoldX modeling toolsuite incorporates the Kabsch algorithm to measure RMSD between Wild Type and Mutated protein structures.

References

* Umeyama, Shinji (1991). "Least-Squares Estimation of Transformation Parameters Between Two Point Patterns". IEEE Trans. Pattern Anal. Mach. Intell. 13 (4): 376–380. doi:10.1109/34.88573.