In bioinformatics, the template modeling score or TM-score is a measure of similarity between two protein structures. The TM-score is intended as a more accurate measure of the global similarity of full-length protein structures than the often used RMSD measure. The TM-score indicates the similarity between two structures by a score in the interval (0,1], where 1 indicates a perfect match between two structures (thus the higher the better). Generally, scores below 0.20 correspond to randomly chosen unrelated proteins, whereas structures with a score higher than 0.5 assume roughly the same fold. A quantitative study shows that proteins of TM-score = 0.5 have a posterior probability of 37% of being in the same CATH topology family and of 13% of being in the same SCOP fold family. The probabilities increase rapidly when TM-score > 0.5. The TM-score is designed to be independent of protein lengths.

The TM-score equation

The TM-score between two protein structures (e.g., a template structure and a target structure) is defined by:

<math>\text{TM-score}=\max\left[\frac{1}{L_\text{target}}\sum_i^{L_\text{common}}\frac{1}{1+\left(\frac{d_i}{d_0(L_\text{target})}\right)^2}\right]</math>

where <math>L_\text{target}</math> is the length of the amino acid sequence of the target protein, and <math>L_\text{common}</math> is the number of residues that appear in both the template and target structures. <math>d_i</math> is the distance between the <math>i</math>th pair of residues in the template and target structures, and <math>d_0(L_\text{target})=1.24\sqrt[3]{L_\text{target}-15}-1.8</math> is a distance scale that normalizes distances. The maximum is taken over all possible structure superpositions of the model and template (or some sample thereof).
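The definition above can be sketched in Python for a single, already fixed superposition. This is a minimal illustration, not the reference implementation: the function names `d0` and `tm_score_fixed` are hypothetical, the per-pair distances are assumed to be given in angstroms, and the maximization over superpositions is deliberately left out. The guard against very short targets is an assumption of this sketch, since the distance scale is not positive for small lengths.

```python
def d0(l_target: int) -> float:
    """Distance scale d_0(L_target) = 1.24 * cbrt(L_target - 15) - 1.8."""
    # Assumption of this sketch: reject lengths for which the scale
    # would be undefined or non-positive instead of clamping it.
    if l_target <= 15:
        raise ValueError("l_target must be greater than 15 in this sketch")
    scale = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    if scale <= 0:
        raise ValueError("d0 is not positive for such a short target")
    return scale


def tm_score_fixed(distances, l_target: int) -> float:
    """Bracketed term of the TM-score formula for ONE superposition.

    distances: d_i for each of the L_common aligned residue pairs.
    l_target:  length of the target sequence (the normalizer).
    The true TM-score is the maximum of this value over superpositions.
    """
    scale = d0(l_target)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / l_target
```

Note that the sum runs over the aligned pairs only while the normalizer is the full target length, so a perfect superposition of half of the target's residues yields 0.5 rather than 1.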

When comparing two protein structures that have the same residue order, <math>L_\text{common}</math> is read from the C-alpha residue numbering of the structure files (i.e., columns 23-26 in the Protein Data Bank file format). When comparing two protein structures that have different sequences and/or different residue orders, a structural alignment is usually performed first, and the TM-score is then calculated on the commonly aligned residues from the structural alignment.
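For the same-residue-order case, the residue pairing can be illustrated by reading the residue sequence number from columns 23-26 of the C-alpha `ATOM` records in PDB-format text. The sketch below is an assumption-laden illustration: the helper names `ca_residue_numbers` and `common_residues` are hypothetical, and chain identifiers, insertion codes, alternate locations and multiple models are all ignored.

```python
def ca_residue_numbers(pdb_text: str) -> list[int]:
    """Residue sequence numbers of the C-alpha atoms in PDB-format text."""
    numbers = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): record name 1-6,
        # atom name 13-16, residue sequence number 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            numbers.append(int(line[22:26]))
    return numbers


def common_residues(template_text: str, target_text: str) -> list[int]:
    """Residue numbers present in both structures (simplified pairing)."""
    template = set(ca_residue_numbers(template_text))
    target = set(ca_residue_numbers(target_text))
    return sorted(template & target)
```

The length of the list returned by `common_residues` plays the role of the number of common residues in this simplified setting. Structures with different sequences or residue orders would need a structural alignment instead, as described above.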

Other measures

An often used structural similarity measure is root-mean-square deviation (RMSD). Because RMSD <math>=\sqrt{\frac{1}{L_\text{common}}\sum_i d_i^2}</math> is calculated as an average of the squared distance error (<math>d_i</math>) with equal weight over all residue pairs, a large local error on a few residue pairs can result in a quite large RMSD. On the other hand, by putting <math>d_i</math> in the denominator, TM-score naturally weights smaller distance errors more strongly than larger distance errors. Therefore, compared to RMSD, the TM-score value is more sensitive to the global structural similarity than to local structural errors. Another advantage of TM-score is the introduction of the scale <math>d_0(L_\text{target})=1.24\sqrt[3]{L_\text{target}-15}-1.8</math>, which makes the magnitude of TM-score length-independent for random structure pairs, while RMSD and most other measures are length-dependent metrics.

The Global Distance Test (GDT) algorithm, and its GDT_TS score to represent "total score", is another measure of similarity between two protein structures with known amino acid correspondences (e.g. identical amino acid sequences) but different tertiary structures. The GDT score has the same length-dependence issue as RMSD, because the average GDT score for random structure pairs has a power-law dependence on the protein size.
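The different sensitivity of RMSD and TM-score to a few badly placed residues can be illustrated numerically. The following is a minimal sketch under stated assumptions: the distance lists are made-up toy data, the function names are hypothetical, and the TM-score term is evaluated for one fixed superposition rather than maximized over superpositions.

```python
import math


def rmsd(distances) -> float:
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score_fixed(distances, l_target: int) -> float:
    """TM-score term for one fixed superposition (no maximization)."""
    # Assumption: l_target is large enough for the scale to be positive.
    scale = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / l_target


# Toy data (hypothetical): 100 residue pairs, all 1 angstrom apart ...
uniform = [1.0] * 100
# ... versus the same model with 5 residue pairs displaced by 30 angstroms.
with_outliers = [1.0] * 95 + [30.0] * 5

for label, dists in (("uniform", uniform), ("with outliers", with_outliers)):
    print(f"{label}: RMSD={rmsd(dists):.2f}, TM={tm_score_fixed(dists, 100):.3f}")
```

In this toy case the five outliers multiply the RMSD several times over, while the TM-score term drops only slightly, because each badly placed pair contributes a value close to zero instead of a large squared error.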


External links

TM-score webserver, by the Yang Zhang research group. Calculates TM-score and supplies source code.

services and documentation on structure comparison and similarity measures.
