Structure Validation

Macromolecular structure validation is the process of evaluating reliability for 3-dimensional atomic models of large biological molecules such as proteins and nucleic acids. These models, which provide 3D coordinates for each atom in the molecule (see example in the image), come from structural biology experiments such as x-ray crystallography or nuclear magnetic resonance (NMR). The validation has three aspects: 1) checking on the validity of the thousands to millions of measurements in the experiment; 2) checking how consistent the atomic model is with those experimental data; and 3) checking consistency of the model with known physical and chemical properties.

Proteins and nucleic acids are the workhorses of biology, providing the necessary chemical reactions, structural organization, growth, mobility, reproduction, and environmental sensitivity. Essential to their biological functions are the detailed 3D structures of the molecules and the changes in those structures. To understand and control those functions, we need accurate knowledge about the models that represent those structures, including their many strong points and their occasional weaknesses.

End-users of macromolecular models include clinicians, teachers and students, as well as the structural biologists themselves, journal editors and referees, experimentalists studying the macromolecules by other techniques, and theoreticians and bioinformaticians studying more general properties of biological molecules. Their interests and requirements vary, but all benefit greatly from a global and local understanding of the reliability of the models.


Historical summary

Macromolecular crystallography was preceded by the older field of small-molecule x-ray crystallography (for structures with fewer than a few hundred atoms). Small-molecule diffraction data extend to much higher resolution than is feasible for macromolecules, and have a very clean mathematical relationship between the data and the atomic model. The residual, or R-factor, measures the agreement between the experimental data and the values back-calculated from the atomic model. For a well-determined small-molecule structure the R-factor is nearly as small as the uncertainty in the experimental data (well under 5%). Therefore, that one test by itself provides most of the validation needed, but a number of additional consistency and methodology checks are done by automated software as a requirement for small-molecule crystal structure papers submitted to the International Union of Crystallography (IUCr) journals such as Acta Crystallographica section B or C. Atomic coordinates of these small-molecule structures are archived and accessed through the Cambridge Structural Database (CSD) or the Crystallography Open Database (COD).

The first macromolecular validation software was developed around 1990, for proteins. It included Rfree cross-validation for model-to-data match, bond length and angle parameters for covalent geometry, and sidechain and backbone conformational criteria. For macromolecular structures, the atomic models are deposited in the Protein Data Bank (PDB), still the single archive of this data. The PDB was established in the 1970s at Brookhaven National Laboratory, moved in 2000 to the RCSB (Research Collaboratory for Structural Bioinformatics) centered at Rutgers, and expanded in 2003 to become the wwPDB (worldwide Protein Data Bank), with access sites added in Europe and Asia, and with NMR data handled at the BioMagResBank (BMRB) in Wisconsin. Validation rapidly became standard in the field, with further developments described below.

A large boost was given to the applicability of comprehensive validation for both x-ray and NMR as of February 1, 2008, when the worldwide Protein Data Bank (wwPDB) made mandatory the deposition of experimental data along with atomic coordinates. Since 2012, strong forms of validation have been in the process of being adopted for wwPDB deposition, based on recommendations of the wwPDB Validation Task Force committees for x-ray crystallography, for NMR, for SAXS (small-angle x-ray scattering), and for cryo-EM (cryo-electron microscopy).
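The R-factor described above compares observed structure-factor amplitudes with those back-calculated from the model, R = Σ | |F_obs| - |F_calc| | / Σ |F_obs|, and Rfree is the same quantity computed only over a test set of reflections that is excluded from refinement. A minimal Python sketch, assuming the two sets of amplitudes are already on a common scale (real refinement programs also fit scale factors and bulk-solvent terms, which are omitted here; the function names are illustrative):

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), amplitudes assumed on a common scale."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs: Sequence[float], f_calc: Sequence[float],
                    is_test: Sequence[bool]) -> Tuple[float, float]:
    """Split reflections by a test-set flag and return (R_work, R_free).

    `is_test` marks reflections held out of refinement (often around 5-10%
    of the data); R_free is the R-factor over that subset only.
    """
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if t]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free
```

For example, `r_factor([100.0, 80.0, 60.0, 40.0], [95.0, 82.0, 57.0, 40.0])` gives 10/280, about 0.036. Because the test reflections never influence refinement, a large gap between R_work and R_free is a common warning sign of overfitting.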


Stages of validation

Validation can be broken into three stages: validating the raw data collected (data validation), validating the interpretation of the data as an atomic model (model-to-data validation), and finally validating the model itself (model validation). While the first two stages are specific to the experimental technique used, validation of the arrangement of atoms in the final model is not.


Model validation


Geometry


Conformation (dihedrals): protein & RNA

The backbone and side-chain dihedral angles of proteins and RNA have been shown to fall into specific combinations that are allowed (or forbidden). For protein backbone dihedrals (φ, ψ), this has been addressed by the well-known Ramachandran plot, while for side-chain dihedrals (χ angles), one should refer to the Dunbrack backbone-dependent rotamer library.

Although mRNA structures are generally short-lived and single-stranded, there is an abundance of non-coding RNAs with diverse secondary and tertiary folds (tRNA, rRNA, etc.) which contain a preponderance of canonical Watson-Crick (WC) base pairs, together with a significant number of non-Watson-Crick (NWC) base pairs. Such RNAs therefore also qualify for the regular structural validation that applies to nucleic acid helices. The standard practice is to analyse the intra-base-pair (translational: Shear, Stretch, Stagger; rotational: Buckle, Propeller, Opening) and inter-base-pair (translational: Shift, Slide, Rise; rotational: Tilt, Roll, Twist) geometrical parameters, checking whether each is in range or out of range with respect to its suggested values. These parameters describe the relative orientation of the two paired bases in the two strands (intra), along with that of two stacked base pairs with respect to each other (inter); together, they serve to validate nucleic acid structures in general. Since RNA helices are short (on average 10-20 bp), the use of electrostatic surface potential as a validation parameter has been found to be beneficial, particularly for modelling purposes.
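All of these conformational checks start from torsion (dihedral) angles, each defined by four consecutive atoms: φ is C(i-1)-N-Cα-C and ψ is N-Cα-C-N(i+1). A minimal, dependency-free sketch of computing a signed dihedral (positive for clockwise rotation of the far bond when viewed along the central bond, the usual IUPAC convention) and one residue's φ/ψ pair; coordinates are plain tuples for illustration, and reading them from a PDB or mmCIF file is not shown:

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range -180..180."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> Tuple[float, float]:
    """Backbone (phi, psi) for residue i from C(i-1), N, CA, C and N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

The resulting (φ, ψ) pairs are what gets placed on a Ramachandran plot; the same `dihedral` function applies to side-chain χ angles and to nucleic acid backbone torsions, given the appropriate four atoms.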


Packing and Electrostatics: globular proteins

For globular proteins, interior atomic packing of side chains (arising from short-range, local interactions) has been shown to be pivotal in the structural stabilization of the protein fold. On the other hand, the electrostatic harmony (non-local, long-range) of the overall fold has also been shown to be essential for its stabilization. Packing anomalies include steric clashes, short contacts, holes and cavities, while electrostatic disharmony refers to unbalanced partial charges in the protein core (particularly relevant for designed protein interiors). While the clashscore of MolProbity identifies steric clashes in fine, all-atom detail, the Complementarity Plot combines packing anomalies with electrostatic imbalance of side chains and signals either or both.
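MolProbity's clashscore counts serious all-atom steric overlaps (larger than 0.4 Å) per 1000 atoms. The sketch below mimics only that counting step on a toy input. It uses assumed, approximately Bondi-style van der Waals radii, a brute-force all-pairs loop, and a caller-supplied set of excluded (bonded) atom pairs, whereas MolProbity adds explicit hydrogens and uses its own contact analysis; `clash_like_score` is a hypothetical name, not a MolProbity function:

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Set, Tuple

# Assumed, approximately Bondi-style van der Waals radii in angstroms.
VDW_RADII: Dict[str, float] = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

Atom = Tuple[str, Tuple[float, float, float]]  # (element symbol, xyz)


def clash_like_score(atoms: Sequence[Atom],
                     excluded_pairs: Set[FrozenSet[int]],
                     min_overlap: float = 0.4) -> float:
    """Number of atom pairs overlapping by more than min_overlap, per 1000 atoms."""
    if not atoms:
        raise ValueError("no atoms given")
    clashes = 0
    # Brute-force all-pairs loop: fine for a toy example, O(n^2) in general.
    for i, j in combinations(range(len(atoms)), 2):
        if frozenset((i, j)) in excluded_pairs:
            continue  # caller marks these atoms as bonded or otherwise expected to touch
        (elem_i, xyz_i), (elem_j, xyz_j) = atoms[i], atoms[j]
        overlap = VDW_RADII[elem_i] + VDW_RADII[elem_j] - math.dist(xyz_i, xyz_j)
        if overlap > min_overlap:
            clashes += 1
    return 1000.0 * clashes / len(atoms)
```

Normalizing per 1000 atoms is what lets a single number be compared between structures of very different sizes.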


Carbohydrates

The branched and cyclic nature of carbohydrates poses particular problems to structure validation tools. At higher resolutions, it is possible to determine the sequence/structure of oligo- and polysaccharides, both as covalent modifications and as ligands. However, at lower resolutions (typically worse than 2.0 Å), sequences/structures should either match known structures or be supported by complementary techniques such as mass spectrometry. Also, monosaccharides have clear conformational preferences (saturated rings are typically found in chair conformations), but errors introduced during model building and/or refinement (wrong linkage chirality or distance, or wrong choice of model - see for recommendations on carbohydrate model building and refinement and for reviews on general errors in carbohydrate structures) can bring their atomic models out of the more likely low-energy state. Around 20% of the deposited carbohydrate structures are in a higher-energy conformation not justified by the structural data (measured using the real-space correlation coefficient). A number of carbohydrate validation web services are available at glycosciences.de (including nomenclature checks and linkage checks by pdb-care, and cross-validation with mass spectrometry data through the use of GlycanBuilder), whereas the CCP4 suite currently distributes Privateer, a tool that is integrated into the model building and refinement process itself. Privateer is able to check stereo- and regiochemistry, ring conformation and puckering, linkage torsions, and real-space correlation against positive omit density, generating aperiodic torsion restraints on ring bonds, which can be used by any refinement software in order to maintain the monosaccharide's minimal-energy conformation. Privateer also generates scalable two-dimensional SVG diagrams according to the Essentials of Glycobiology standard symbol nomenclature, containing all the validation information as tooltip annotations (see figure). This functionality is currently integrated into other CCP4 programs, such as the molecular graphics program CCP4mg (through the Glycoblocks 3D representation, which conforms to the standard symbol nomenclature) and the suite's graphical interface, CCP4i2.
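Ring conformation checks of this kind rest on quantifying ring puckering. One common way to do so is the Cremer-Pople formalism: the ring atoms' displacements z_j from a mean plane are converted into a total puckering amplitude Q and, for six-membered rings, a polar angle θ, with chairs near θ = 0° or 180° and boat/skew-boat forms near θ = 90°. The sketch below is a generic, simplified implementation for six-membered (pyranose) rings, not Privateer's code; the chair-classification thresholds are assumed illustrative values, and which chair (for example 4C1 or 1C4) falls near 0° depends on the atom ordering convention:

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def cremer_pople_6(ring: Sequence[Vec]) -> Tuple[float, float]:
    """Return (Q in angstroms, theta in degrees) for a six-membered ring.

    Atoms must be listed in bonded order around the ring.
    """
    n = len(ring)
    if n != 6:
        raise ValueError("this sketch handles six-membered rings only")
    # Shift the origin to the ring centroid.
    centroid = [sum(p[k] for p in ring) / n for k in range(3)]
    r = [[p[k] - centroid[k] for k in range(3)] for p in ring]
    # Mean-plane normal = R' x R'' (normalised), following Cremer and Pople.
    r1 = [sum(r[j][k] * math.sin(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    r2 = [sum(r[j][k] * math.cos(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    normal = [r1[1] * r2[2] - r1[2] * r2[1],
              r1[2] * r2[0] - r1[0] * r2[2],
              r1[0] * r2[1] - r1[1] * r2[0]]
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("degenerate ring geometry")
    normal = [c / length for c in normal]
    # Signed displacement of each ring atom from the mean plane.
    z = [sum(r[j][k] * normal[k] for k in range(3)) for j in range(n)]
    # Puckering coordinates for N = 6: amplitude q2 (from its two phase components) and q3.
    q2_cos = math.sqrt(2 / n) * sum(z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    q2_sin = -math.sqrt(2 / n) * sum(z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q2 = math.hypot(q2_cos, q2_sin)
    q3 = math.sqrt(1 / n) * sum((-1) ** j * z[j] for j in range(n))
    return math.hypot(q2, q3), math.degrees(math.atan2(q2, q3))


def looks_like_chair(q: float, theta_deg: float,
                     min_q: float = 0.1, tolerance_deg: float = 20.0) -> bool:
    """Chairs lie near theta = 0 or 180 degrees; both thresholds are assumed values."""
    if q < min_q:
        return False  # nearly planar ring: theta is not meaningful
    return theta_deg <= tolerance_deg or theta_deg >= 180.0 - tolerance_deg
```

A ring flagged as not chair-like is not automatically wrong, but, as noted above, such higher-energy conformations should be justified by the density.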


Validation for crystallography


Overall considerations


Global vs local criteria

Many evaluation criteria apply globally to an entire experimental structure, most notably the resolution, the anisotropy or incompleteness of the data, and the residual or R-factor that measures overall model-to-data match (see below). Those help a user choose the most accurate among related Protein Data Bank entries to answer their questions. Other criteria apply to individual residues or local regions in the 3D structure, such as fit to the local electron density map or steric clashes between atoms. Those are especially valuable to the structural biologist for making improvements to the model, and to the user for evaluating the reliability of that model right around the place they care about, such as a site of enzyme activity or drug binding. Both types of measures are very useful, but although global criteria are easier to state or publish, local criteria make the greatest contribution to scientific accuracy and biological relevance. As expressed in the Rupp textbook, "Only local validation, including assessment of both geometry and electron density, can give an accurate picture of the reliability of the structure model or any hypothesis based on local features of the model."
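Local fit to density is often summarized per residue by a real-space correlation coefficient (RSCC): the Pearson correlation between observed map values and model-calculated map values sampled at the same grid points around that residue. A minimal sketch, assuming that sampling has already been done (map handling is not shown); the residue labels are hypothetical and the 0.8 flagging threshold is an assumed, illustrative cutoff rather than a standard:

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("constant density values: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def per_residue_rscc(samples: Dict[str, Tuple[List[float], List[float]]],
                     threshold: float = 0.8) -> Dict[str, Tuple[float, bool]]:
    """Map residue id -> (RSCC, flagged), where flagged means RSCC < threshold.

    samples: residue id -> (observed map values, model-calculated map values)
    at the same grid points near that residue's atoms.
    """
    report = {}
    for residue, (observed, calculated) in samples.items():
        cc = pearson(observed, calculated)
        report[residue] = (cc, cc < threshold)
    return report
```

Residues flagged this way are candidates for closer inspection, which matters most when they sit at a functional site such as a ligand-binding pocket.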


Relationship to resolution and B-factor


Data validation


Structure factors


Twinning


Model-to-data validation


Residuals and Rfree


Real-space correlation


Improvement by correcting diagnosed problems


In nuclear magnetic resonance


Data Validation: Chemical Shifts, NOEs, RDCs

AVS (Assignment Validation Suite): AVS checks the chemical shift list in BioMagResBank (BMRB) format for problems.

PSVS: Protein Structure Validation Server at the NESG, based on information retrieval statistics.

PROSESS: PROSESS (Protein Structure Evaluation Suite & Server) is a web server that offers an assessment of protein structural models using NMR chemical shifts as well as NOEs, geometrical, and knowledge-based parameters.

LACS: Linear analysis of chemical shifts (LACS) is used for absolute referencing of chemical shift data.


Model-to-data validation

TALOS+ predicts protein backbone torsion angles from chemical shift data. Its predictions are frequently used to generate further restraints applied to a structure model during refinement.
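A simple model-to-data check is to compare the φ/ψ angles measured from a structure with chemical-shift-based predictions. The sketch assumes each prediction has already been parsed into a predicted angle plus an uncertainty; the input layout, the function names, and the 2σ tolerance are hypothetical choices, and parsing actual TALOS+ output is not shown. Angle differences are taken on the circle, so 170° and -170° differ by 20°, not 340°:

```python
from typing import Dict, List, Tuple


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def flag_torsion_outliers(model: Dict[int, Tuple[float, float]],
                          predicted: Dict[int, Tuple[float, float, float, float]],
                          n_sigma: float = 2.0) -> List[int]:
    """Return residue numbers whose model phi or psi disagrees with the prediction.

    model:     residue -> (phi, psi) measured from the coordinates
    predicted: residue -> (phi, psi, phi_uncertainty, psi_uncertainty)
    Residues without a prediction are skipped.
    """
    outliers = []
    for res in sorted(model):
        if res not in predicted:
            continue
        phi, psi = model[res]
        p_phi, p_psi, d_phi, d_psi = predicted[res]
        if (angular_difference(phi, p_phi) > n_sigma * d_phi
                or angular_difference(psi, p_psi) > n_sigma * d_psi):
            outliers.append(res)
    return outliers
```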


Model validation: as above


Dynamics: core vs loops, tails, and mobile domains

One of the critical needs for NMR structural ensemble validation is to distinguish well-determined regions (those that have experimental data) from regions that are highly mobile and/or have no observed data. There are several current or proposed methods for making this distinction, such as the Random Coil Index, but so far the NMR community has not standardized on one.
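One simple coordinate-based heuristic (not the Random Coil Index, which works from chemical shifts) is the per-residue spread of Cα positions across the superposed models of an NMR ensemble: well-determined regions tend to have a small spread, while mobile or data-poor regions vary widely. A minimal sketch, assuming the models are already superposed; the function names and the 1.0 Å cutoff are assumed for illustration, and a large spread alone cannot tell genuine mobility apart from missing restraints:

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def per_residue_spread(models: Sequence[Dict[int, Vec]]) -> Dict[int, float]:
    """RMS deviation of each residue's CA from its ensemble-mean position.

    `models` is a list of already-superposed ensemble members, each mapping a
    residue number to that residue's CA coordinates.
    """
    common = set(models[0])
    for m in models[1:]:
        common &= set(m)  # only residues present in every model
    spread = {}
    for res in sorted(common):
        pts = [m[res] for m in models]
        mean = [sum(p[k] for p in pts) / len(pts) for k in range(3)]
        msd = sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in pts) / len(pts)
        spread[res] = math.sqrt(msd)
    return spread


def split_core(spread: Dict[int, float], cutoff: float = 1.0) -> Tuple[List[int], List[int]]:
    """Partition residues into (well-determined, mobile) using an assumed cutoff in angstroms."""
    core = [r for r, s in spread.items() if s <= cutoff]
    mobile = [r for r, s in spread.items() if s > cutoff]
    return core, mobile
```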


Software and websites


In cryo-EM

Cryo-EM presents special challenges to model builders, as the observed density is frequently insufficient to resolve individual atoms, leading to a higher likelihood of errors. Geometry-based validation tools similar to those used in x-ray crystallography can be used to highlight implausible modeling choices and guide modelers toward more native-like structures. The CaBLAM method, which only uses Cα atoms, is suitable for low-resolution structures from cryo-EM. A way to compute a difference density map has also been formulated for cryo-EM. Cross-validation using a "free" map, comparable to the use of a free R-factor, is also available. Other methods for checking model-map fit include correlation coefficients, model-map FSC, confidence maps, CryoEF (orientation bias check), and TEMPy SMOC.
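Model-map FSC (Fourier shell correlation) compares an experimental map with a map calculated from the model, shell by shell in reciprocal space: FSC(s) = Re Σ F1·F2* / sqrt(Σ|F1|² · Σ|F2|²), with the sums running over the Fourier coefficients in shell s. The sketch below assumes both maps have already been Fourier-transformed onto the same grid and are given as dictionaries from integer (h, k, l) indices to complex coefficients; shells are in index units, and conversion to resolution in Å, masking, and calculation of a map from the model are all omitted:

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Index = Tuple[int, int, int]


def fsc_by_shell(f1: Dict[Index, complex], f2: Dict[Index, complex],
                 shell_width: float = 1.0) -> Dict[int, float]:
    """Fourier shell correlation between two maps given as Fourier coefficients.

    Only indices present in both dictionaries are used; shells are numbered
    by int(|hkl| / shell_width).
    """
    cross = defaultdict(float)
    power1 = defaultdict(float)
    power2 = defaultdict(float)
    for hkl in f1.keys() & f2.keys():
        shell = int(math.sqrt(sum(i * i for i in hkl)) / shell_width)
        a, b = f1[hkl], f2[hkl]
        cross[shell] += (a * b.conjugate()).real
        power1[shell] += abs(a) ** 2
        power2[shell] += abs(b) ** 2
    return {s: cross[s] / math.sqrt(power1[s] * power2[s])
            for s in sorted(cross) if power1[s] > 0 and power2[s] > 0}
```

Because FSC is normalized per shell, an overall scale difference between the two maps does not change it; what matters is whether the phases and relative amplitudes agree at each resolution.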


In SAXS

SAXS (small-angle x-ray scattering) is a rapidly growing area of structure determination, both as a source of approximate 3D structure for initial or difficult cases and as a component of hybrid-method structure determination when combined with NMR, EM, crystallographic, cross-linking, or computational information. There is great interest in the development of reliable validation standards for SAXS data interpretation and for the quality of the resulting models, but there are as yet no established methods in general use. Three recent steps in this direction are the creation of a Small-Angle Scattering Validation Task Force committee by the worldwide Protein Data Bank and its initial report, a set of suggested standards for data inclusion in publications, and an initial proposal of statistically derived criteria for automated quality evaluation.
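Although validation standards are still developing, a goodness-of-fit value is commonly reported when a model's computed scattering profile is compared with measured data: a reduced χ² = 1/(N-1) Σ [(I_exp(q) - c·I_calc(q)) / σ(q)]², where the scale factor c is chosen by weighted least squares. A minimal sketch, assuming both profiles are sampled at the same q values; computing I_calc from atomic coordinates is not shown, the function name is illustrative, and normalization conventions (for example N-1 versus N minus the number of fitted parameters) differ between programs:

```python
from typing import Sequence, Tuple


def saxs_reduced_chi2(i_exp: Sequence[float], sigma: Sequence[float],
                      i_calc: Sequence[float]) -> Tuple[float, float]:
    """Return (reduced chi-square, scale factor c) for fitting I_exp ~ c * I_calc."""
    n = len(i_exp)
    if not (n == len(sigma) == len(i_calc)) or n < 2:
        raise ValueError("need equal-length profiles with at least two points")
    weights = [1.0 / s ** 2 for s in sigma]
    # Scale factor minimising sum(((I_exp - c * I_calc) / sigma) ** 2).
    scale = (sum(w * e * m for w, e, m in zip(weights, i_exp, i_calc))
             / sum(w * m * m for w, m in zip(weights, i_calc)))
    residuals = ((e - scale * m) / s for e, m, s in zip(i_exp, i_calc, sigma))
    chi2 = sum(r * r for r in residuals) / (n - 1)
    return chi2, scale
```

A low χ² is necessary but not sufficient: quite different models can produce similar low-resolution scattering profiles, which is part of why model validation for SAXS remains an open problem.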


For computational biology

It is difficult to do meaningful validation of an individual, purely computational, macromolecular model in the absence of experimental data for that molecule, because the model with the best geometry and conformational score may not be the one closest to the right answer. Therefore, much of the emphasis in validation of computational modeling is in assessment of the methods. To avoid bias and wishful thinking, double-blind prediction competitions have been organized, the original example of which (held every 2 years since 1994) is CASP (Critical Assessment of Structure Prediction), which evaluates predictions of 3D protein structure for newly solved crystallographic or NMR structures held in confidence until the end of the relevant competition. The major criterion for CASP evaluation is a score called GDT-TS (Global Distance Test, Total Score), which averages the percentages of predicted Cα atoms that lie within 1, 2, 4 and 8 Å of their positions in the experimental structure after superposition.
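A simplified sketch of the GDT-TS idea, assuming the predicted and experimental Cα coordinates are already superposed and listed in the same residue order. The real GDT-TS searches for a separate best superposition for each distance cutoff, so this single-superposition version generally scores lower; `gdt_ts_fixed_superposition` is a hypothetical name:

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def gdt_ts_fixed_superposition(model_ca: Sequence[Vec], target_ca: Sequence[Vec]) -> float:
    """GDT-TS-like score (0-100) for CA coordinates that are already superposed.

    Averages the percentage of residues whose CA lies within 1, 2, 4 and 8
    angstroms of the corresponding experimental CA.
    """
    if len(model_ca) != len(target_ca) or not target_ca:
        raise ValueError("need equal-length, non-empty CA lists in the same residue order")
    distances = [math.dist(m, t) for m, t in zip(model_ca, target_ca)]
    n = len(target_ca)
    percentages = [100.0 * sum(d <= cutoff for d in distances) / n
                   for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return sum(percentages) / 4.0
```

For instance, a model in which every Cα is displaced by exactly 3 Å scores 50: no residues fall within the 1 and 2 Å cutoffs, and all fall within the 4 and 8 Å cutoffs.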


See also

* List of biophysically important macromolecular crystal structures



External links

* Computational prediction
** CASP experiments home page
* General-purpose structure validation
** wwPDB validation/deposition site
** MolProbity web service (has NMR-specific features)
** PDBREPORT protein structure validation database
** What_Check software
** ProCheck software
** pdb-care (carbohydrate validation)
** OOPS2, part of the Uppsala Software Factory
** ProSA web service
** Verify-3D profile analysis
** NUPARM (nucleic acid validation)
* X-ray
** EDS (Electron Density Server)
** Coot - modeling software (built-in validation)
** PDB-REDO - X-ray model optimization: rebuilding and refining all PDB models using up-to-date techniques
** PROSESS - Protein Structure Evaluation Suite & Server
** ResProx (Resolution by Proxy) - protein model resolution-by-proxy
** VADAR - Volume, Area, Dihedral Angle Reporter
* NMR
** PSVS (Protein Structure Validation Server at the NESG)
** CING (Common Interface for NMR structure Generation) software
** ProCheck - stereochemical quality check for X-ray and NMR
** TALOS+ Software & Server (server for predicting protein backbone torsion angles from chemical shifts)
** VADAR - Volume, Area, Dihedral Angle Reporter
** PROSESS - Protein Structure Evaluation Suite & Server
** ResProx - protein model resolution-by-proxy
* Cryo-EM
** EMDB at the PDB, info on ftp download of maps
** CERES rebuilds (and hopefully improves) cryo-EM models using the latest version of PHENIX

