
In protein structure prediction, statistical potentials or knowledge-based potentials are scoring functions derived from an analysis of known protein structures in the Protein Data Bank (PDB).
The original method to obtain such potentials is the ''quasi-chemical approximation'', due to Miyazawa and Jernigan. It was later followed by the ''potential of mean force'' (statistical PMF), developed by Sippl.
Although the obtained scores are often considered approximations of the free energy, and are therefore referred to as ''pseudo-energies'', this physical interpretation is incorrect. Nonetheless, they are applied with success in many cases, because they frequently correlate with actual Gibbs free energy differences.
Overview
Possible features to which a pseudo-energy can be assigned include:
* interatomic distances,
* torsion angles,
* solvent exposure,
* or hydrogen bond geometry.
The classic application is, however, based on pairwise amino acid contacts or distances, thus producing statistical interatomic potentials. For pairwise amino acid contacts, a statistical potential is formulated as an interaction matrix that assigns a weight or energy value to each possible pair of standard amino acids. The energy of a particular structural model is then the combined energy of all pairwise contacts (defined as two amino acids within a certain distance of each other) in the structure. The energies are determined using statistics on amino acid contacts in a database of known protein structures (obtained from the PDB).
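The contact-based scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a published potential: the matrix values in `CONTACT_ENERGY`, the 6.5 Å cutoff, the one-point-per-residue representation, and the function name `contact_energy` are all assumptions made for the example.

```python
from itertools import combinations
import math

# Hypothetical interaction weights for a few residue pairs (made-up values,
# not taken from any published potential). Keys are sorted residue-name pairs.
CONTACT_ENERGY = {
    ("ALA", "ALA"): -0.1,
    ("ALA", "LEU"): -0.4,
    ("LEU", "LEU"): -0.8,
}


def contact_energy(residues, coords, cutoff=6.5):
    """Sum the pairwise energies of all residue pairs within `cutoff`.

    `residues` is a list of residue names and `coords` a list of (x, y, z)
    points, one representative point per residue (an assumption of this sketch).
    Pairs missing from the matrix contribute zero.
    """
    total = 0.0
    for i, j in combinations(range(len(residues)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            pair = tuple(sorted((residues[i], residues[j])))
            total += CONTACT_ENERGY.get(pair, 0.0)
    return total


# Three residues on a line, 5 units apart: only neighbours are in contact.
residues = ["ALA", "LEU", "LEU"]
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
energy = contact_energy(residues, coords)
```

Here the ALA-LEU and LEU-LEU neighbours are in contact, while the first and last residues are too far apart to contribute. A real potential would cover all pairs of the 20 standard amino acids and derive its values from contact statistics in known structures.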
History
Initial development
Many textbooks present the statistical PMFs as proposed by Sippl as a simple consequence of the Boltzmann distribution, as applied to pairwise distances between amino acids. This is incorrect, but it is a useful starting point for introducing how the potential is constructed in practice.
The Boltzmann distribution applied to a specific pair of amino acids is given by:
: <math>P(r) = \frac{1}{Z} e^{-\frac{F(r)}{kT}}</math>
where <math>r</math> is the distance, <math>k</math> is the Boltzmann constant, <math>T</math> is the temperature and <math>Z</math> is the partition function, with
: <math>Z = \int e^{-\frac{F(r)}{kT}} \, dr</math>
The quantity <math>F(r)</math> is the free energy assigned to the pairwise system.
Simple rearrangement results in the ''inverse Boltzmann formula'', which expresses the free energy <math>F(r)</math> as a function of <math>P(r)</math>:
: <math>F(r) = -kT \ln P(r) - kT \ln Z</math>
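As a numerical check of the rearrangement, the following sketch builds a Boltzmann distribution from a free energy profile on a discrete grid and then recovers the profile with the inverse Boltzmann formula. The profile values, the grid, and the value of `kT` are illustrative assumptions, and the integral defining the partition function is replaced by a sum over bins.

```python
import math

kT = 0.59  # assumed value, roughly kcal/mol near room temperature

# Hypothetical free energy profile F(r) on a discrete grid of distances.
r_grid = [3.0, 4.0, 5.0, 6.0]
F = [1.0, -0.5, 0.0, 0.3]

# Boltzmann distribution: P(r) = exp(-F(r)/kT) / Z, with Z as a discrete sum.
weights = [math.exp(-f / kT) for f in F]
Z = sum(weights)
P = [w / Z for w in weights]

# Inverse Boltzmann formula: F(r) = -kT ln P(r) - kT ln Z.
F_recovered = [-kT * math.log(p) - kT * math.log(Z) for p in P]
```

Up to floating-point error, `F_recovered` matches `F`, and the bin with the lowest free energy has the highest probability.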
To construct a PMF, one then introduces a so-called ''reference state'' with a corresponding distribution <math>Q_R</math> and partition function <math>Z_R</math>, and calculates the following free energy difference:
: <math>\Delta F(r) = -kT \ln \frac{P(r)}{Q_R(r)} - kT \ln \frac{Z}{Z_R}</math>
The reference state typically results from a hypothetical system in which the specific interactions between the amino acids are absent. The second term involving <math>Z</math> and <math>Z_R</math> can be ignored, as it is a constant.
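A minimal sketch of this construction, assuming both the observed and the reference distributions are available as histograms of counts over the same distance bins, is shown below. The function name `pmf_from_counts`, the pseudocount used to avoid taking the logarithm of zero, and the example counts are assumptions of this illustration. The constant term involving the partition functions is dropped.

```python
import math


def pmf_from_counts(observed, reference, kT=1.0, pseudocount=1.0):
    """Return the free energy difference per distance bin, -kT * ln(P / Q_R).

    `observed` and `reference` are lists of counts per distance bin.
    The pseudocount (an assumption of this sketch) keeps empty bins finite.
    """
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    obs = [c + pseudocount for c in observed]
    ref = [c + pseudocount for c in reference]
    n_obs, n_ref = sum(obs), sum(ref)
    # Normalize counts to probabilities, then apply the inverse Boltzmann ratio.
    return [-kT * math.log((o / n_obs) / (r / n_ref)) for o, r in zip(obs, ref)]


# Hypothetical counts: the second bin is enriched relative to the reference.
observed = [5, 40, 30, 25]
reference = [25, 25, 25, 25]
delta_F = pmf_from_counts(observed, reference)
```

Bins that are more populated than in the reference state receive a negative (favourable) value, depleted bins a positive one, and bins matching the reference a value of zero.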
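In practice, <math>P(r)</math> is estimated from the database of known protein structures, while <math>Q_R(r)</math> typically results from calculations or simulations. For example, <math>P(r)</math> could be the conditional probability of finding the atoms of a valine and a serine at a given distance <math>r</math> from each other, giving rise to the free energy difference <math>\Delta F</math>. The total free energy difference of a protein, <math>\Delta F_\textrm{T}</math>, is then claimed to be the sum of all the pairwise free energies:
: <math>\Delta F_\textrm{T} = \sum_{i < j} \Delta F(r_{ij} \mid a_i, a_j) = -kT \sum_{i < j} \ln \frac{P(r_{ij} \mid a_i, a_j)}{Q_R(r_{ij} \mid a_i, a_j)}</math>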
where the sum runs over all amino acid pairs <math>a_i, a_j</math>
(with