Self-consistent mean field (biology)

The self-consistent mean field (SCMF) method is an adaptation of mean field theory used in protein structure prediction to determine the optimal amino acid side chain packing given a fixed protein backbone. It is faster but less accurate than dead-end elimination (DEE) and is generally used in situations where the protein of interest is too large for the problem to be tractable by DEE.


General principles

Like dead-end elimination, the SCMF method explores conformational space by discretizing the dihedral angles of each side chain into a set of rotamers for each position in the protein sequence. The method iteratively develops a probabilistic description of the relative population of each possible rotamer at each position, and the probability of a given structure is defined as a function of the probabilities of its individual rotamer components.

The basic requirements for an effective SCMF implementation are:
# A well-defined finite set of discrete independent variables
# A precomputed numerical value (considered the "energy") associated with each element in the set of variables, and with each pair of elements
# An initial probability distribution describing the starting population of each individual rotamer
# A way of updating rotamer energies and probabilities as a function of the mean-field energy

The process is generally initialized with a uniform probability distribution over the rotamers; that is, if there are p rotamers at the kth position in the protein, then the probability of any individual rotamer r_k^A is 1/p. The conversion between energies and probabilities is generally accomplished via the Boltzmann distribution, which introduces a temperature factor (thus making the method amenable to simulated annealing). Lower temperatures increase the likelihood of converging to a single solution, rather than to a small subpopulation of solutions.
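As a minimal sketch of the initialization and of the energy-to-probability conversion described above, the following Python snippet builds a uniform starting distribution and applies a Boltzmann weighting. The function names and the parameter `kT` (the product of the Boltzmann constant and the temperature, in the same units as the energies) are illustrative assumptions, not part of any published implementation.

```python
import math

def uniform_probabilities(rotamer_counts):
    """Start every position with probability 1/p for each of its p rotamers."""
    return [[1.0 / p] * p for p in rotamer_counts]

def boltzmann_probabilities(energies, kT):
    """Convert the energies of one position's rotamers into probabilities.

    kT is an assumed combined temperature factor (Boltzmann constant times T).
    """
    e_min = min(energies)  # shifting by the minimum avoids overflow and leaves the ratios unchanged
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: two positions with 3 and 2 rotamers.
P0 = uniform_probabilities([3, 2])

# The same three energies at a high and at a low temperature factor.
hot = boltzmann_probabilities([0.0, 1.0, 2.0], kT=5.0)
cold = boltzmann_probabilities([0.0, 1.0, 2.0], kT=0.2)
```

At the lower temperature factor nearly all of the probability falls on the lowest-energy rotamer, which is the behavior that pushes the method toward a single solution.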


Mean-field energies

The energy of an individual rotamer r_k^A depends on the "mean-field" energy of the other positions; that is, at every other position, each rotamer's energy contribution is weighted by its probability. For a protein of length N with p rotamers per residue, the energy at the current iteration is described by the following expression. For clarity, the mean-field energy at iteration i is denoted by M_i, the precomputed self and pair energies by E_k and E_{kl}, and the probability of a given rotamer by P_i(r_k^A).
: M_i(r_k^A) = E_k(r_k^A) + \sum_{l=1, l \neq k}^{N} \sum_{B=1}^{p} P_{i-1}(r_l^B) E_{kl}(r_k^A, r_l^B)
These mean-field energies are used to update the probabilities through the Boltzmann law:
: P_i(r_k^A) = \left(\exp\left(-\frac{M_i(r_k^A)}{k_B T}\right)\right)\left(\sum_{B=1}^{p}\exp\left(-\frac{M_i(r_k^B)}{k_B T}\right)\right)^{-1}
where k_B is the Boltzmann constant and T is the temperature factor.
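The two update steps above can be sketched as a short iteration loop. This is a minimal illustration under stated assumptions, not a reference implementation: the names `E_self`, `E_pair`, `kT` and `scmf` are hypothetical, `E_pair[k][l][A][B]` is assumed to be available for both orderings of each position pair, all positions are updated synchronously from the previous iteration's probabilities, and the energies are toy values.

```python
import math

def mean_field_energies(E_self, E_pair, P_prev):
    """Self energy plus probability-weighted pair energies over all other positions."""
    N = len(E_self)
    M = []
    for k in range(N):
        row = []
        for A in range(len(E_self[k])):
            m = E_self[k][A]
            for l in range(N):
                if l == k:
                    continue  # a position does not interact with itself
                for B in range(len(E_self[l])):
                    m += P_prev[l][B] * E_pair[k][l][A][B]
            row.append(m)
        M.append(row)
    return M

def update_probabilities(M, kT):
    """Boltzmann-normalize the mean-field energies position by position."""
    P = []
    for row in M:
        m_min = min(row)  # shift for numerical stability
        w = [math.exp(-(m - m_min) / kT) for m in row]
        z = sum(w)
        P.append([x / z for x in w])
    return P

def scmf(E_self, E_pair, kT, max_iter=100, tol=1e-8):
    """Iterate from a uniform distribution until the probabilities stop changing."""
    P = [[1.0 / len(row)] * len(row) for row in E_self]
    for _ in range(max_iter):
        P_new = update_probabilities(mean_field_energies(E_self, E_pair, P), kT)
        delta = max(abs(a - b)
                    for new_row, old_row in zip(P_new, P)
                    for a, b in zip(new_row, old_row))
        P = P_new
        if delta < tol:
            break
    return P

# Toy problem (assumed values): 2 positions, 2 rotamers each.
# Matching rotamer indices attract (-1), mismatched ones repel (+1).
X = [[-1.0, 1.0], [1.0, -1.0]]
E_self = [[0.0, 1.0], [0.0, 0.0]]
E_pair = [[None, X], [X, None]]  # X is symmetric, so it serves both orderings
P = scmf(E_self, E_pair, kT=0.5)
```

In this toy case the small self-energy preference at the first position propagates through the pair term, so both positions end up strongly favoring rotamer 0.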


Energy of the system

Although computing the system energy is not required in carrying out the SCMF method, it is useful to know the overall energies of the converged results. The system energy M_{\text{sys}} consists of two sums:
: M_{\text{sys}} = M_{\text{self}} + M_{\text{pair}}
where the addends are defined as:
: M_{\text{self}} = \sum_{k=1}^{N} \sum_{A=1}^{p} P(r_k^A) E_k(r_k^A)
: M_{\text{pair}} = \sum_{k=1}^{N} \sum_{l=k+1}^{N} \sum_{A=1}^{p} \sum_{B=1}^{p} \left(P(r_k^A) P(r_l^B) E_{kl}(r_k^A, r_l^B)\right)
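The system energy can be evaluated directly from a set of rotamer probabilities. The sketch below assumes that each pair of positions is counted once (l > k) and uses hypothetical names (`system_energy`, `E_self`, `E_pair`) and toy energies chosen only for illustration.

```python
def system_energy(E_self, E_pair, P):
    """Sum of the probability-weighted self energies and pair energies."""
    N = len(E_self)
    # Self term: each rotamer's own energy weighted by its probability.
    m_self = sum(P[k][A] * E_self[k][A]
                 for k in range(N)
                 for A in range(len(E_self[k])))
    # Pair term: each position pair (k, l) with l > k is visited once.
    m_pair = 0.0
    for k in range(N):
        for l in range(k + 1, N):
            for A in range(len(E_self[k])):
                for B in range(len(E_self[l])):
                    m_pair += P[k][A] * P[l][B] * E_pair[k][l][A][B]
    return m_self + m_pair

# Toy problem (assumed values): 2 positions, 2 rotamers each.
X = [[-1.0, 1.0], [1.0, -1.0]]
E_self = [[0.0, 1.0], [0.0, 0.0]]
E_pair = [[None, X], [X, None]]

converged = [[1.0, 0.0], [1.0, 0.0]]  # one rotamer per position
uniform = [[0.5, 0.5], [0.5, 0.5]]    # starting distribution
e_converged = system_energy(E_self, E_pair, converged)  # -1.0
e_uniform = system_energy(E_self, E_pair, uniform)      # 0.5
```

For a fully converged distribution the result reduces to the ordinary energy of the single selected conformation.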


Convergence

Perfect convergence for the SCMF method would result in a probability of 1 for exactly one rotamer at each position k in the protein, and a probability of zero for all other rotamers at that position. Convergence to a unique solution requires a probability close to 1 for exactly one rotamer at each position. In practice, especially when higher temperatures are used, the algorithm instead identifies a small number of high-probability rotamers at each position, allowing the relative energies of the resulting conformations to be enumerated (based on the precomputed energies, not on those derived from the mean-field approximation). One way to improve convergence is to run the method again at a lower temperature, starting from the probabilities calculated in a previous higher-temperature run.
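The two strategies in this section, restarting at a lower temperature from earlier probabilities and enumerating the conformations built from high-probability rotamers, can be sketched as follows. Everything here is an illustrative assumption: the function names, the temperature schedule, the fixed iteration count, the 0.1 probability threshold and the toy energies are not taken from the source.

```python
import math
from itertools import product

def scmf_run(E_self, E_pair, kT, P, n_iter=50):
    """Run a fixed number of synchronous mean-field updates starting from P."""
    N = len(E_self)
    for _ in range(n_iter):
        P_new = []
        for k in range(N):
            M = [E_self[k][A] + sum(P[l][B] * E_pair[k][l][A][B]
                                    for l in range(N) if l != k
                                    for B in range(len(E_self[l])))
                 for A in range(len(E_self[k]))]
            m_min = min(M)  # shift for numerical stability
            w = [math.exp(-(m - m_min) / kT) for m in M]
            z = sum(w)
            P_new.append([x / z for x in w])
        P = P_new
    return P

def anneal(E_self, E_pair, temperatures):
    """Start uniform, then reuse each run's probabilities at the next temperature."""
    P = [[1.0 / len(row)] * len(row) for row in E_self]
    for kT in temperatures:
        P = scmf_run(E_self, E_pair, kT, P)
    return P

def candidate_conformations(E_self, E_pair, P, threshold=0.1):
    """Rank combinations of likely rotamers by their precomputed energies."""
    N = len(P)
    choices = [[A for A, pr in enumerate(row) if pr >= threshold] for row in P]
    ranked = []
    for combo in product(*choices):
        e = sum(E_self[k][combo[k]] for k in range(N))
        e += sum(E_pair[k][l][combo[k]][combo[l]]
                 for k in range(N) for l in range(k + 1, N))
        ranked.append((e, combo))
    return sorted(ranked)

# Toy problem (assumed values): 2 positions, 2 rotamers each.
X = [[-1.0, 1.0], [1.0, -1.0]]
E_self = [[0.0, 1.0], [0.0, 0.0]]
E_pair = [[None, X], [X, None]]

P_hot = anneal(E_self, E_pair, [2.0])               # several rotamers stay likely
P_cold = anneal(E_self, E_pair, [2.0, 1.0, 0.25])   # sharpened by cooling
ranked = candidate_conformations(E_self, E_pair, P_hot)
```

Ranking uses the precomputed self and pair energies rather than the mean-field energies, as the text above specifies.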


Accuracy

Unlike dead-end elimination, SCMF is not guaranteed to converge on the optimal solution. However, it is deterministic (that is, it converges to the same solution every time given the same initial conditions), unlike alternatives that rely on Monte Carlo analysis. Compared with DEE, SCMF is faster but less accurate overall; it is significantly better at identifying correct side chain conformations in the protein's core than at identifying correct surface conformations, because geometric packing constraints are less restrictive on the surface and thus provide fewer boundaries to the conformational search.


References

# Koehl P, Delarue M. (1994). Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. ''J Mol Biol'' 239(2):249-75.
# Voigt CA, Gordon DB, Mayo SL. (2000). Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design. ''J Mol Biol'' 299(3):789-803.