The Chou–Fasman method is an empirical technique for the
prediction
A prediction (Latin ''præ-'', "before," and ''dicere'', "to say"), or forecast, is a statement about a future event or data. They are often, but not always, based upon experience or knowledge. There is no universal agreement about the exact ...
of
secondary structure
Protein secondary structure is the three dimensional conformational isomerism, form of ''local segments'' of proteins. The two most common Protein structure#Secondary structure, secondary structural elements are alpha helix, alpha helices and beta ...
s in
protein
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
s, originally developed in the 1970s by Peter Y. Chou and Gerald D. Fasman.
The method is based on analyses of the relative frequencies of each
amino acid
Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although hundreds of amino acids exist in nature, by far the most important are the alpha-amino acids, which comprise proteins. Only 22 alpha am ...
in
alpha helices,
beta sheet
The beta sheet, (β-sheet) (also β-pleated sheet) is a common motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone hydrogen bonds, forming a g ...
s, and
turns based on known
protein structures solved with
X-ray crystallography
X-ray crystallography is the experimental science determining the atomic and molecular structure of a crystal, in which the crystalline structure causes a beam of incident X-rays to diffract into many specific directions. By measuring the angles ...
. From these frequencies a set of probability parameters were derived for the appearance of each amino acid in each secondary structure type, and these parameters are used to predict the
probability
Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
that a given sequence of amino acids would form a helix, a beta strand, or a turn in a protein. The method is at most about 50–60% accurate in identifying correct secondary structures,
which is significantly less accurate than the modern
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
–based techniques.
Amino acid propensities
The original Chou–Fasman parameters found some strong tendencies among individual amino acids to prefer one type of secondary structure over others.
Alanine,
glutamate
Glutamic acid (symbol Glu or E; the ionic form is known as glutamate) is an α-amino acid that is used by almost all living beings in the biosynthesis of proteins. It is a non-essential nutrient for humans, meaning that the human body can syn ...
,
leucine, and
methionine
Methionine (symbol Met or M) () is an essential amino acid in humans. As the precursor of other amino acids such as cysteine and taurine, versatile compounds such as SAM-e, and the important antioxidant glutathione, methionine plays a critical ro ...
were identified as helix formers, while
proline
Proline (symbol Pro or P) is an organic acid classed as a proteinogenic amino acid (used in the biosynthesis of proteins), although it does not contain the amino group but is rather a secondary amine. The secondary amine nitrogen is in the prot ...
and
glycine, due to the unique conformational properties of their
peptide bond
In organic chemistry, a peptide bond is an amide type of covalent chemical bond linking two consecutive alpha-amino acids from C1 (carbon number one) of one alpha-amino acid and N2 (nitrogen number two) of another, along a peptide or protein cha ...
s, commonly end a helix. The original Chou–Fasman parameters
were derived from a very small and non-representative sample of protein structures due to the small number of such structures that were known at the time of their original work. These original parameters have since been shown to be unreliable
and have been updated from a current dataset, along with modifications to the initial algorithm.
The Chou–Fasman method takes into account only the probability that each individual amino acid will appear in a helix, strand, or turn. Unlike the more complex
GOR method
The GOR method (short for Garnier–Osguthorpe–Robson) is an information theory-based method for the prediction of secondary structures in proteins. It was developed in the late 1970s shortly after the simpler Chou–Fasman method. Like Chou–F ...
, it does not reflect the conditional probabilities of an amino acid to form a particular secondary structure given that its neighbors already possess that structure. This lack of cooperativity increases its computational efficiency but decreases its accuracy, since the propensities of individual amino acids are often not strong enough to render a definitive prediction.
Algorithm
The Chou–Fasman method predicts helices and strands in a similar fashion, first searching linearly through the sequence for a "nucleation" region of high helix or strand probability and then extending the region until a subsequent four-residue window carries a probability of less than 1. As originally described, four out of any six contiguous amino acids were sufficient to nucleate helix, and three out of any contiguous five were sufficient for a sheet. The probability thresholds for helix and strand nucleations are constant but not necessarily equal; originally 1.03 was set as the helix cutoff and 1.00 for the strand cutoff.
Turns are also evaluated in four-residue windows, but are calculated using a multi-step procedure because many turn regions contain amino acids that could also appear in helix or sheet regions. Four-residue turns also have their own characteristic amino acids;
proline
Proline (symbol Pro or P) is an organic acid classed as a proteinogenic amino acid (used in the biosynthesis of proteins), although it does not contain the amino group but is rather a secondary amine. The secondary amine nitrogen is in the prot ...
and
glycine are both common in turns. A turn is predicted only if the turn probability is greater than the helix or sheet probabilities ''and'' a probability value based on the positions of particular amino acids in the turn exceeds a predetermined threshold. The turn probability p(t) is determined as:
:
where ''j'' is the position of the amino acid in the four-residue window. If p(t) exceeds an arbitrary cutoff value (originally 7.5e–3), the mean of the p(j)'s exceeds 1, and p(t) exceeds the alpha helix and beta sheet probabilities for that window, then a turn is predicted. If the first two conditions are met but the probability of a beta sheet p(b) exceeds p(t), then a sheet is predicted instead.
See also
*
List of protein structure prediction software
This list of protein structure prediction software summarizes notable used software tools in protein structure prediction, including homology modeling, protein threading, ''ab initio'' methods, secondary structure prediction, and transmembrane ...
References
External links
Gerald D. Fasman on the Internet
{{DEFAULTSORT:Chou-Fasman method
Bioinformatics
Protein methods