Protein secondary structure is the local spatial conformation of the polypeptide backbone, excluding the side chains. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically form spontaneously as an intermediate before the protein folds into its three-dimensional tertiary structure.

Secondary structure is formally defined by the pattern of hydrogen bonds between the amide hydrogen and carbonyl oxygen atoms in the peptide backbone. Secondary structure may alternatively be defined based on the regular pattern of backbone dihedral angles in a particular region of the Ramachandran plot, regardless of whether it has the correct hydrogen bonds.

The concept of secondary structure was first introduced by Kaj Ulrik Linderstrøm-Lang at Stanford in 1952. Other types of biopolymers, such as nucleic acids, also possess characteristic secondary structures.
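The dihedral-angle view of secondary structure can be made concrete with a small calculation. The sketch below computes a signed dihedral angle from four atom positions; for a residue, φ is the dihedral over C(i−1), N, Cα, C and ψ is the dihedral over N, Cα, C, N(i+1). The function names and, in particular, the rectangular region boundaries in `rough_region` are illustrative assumptions chosen for this example, not published Ramachandran boundaries.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (0 = cis, 180 = trans)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    s0, s2 = _dot(b0, b1), _dot(b2, b1)
    v = tuple(b0[i] - s0 * b1[i] for i in range(3))
    w = tuple(b2[i] - s2 * b1[i] for i in range(3))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def rough_region(phi, psi):
    """Very coarse (phi, psi) classification; the thresholds are assumptions."""
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha-like"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta-like"
    return "other"
```

For example, `rough_region(-60, -45)` falls in the helical box and `rough_region(-120, 130)` in the extended box. Real assignment programs use far more careful criteria than these two rectangles.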

Types

The most common secondary structures are alpha helices and beta sheets. Other helices, such as the 3₁₀ helix and π helix, are calculated to have energetically favorable hydrogen-bonding patterns but are rarely observed in natural proteins except at the ends of α helices, due to unfavorable backbone packing in the center of the helix. Other extended structures such as the polyproline helix and alpha sheet are rare in native state proteins but are often hypothesized as important protein folding intermediates. Tight turns and loose, flexible loops link the more "regular" secondary structure elements. The random coil is not a true secondary structure, but is the class of conformations that indicate an absence of regular secondary structure.

Amino acids vary in their ability to form the various secondary structure elements. Proline and glycine are sometimes known as "helix breakers" because they disrupt the regularity of the α helical backbone conformation; however, both have unusual conformational abilities and are commonly found in turns. Amino acids that prefer to adopt helical conformations in proteins include methionine, alanine, leucine, glutamate and lysine ("MALEK" in amino-acid 1-letter codes); by contrast, the large aromatic residues (tryptophan, tyrosine and phenylalanine) and Cβ-branched amino acids (isoleucine, valine, and threonine) prefer to adopt β-strand conformations. However, these preferences are not strong enough to produce a reliable method of predicting secondary structure from sequence alone.

Low-frequency collective vibrations are thought to be sensitive to local rigidity within proteins, revealing beta structures to be generically more rigid than alpha or disordered proteins. Neutron scattering measurements have directly connected the spectral feature at ~1 THz to collective motions of the secondary structure of the beta-barrel protein GFP.

Hydrogen bonding patterns in secondary structures may be significantly distorted, which makes automatic determination of secondary structure difficult. There are several methods for formally defining protein secondary structure (e.g., DSSP, DEFINE, STRIDE, ScrewFit, SST).
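The residue preferences described in this section can be written down as a lookup, purely as an illustration. The sketch below labels each residue by the qualitative group named in the text (the "MALEK" helix formers, the aromatic and Cβ-branched strand formers, and proline/glycine); the function names and the one-letter labels are assumptions made for this example. As the text notes, such preferences are too weak to predict secondary structure reliably, so this is not a prediction method.

```python
# Residue groups taken from the qualitative description in the text.
HELIX_FAVORING = set("MALEK")    # Met, Ala, Leu, Glu, Lys
STRAND_FAVORING = set("WYFIVT")  # Trp, Tyr, Phe, Ile, Val, Thr
TURN_PRONE = set("PG")           # Pro, Gly ("helix breakers")

def label_residue(aa):
    """Return 'h', 'e', 't' or '-' for one residue (hypothetical labels)."""
    aa = aa.upper()
    if aa in HELIX_FAVORING:
        return "h"
    if aa in STRAND_FAVORING:
        return "e"
    if aa in TURN_PRONE:
        return "t"
    return "-"

def label_sequence(sequence):
    """Label every residue of a one-letter-code sequence independently."""
    return "".join(label_residue(aa) for aa in sequence)
```

Each residue is labeled independently of its neighbors, which is exactly the limitation that makes propensity-only approaches unreliable.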

DSSP classification

The Dictionary of Protein Secondary Structure, DSSP for short, is commonly used to describe protein secondary structure with single-letter codes. The secondary structure is assigned based on hydrogen bonding patterns such as those initially proposed by Pauling et al. in 1951 (before any protein structure had ever been experimentally determined). There are eight types of secondary structure that DSSP defines:

* G = 3-turn helix (3₁₀ helix). Minimum length 3 residues.
* H = 4-turn helix (α helix). Minimum length 4 residues.
* I = 5-turn helix (π helix). Minimum length 5 residues.
* T = hydrogen bonded turn (3, 4 or 5 turn)
* E = extended strand in parallel and/or anti-parallel β-sheet conformation. Minimum length 2 residues.
* B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)
* S = bend (the only non-hydrogen-bond based assignment).
* C = coil (residues which are not in any of the above conformations).

'Coil' is often codified as ' ' (space), C (coil) or '–' (dash). The helices (G, H and I) and sheet conformations are all required to have a reasonable length. This means that 2 adjacent residues in the primary structure must form the same hydrogen bonding pattern. If the helix or sheet hydrogen bonding pattern is too short, they are designated as T or B, respectively. Other protein secondary structure assignment categories exist (sharp turns, omega loops, etc.), but they are less frequently used.

Secondary structure is defined by hydrogen bonding, so the exact definition of a hydrogen bond is critical. The standard hydrogen-bond definition for secondary structure is that of DSSP, which is a purely electrostatic model. It assigns charges of ±q₁ ≈ 0.42e to the carbonyl carbon and oxygen, respectively, and charges of ±q₂ ≈ 0.20e to the amide hydrogen and nitrogen, respectively. The electrostatic energy is:

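E = q_1 q_2 \left( \frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}} \right) \cdot 332 \ \text{kcal/mol}

Here r_{AB} is the distance in ångströms between atoms A and B, taken from the carbon (C) and oxygen (O) of the C=O group and the nitrogen (N) and hydrogen (H) of the N–H group. According to DSSP, a hydrogen bond exists if and only if E is less than −0.5 kcal/mol. Although the DSSP formula is a relatively crude approximation of the physical hydrogen-bond energy, it is generally accepted as a tool for defining secondary structure.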
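The electrostatic criterion can be evaluated directly from four atom positions. The following is a minimal sketch, not the DSSP program itself: it takes the partial charges 0.42e and 0.20e and the factor 332 from the text, measures the four interatomic distances in ångströms, and compares the energy with −0.5 kcal/mol, the commonly cited DSSP cutoff, which is treated here as an assumption. The function and constant names are hypothetical.

```python
import math

Q1 = 0.42            # partial charge on carbonyl C (+) and O (-), in units of e
Q2 = 0.20            # partial charge on amide H (+) and N (-), in units of e
FACTOR = 332.0       # converts e^2 / angstrom to kcal/mol
HBOND_CUTOFF = -0.5  # kcal/mol; assumed cutoff, see lead-in

def dssp_hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group.

    Each argument is an (x, y, z) coordinate in angstroms.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn) * FACTOR

def is_hbond(c, o, n, h):
    """True if the pair counts as hydrogen bonded under the assumed cutoff."""
    return dssp_hbond_energy(c, o, n, h) < HBOND_CUTOFF
```

With an idealized collinear geometry (C=O length 1.23 Å, O···H distance 1.9 Å, N–H length 1.0 Å), the energy comes out near −2.9 kcal/mol, comfortably below the cutoff; moving the N–H group 10 Å away brings the energy close to zero.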

SST classification

SST is a Bayesian method to assign secondary structure to protein coordinate data using the Shannon information criterion of Minimum Message Length (MML) inference. SST treats any assignment of secondary structure as a potential hypothesis that attempts to explain (compress) given protein coordinate data. The core idea is that the best secondary structural assignment is the one that can explain (compress) the coordinates of a given protein in the most economical way, thus linking the inference of secondary structure to lossless data compression. SST accurately delineates any protein chain into regions associated with the following assignment types:

* E = (Extended) strand of a β-pleated sheet
* G = Right-handed 3₁₀ helix
* H = Right-handed α-helix
* I = Right-handed π-helix
* g = Left-handed 3₁₀ helix
* h = Left-handed α-helix
* i = Left-handed π-helix
* 3 = 3₁₀-like Turn
* 4 = α-like Turn
* 5 = π-like Turn
* T = Unspecified Turn
* C = Coil
* - = Unassigned residue

SST detects π and 3₁₀ helical caps to standard α-helices, and automatically assembles the various extended strands into consistent β-pleated sheets. It provides a readable output of dissected secondary structural elements, and a corresponding PyMol-loadable script to visualize the assigned secondary structural elements individually.
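A per-residue assignment string written in single-letter codes like those listed above can be broken into contiguous regions with a simple run-length pass. This sketch only post-processes an existing assignment string; it does not perform SST's MML inference, and the function name and the 1-based inclusive numbering are assumptions made for the example.

```python
from itertools import groupby

def segments(assignment):
    """Split a per-residue code string into (code, start, end) runs.

    Positions are 1-based and inclusive.
    """
    result = []
    position = 1
    for code, run in groupby(assignment):
        length = len(list(run))
        result.append((code, position, position + length - 1))
        position += length
    return result
```

For instance, the string `CCHHHHEEC` yields a two-residue coil, a four-residue helix, a two-residue strand and a final single coil residue.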

Experimental determination

The rough secondary-structure content of a biopolymer (e.g., "this protein is 40% α-helix and 20% β-sheet") can be estimated spectroscopically. For proteins, a common method is far-ultraviolet (far-UV, 170–250 nm) circular dichroism. A pronounced double minimum at 208 and 222 nm indicates α-helical structure, whereas a single minimum at 204 nm or 217 nm reflects random-coil or β-sheet structure, respectively. A less common method is infrared spectroscopy, which detects differences in the bond oscillations of amide groups due to hydrogen bonding. Finally, secondary-structure contents may be estimated accurately using the chemical shifts of an initially unassigned NMR spectrum.

Prediction

Predicting protein tertiary structure from only its amino acid sequence is a very challenging problem (see protein structure prediction), but using the simpler secondary structure definitions is more tractable.

Early methods of secondary-structure prediction were restricted to predicting the three predominant states: helix, sheet, or random coil. These methods were based on the helix- or sheet-forming propensities of individual amino acids, sometimes coupled with rules for estimating the free energy of forming secondary structure elements. The first widely used techniques to predict protein secondary structure from the amino acid sequence were the Chou–Fasman method and the GOR method. Although such methods claimed to achieve ~60% accuracy in predicting which of the three states (helix/sheet/coil) a residue adopts, blind computing assessments later showed that the actual accuracy was much lower.

A significant increase in accuracy (to nearly 80%) was made by exploiting multiple sequence alignment; knowing the full distribution of amino acids that occur at a position (and in its vicinity, typically ~7 residues on either side) throughout evolution provides a much better picture of the structural tendencies near that position. For illustration, a given protein might have a glycine at a given position, which by itself might suggest a random coil there. However, multiple sequence alignment might reveal that helix-favoring amino acids occur at that position (and nearby positions) in 95% of homologous proteins spanning nearly a billion years of evolution. Moreover, by examining the average hydrophobicity at that and nearby positions, the same alignment might also suggest a pattern of residue solvent accessibility consistent with an α-helix. Taken together, these factors would suggest that the glycine of the original protein adopts α-helical structure, rather than random coil. Several types of methods are used to combine all the available data to form a 3-state prediction, including neural networks, hidden Markov models and support vector machines. Modern prediction methods also provide a confidence score for their predictions at every position.

Secondary-structure prediction methods were evaluated by the Critical Assessment of protein Structure Prediction (CASP) experiments and continuously benchmarked, e.g. by the EVA benchmark. Based on these tests, the most accurate methods were Psipred, SAM, PORTER, PROF, and SABLE. The chief area for improvement appears to be the prediction of β-strands; residues confidently predicted as β-strand are likely to be so, but the methods are apt to overlook some β-strand segments (false negatives). There is likely an upper limit of ~90% prediction accuracy overall, due to the idiosyncrasies of the standard method (DSSP) for assigning secondary-structure classes (helix/strand/coil) to PDB structures, against which the predictions are benchmarked.

Accurate secondary-structure prediction is a key element in the prediction of tertiary structure, in all but the simplest (homology modeling) cases. For example, a confidently predicted pattern of six secondary structure elements βαββαβ is the signature of a ferredoxin fold.
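The three-state accuracy figures quoted in this section are per-residue agreement rates, often called Q3, between a predicted string and a reference string derived from DSSP. The sketch below shows one way such a score might be computed. The mapping from the eight DSSP codes to three states (G, H, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here, since benchmarks differ on this point; the function names are hypothetical.

```python
# Assumed 8-to-3 state reduction; other conventions exist.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_to_three_states(dssp_string):
    """Map DSSP codes to H (helix), E (strand) or C (coil)."""
    return "".join(EIGHT_TO_THREE.get(code, "C") for code in dssp_string)

def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state labels agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)
```

Because the reference labels themselves depend on the assignment method and on the reduction chosen, two benchmarks can report different scores for the same prediction, which is one reason for the accuracy ceiling mentioned above.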

Applications

Both protein and nucleic acid secondary structures can be used to aid in multiple sequence alignment. These alignments can be made more accurate by the inclusion of secondary structure information in addition to simple sequence information. This is sometimes less useful in RNA because base pairing is much more highly conserved than sequence. Distant relationships between proteins whose primary structures are unalignable can sometimes be found by secondary structure. It has been shown that α-helices are more stable, more robust to mutations, and more designable than β-strands in natural proteins; thus, designing functional all-α proteins is likely to be easier than designing proteins with both helices and strands. This has recently been confirmed experimentally.

External links

* NetSurfP – Secondary Structure and Surface Accessibility predictor
* PROF
* ScrewFit
* PSSpred: A multiple neural network training program for protein secondary structure prediction
* Genesilico metaserver: Metaserver that allows running over 20 different secondary structure predictors with one click
* SST webserver: An information-theoretic (compression-based) secondary structural assignment.