HOME

TheInfoList



OR:

A protein superfamily is the largest grouping (
clade In biology, a clade (), also known as a Monophyly, monophyletic group or natural group, is a group of organisms that is composed of a common ancestor and all of its descendants. Clades are the fundamental unit of cladistics, a modern approach t ...
) of
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residue (biochemistry), residues. Proteins perform a vast array of functions within organisms, including Enzyme catalysis, catalysing metab ...
s for which
common ancestry Common descent is a concept in evolutionary biology applicable when one species is the ancestor of two or more species later in time. According to modern evolutionary biology, all living beings could be descendants of a unique ancestor commonl ...
can be inferred (see homology). Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident.
Sequence homology Sequence homology is the homology (biology), biological homology between DNA sequence, DNA, RNA sequence, RNA, or Protein primary structure, protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments ...
can then be deduced even if not apparent (due to low sequence similarity). Superfamilies typically contain several
protein families A protein family is a group of evolutionarily related proteins. In many cases, a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term "protein family" should not be c ...
which show sequence similarity within each family. The term ''protein clan'' is commonly used for
protease A protease (also called a peptidase, proteinase, or proteolytic enzyme) is an enzyme that catalysis, catalyzes proteolysis, breaking down proteins into smaller polypeptides or single amino acids, and spurring the formation of new protein products ...
and glycosyl hydrolases superfamilies based on the MEROPS and CAZy classification systems.


Identification

Superfamilies of proteins are identified using a number of methods. Closely related members can be identified by different methods to those needed to group the most evolutionarily divergent members.


Sequence similarity

Historically, the similarity of different amino acid sequences has been the most common method of inferring homology. Sequence similarity is considered a good predictor of relatedness, since similar sequences are more likely the result of
gene duplication Gene duplication (or chromosomal duplication or gene amplification) is a major mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene ...
and
divergent evolution Divergent evolution or divergent selection is the accumulation of differences between closely related populations within a species, sometimes leading to speciation. Divergent evolution is typically exhibited when two populations become separate ...
, rather than the result of
convergent evolution Convergent evolution is the independent evolution of similar features in species of different periods or epochs in time. Convergent evolution creates analogous structures that have similar form or function but were not present in the last comm ...
. Amino acid sequence is typically more conserved than DNA sequence (due to the degenerate genetic code), so it is a more sensitive detection method. Since some of the amino acids have similar properties (e.g., charge, hydrophobicity, size), conservative mutations that interchange them are often neutral to function. The most conserved sequence regions of a protein often correspond to functionally important regions like catalytic sites and binding sites, since these regions are less tolerant to sequence changes. Using sequence similarity to infer homology has several limitations. There is no minimum level of sequence similarity guaranteed to produce identical structures. Over long periods of evolution, related proteins may show no detectable sequence similarity to one another. Sequences with many insertions and deletions can also sometimes be difficult to align and so identify the homologous sequence regions. In the PA clan of
protease A protease (also called a peptidase, proteinase, or proteolytic enzyme) is an enzyme that catalysis, catalyzes proteolysis, breaking down proteins into smaller polypeptides or single amino acids, and spurring the formation of new protein products ...
s, for example, not a single residue is conserved through the superfamily, not even those in the
catalytic triad A catalytic triad is a set of three coordinated amino acid residues that can be found in the active site of some enzymes. Catalytic triads are most commonly found in hydrolase and transferase enzymes (e.g. proteases, amidases, esterases, aminoac ...
. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan. Nevertheless, sequence similarity is the most commonly used form of evidence to infer relatedness, since the number of known sequences vastly outnumbers the number of known tertiary structures. In the absence of structural information, sequence similarity constrains the limits of which proteins can be assigned to a superfamily.


Structural similarity

Structure A structure is an arrangement and organization of interrelated elements in a material object or system, or the object or system so organized. Material structures include man-made objects such as buildings and machines and natural objects such as ...
is much more evolutionarily conserved than sequence, such that proteins with highly similar structures can have entirely different sequences. Over very long evolutionary timescales, very few residues show detectable amino acid sequence conservation, however secondary structural elements and
tertiary structural Protein tertiary structure is the three-dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains and the b ...
motifs are highly conserved. Some
protein dynamics In molecular biology, proteins are generally thought to adopt unique structures determined by their amino acid sequences. However, proteins are not strictly static objects, but rather populate ensembles of (sometimes similar) conformations. Tran ...
and
conformational change In biochemistry, a conformational change is a change in the shape of a macromolecule, often induced by environmental factors. A macromolecule is usually flexible and dynamic. Its shape can change in response to changes in its environment or othe ...
s of the protein structure may also be conserved, as is seen in the serpin superfamily. Consequently, protein tertiary structure can be used to detect homology between proteins even when no evidence of relatedness remains in their sequences. Structural alignment programs, such as DALI, use the 3D structure of a protein of interest to find proteins with similar folds. However, on rare occasions, related proteins may evolve to be structurally dissimilar and relatedness can only be inferred by other methods.


Mechanistic similarity

The
catalytic mechanism Enzyme catalysis is the increase in the rate of a process by an "enzyme", a biological molecule. Most enzymes are proteins, and most such processes are chemical reactions. Within the enzyme, generally catalysis occurs at a localized site, calle ...
of enzymes within a superfamily is commonly conserved, although
substrate Substrate may refer to: Physical layers *Substrate (biology), the natural environment in which an organism lives, or the surface or medium on which an organism grows or is attached ** Substrate (aquatic environment), the earthy material that exi ...
specificity may be significantly different. Catalytic residues also tend to occur in the same order in the protein sequence. For the families within the PA clan of proteases, although there has been divergent evolution of the
catalytic triad A catalytic triad is a set of three coordinated amino acid residues that can be found in the active site of some enzymes. Catalytic triads are most commonly found in hydrolase and transferase enzymes (e.g. proteases, amidases, esterases, aminoac ...
residues used to perform catalysis, all members use a similar mechanism to perform covalent, nucleophilic catalysis on proteins, peptides or amino acids. However, mechanism alone is not sufficient to infer relatedness. Some catalytic mechanisms have been convergently evolved multiple times independently, and so form separate superfamilies, and in some superfamilies display a range of different (though often chemically similar) mechanisms.


Evolutionary significance

Protein superfamilies represent the current limits of our ability to identify common ancestry. They are the largest
evolutionary Evolution is the change in the heritable characteristics of biological populations over successive generations. It occurs when evolutionary processes such as natural selection and genetic drift act on genetic variation, resulting in certa ...
grouping based on direct
evidence Evidence for a proposition is what supports the proposition. It is usually understood as an indication that the proposition is truth, true. The exact definition and role of evidence vary across different fields. In epistemology, evidence is what J ...
that is currently possible. They are therefore amongst the most ancient evolutionary events currently studied. Some superfamilies have members present in all
kingdoms Kingdom commonly refers to: * A monarchic state or realm ruled by a king or queen. ** A monarchic chiefdom, represented or governed by a king or queen. * Kingdom (biology), a category in biological taxonomy Kingdom may also refer to: Arts and me ...
of
life Life, also known as biota, refers to matter that has biological processes, such as Cell signaling, signaling and self-sustaining processes. It is defined descriptively by the capacity for homeostasis, Structure#Biological, organisation, met ...
, indicating that the last common ancestor of that superfamily was in the
last universal common ancestor The last universal common ancestor (LUCA) is the hypothesized common ancestral cell from which the three domains of life, the Bacteria, the Archaea, and the Eukarya originated. The cell had a lipid bilayer; it possessed the genetic code a ...
of all life (LUCA). Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species ( orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (
paralogy Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a spec ...
).


Diversification

A majority of proteins contain multiple domains. Between 66-80% of eukaryotic proteins have multiple domains while about 40-60% of prokaryotic proteins have multiple domains. Over time, many of the superfamilies of domains have mixed together. In fact, it is very rare to find “consistently isolated superfamilies”. When domains do combine, the N- to C-terminal domain order (the "domain architecture") is typically well conserved. Additionally, the number of domain combinations seen in nature is small compared to the number of possibilities, suggesting that selection acts on all combinations.


Examples

; α/β hydrolase superfamily: Members share an α/β sheet, containing 8 strands connected by
helices A helix (; ) is a shape like a cylindrical coil spring or the thread of a machine screw. It is a type of smoothness (mathematics), smooth space curve with tangent lines at a constant angle to a fixed axis. Helices are important in biology, as ...
, with
catalytic triad A catalytic triad is a set of three coordinated amino acid residues that can be found in the active site of some enzymes. Catalytic triads are most commonly found in hydrolase and transferase enzymes (e.g. proteases, amidases, esterases, aminoac ...
residues in the same order, activities include
proteases A protease (also called a peptidase, proteinase, or proteolytic enzyme) is an enzyme that catalyzes proteolysis, breaking down proteins into smaller polypeptides or single amino acids, and spurring the formation of new protein products. They do ...
, lipases,
peroxidases Peroxidases or peroxide reductases ( EC numberbr>1.11.1.x are a large group of enzymes which play a role in various biological processes. They are named after the fact that they commonly break up peroxides, and should not be confused with other ...
, esterases, epoxide hydrolases and dehalogenases. ; Alkaline phosphatase superfamily: Members share an αβα sandwich structure as well as performing common promiscuous reactions by a common mechanism. ; Globin superfamily: Members share an 8-
alpha helix An alpha helix (or α-helix) is a sequence of amino acids in a protein that are twisted into a coil (a helix). The alpha helix is the most common structural arrangement in the Protein secondary structure, secondary structure of proteins. It is al ...
globular globin fold. ;
Immunoglobulin superfamily The immunoglobulin superfamily (IgSF) is a large protein superfamily of cell surface and soluble proteins that are involved in the recognition, binding, or adhesion processes of cells. Molecules are categorized as members of this superfamily ...
: Members share a sandwich-like structure of two sheets of antiparallel
β strand The beta sheet (β-sheet, also β-pleated sheet) is a common motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone hydrogen bonds, forming a gene ...
s ( Ig-fold), and are involved in recognition, binding, and
adhesion Adhesion is the tendency of dissimilar particles or interface (matter), surfaces to cling to one another. (Cohesion (chemistry), Cohesion refers to the tendency of similar or identical particles and surfaces to cling to one another.) The ...
. ; PA clan: Members share a
chymotrypsin Chymotrypsin (, chymotrypsins A and B, alpha-chymar ophth, avazyme, chymar, chymotest, enzeon, quimar, quimotrase, alpha-chymar, alpha-chymotrypsin A, alpha-chymotrypsin) is a digestive enzyme component of pancreatic juice acting in the duodenu ...
-like double
β-barrel In protein structures, a beta barrel (β barrel) is a beta sheet (β sheet) composed of Protein tandem repeats, tandem repeats that twists and coils to form a closed toroidal structure in which the first strand is bonded to the last strand (hydrog ...
fold and similar
proteolysis Proteolysis is the breakdown of proteins into smaller polypeptides or amino acids. Protein degradation is a major regulatory mechanism of gene expression and contributes substantially to shaping mammalian proteomes. Uncatalysed, the hydrolysis o ...
mechanisms but sequence identity of <10%. The clan contains both
cysteine Cysteine (; symbol Cys or C) is a semiessential proteinogenic amino acid with the chemical formula, formula . The thiol side chain in cysteine enables the formation of Disulfide, disulfide bonds, and often participates in enzymatic reactions as ...
and serine proteases (different nucleophiles). ;
Ras superfamily The Ras superfamily, derived from "Rat sarcoma virus", is a protein superfamily of small GTPases. Members of the superfamily are divided into families and subfamilies based on their structure, sequence and function. The five main families are Ra ...
: Members share a common catalytic G domain of a 6-strand β sheet surrounded by 5 α-helices. ; RSH superfamily: Members share capability to hydrolyze and/or synthesize ppGpp alarmones in the stringent response. ; Serpin superfamily: Members share a high-energy, stressed fold which can undergo a large
conformational change In biochemistry, a conformational change is a change in the shape of a macromolecule, often induced by environmental factors. A macromolecule is usually flexible and dynamic. Its shape can change in response to changes in its environment or othe ...
, which is typically used to inhibit
serine Serine (symbol Ser or S) is an α-amino acid that is used in the biosynthesis of proteins. It contains an α- amino group (which is in the protonated − form under biological conditions), a carboxyl group (which is in the deprotonated − ...
and cysteine proteases by disrupting their structure. ; TIM barrel superfamily: Members share a large α8β8 barrel structure. It is one of the most common
protein fold A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similar ...
s and the monophylicity of this superfamily is still contested.


Protein superfamily resources

Several
biological databases Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including geno ...
document protein superfamilies and protein folds, for example: * Pfam - Protein families database of alignments and HMMs * PROSITE - Database of protein domains, families and functional sites * PIRSF - SuperFamily Classification System * PASS2 - Protein Alignment as Structural Superfamilies v2 * SUPERFAMILY - Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms *
SCOP A ( or ) was a poet as represented in Old English poetry. The scop is the Old English counterpart of the Old Norse ', with the important difference that "skald" was applied to historical persons, and scop is used, for the most part, to designat ...
and
CATH Cath may refer to: __NOTOC__ People * Cath Bishop (born 1971), British former rower and 2003 world champion * Cath Carroll (born 1960), British musician and music journalist * Cath Coffey (), one of the earliest members of British rap band Stereo ...
- Classifications of protein structures into superfamilies, families and domains Similarly there are algorithms that search the PDB for proteins with structural homology to a target structure, for example: * DALI - Structural alignment based on a distance alignment matrix method


See also

* Structural alignment *
Protein domains In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of se ...
*
Protein family A protein family is a group of evolutionarily related proteins. In many cases, a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term "protein family" should not be ...
*
Protein subfamily Protein subfamily is a level of protein classification, based on their close evolutionary relationship. It is below the larger levels of protein superfamily and protein family. Proteins typically share greater sequence and function similarities w ...
* Protein mimetic *
Protein structure Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers specifically polypeptides formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid ...
*
Homology (biology) In biology, homology is similarity in anatomical structures or genes between organisms of different taxa due to shared ancestry, ''regardless'' of current functional differences. Evolutionary biology explains homologous structures as retained her ...
*
Interolog An interolog is a conserved interaction between a pair of proteins which have interacting Homology (biology), homologs in another organism. The term was introduced in a 2000 paper by Walhout et al. Example Suppose that A and B are two differen ...
* List of gene families * SUPERFAMILY *
CATH Cath may refer to: __NOTOC__ People * Cath Bishop (born 1971), British former rower and 2003 world champion * Cath Carroll (born 1960), British musician and music journalist * Cath Coffey (), one of the earliest members of British rap band Stereo ...


References


External links

* {{Enzymes Molecular evolution * * Protein classification