Conserved signature indels

Conserved signature inserts and deletions (CSIs) in protein sequences provide an important category of molecular markers for understanding phylogenetic relationships. CSIs arise from rare genetic changes; they are generally of defined size and are flanked on both sides by conserved regions, which ensures their reliability as phylogenetic markers. While indels can be arbitrary inserts or deletions, CSIs are defined as only those protein indels that are present within conserved regions of the protein. CSIs that are restricted to a particular clade or group of species generally provide good phylogenetic markers of common evolutionary descent. Because such changes are rare and highly specific, they are unlikely to have arisen independently by either convergent or parallel evolution (i.e. homoplasy), and they are therefore likely to represent synapomorphies. Other confounding factors, such as differences in evolutionary rates at different sites or among different species, also generally do not affect the interpretation of a CSI. By determining the presence or absence of a CSI in an out-group species, one can infer whether the ancestral form of the CSI was an insert or a deletion, and this can be used to develop a rooted phylogenetic relationship among organisms. Most CSIs that have been identified exhibit high predictive value and generally retain their specificity for the originally identified clades of species. Therefore, based upon their presence or absence, it should be possible to identify both known and previously unknown species belonging to these groups in different environments.
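The defining property described above, an indel of fixed size flanked on both sides by conserved sequence, can be illustrated with a minimal Python sketch. It assumes a protein alignment is already available as a mapping from sequence name to aligned string, with '-' marking gaps. The function names, the toy sequences, and the thresholds (`flank`, `min_identity`) are hypothetical choices made for illustration only; they are not taken from the published CSI studies.

```python
from collections import Counter


def column_identity(column):
    """Fraction of sequences sharing the most common residue in a column."""
    counts = Counter(column)
    return counts.most_common(1)[0][1] / len(column)


def find_flanked_indels(alignment, flank=4, min_identity=0.75):
    """Return (start, end) column ranges of indels flanked by conserved columns.

    alignment: dict mapping a sequence name to an aligned string ('-' = gap).
    flank and min_identity are illustrative thresholds, not published values.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    columns = [[s[i] for s in seqs] for i in range(length)]
    has_gap = ['-' in col for col in columns]

    def conserved(lo, hi):
        # A flank must lie inside the alignment, be gap-free, and be well conserved.
        if lo < 0 or hi > length:
            return False
        return all(not has_gap[i] and column_identity(columns[i]) >= min_identity
                   for i in range(lo, hi))

    hits = []
    i = 0
    while i < length:
        if has_gap[i]:
            # Extend over the whole run of gap-containing columns.
            j = i
            while j < length and has_gap[j]:
                j += 1
            if conserved(i - flank, i) and conserved(j, j + flank):
                hits.append((i, j))
            i = j
        else:
            i += 1
    return hits


# Hypothetical toy alignment: X1 and X2 carry a 5 aa segment that A1 and B1 lack.
toy = {
    "X1": "MKVLAGDTPQRSELWTY",
    "X2": "MKVLAGDTPQRSELWTY",
    "A1": "MKVLA-----RSELWTY",
    "B1": "MKVIA-----RSELWTY",
}

print(find_flanked_indels(toy))
```

On the toy alignment the sketch reports the column range (5, 10). Raising `min_identity` or `flank` makes the flank requirement stricter and the same indel is no longer reported, which mirrors the idea that an indel counts as a CSI only when its surroundings are conserved.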


Types


Group specific

Group-specific CSIs are commonly shared by different species belonging to a particular taxon (e.g. genus, family, class, order, phylum) but are not present in other groups. These CSIs were most likely introduced in an ancestor of the group of species before the members of the taxon diverged. They provide molecular means for distinguishing members of a particular taxon from all other organisms. Figure 1 shows an example of a 5 aa CSI found in all species belonging to the taxon X. This is a distinctive characteristic of this taxon, as it is not found in any other species. This signature was likely introduced in a common ancestor of the species from this taxon. Similarly, other group-specific signatures (not shown) could be shared by either A1 and A2 or B1 and B2, etc., or even by X1 and X2 or by X3 and X4, etc. The groups A, B, C, D and X in this diagram could correspond to various bacterial or eukaryotic phyla. Group-specific CSIs have been used to determine the phylogenetic relationships of a number of bacterial phyla and of subgroups within them. For example, a 3 amino acid insert within a highly conserved region (amino acids 82-124) of the essential 50S ribosomal protein L7/L12 is uniquely shared by members of the phylum Thermotogota (formerly Thermotogae). This insert is not present in any other bacterial species and can be used to distinguish members of the phylum Thermotogota from all other bacteria. Group-specific CSIs were also used to characterize subgroups within the phylum Thermotogota.
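The group-specific criterion described above (the indel is shared by every member of a taxon and by no species outside it) can be expressed as a small check on an alignment. The following Python sketch assumes the indel columns are already known; the function names and the toy sequences, labelled X1, X2, A1 and B1 after the groups in the text, are hypothetical.

```python
def indel_state(seq, start, end):
    """Classify columns start:end of one aligned sequence ('-' = gap)."""
    region = seq[start:end]
    if all(ch == '-' for ch in region):
        return "gap"
    if '-' not in region:
        return "residues"
    return "mixed"


def is_group_specific(alignment, group, start, end):
    """True if every group member shares one indel state and every other
    sequence shares the opposite state over columns start:end."""
    states_in = {indel_state(seq, start, end)
                 for name, seq in alignment.items() if name in group}
    states_out = {indel_state(seq, start, end)
                  for name, seq in alignment.items() if name not in group}
    # Each side must be uniform, unambiguous, and different from the other side.
    return (len(states_in) == 1 and len(states_out) == 1
            and "mixed" not in states_in | states_out
            and states_in != states_out)


# Hypothetical toy alignment: taxon X carries a 5 aa segment absent elsewhere.
toy = {
    "X1": "MKVLAGDTPQRSELWTY",
    "X2": "MKVLAGDTPQRSELWTY",
    "A1": "MKVLA-----RSELWTY",
    "B1": "MKVIA-----RSELWTY",
}

print(is_group_specific(toy, {"X1", "X2"}, 5, 10))  # indel separates X from the rest
print(is_group_specific(toy, {"X1", "A1"}, 5, 10))  # arbitrary grouping
```

The check deliberately says nothing about whether the segment is an insert or a deletion; that question needs an out-group and is a separate step.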


Multi group or main-line

Main-line CSIs are those in which a conserved insert or deletion is shared by several major phyla but is absent from other phyla. Figure 2 shows an example of a 5 aa CSI in a conserved region that is commonly present in the species belonging to phyla X, Y and Z, but is absent in other phyla (A, B and C). This signature indicates a specific relationship among taxa X, Y and Z, and likewise among A, B and C. Based upon the presence or absence of such an indel in out-group species (viz. Archaea), it can be inferred whether the indel is an insert or a deletion, and which of the two groups (A, B, C or X, Y, Z) is ancestral. Main-line CSIs have been used to determine the phylogenetic relationships of a number of bacterial phyla. A large CSI of about 150-180 amino acids within a conserved region of Gyrase B (between amino acids 529-751) is commonly shared between various Pseudomonadota, Chlamydiota, Planctomycetota and Aquificota species. This CSI is absent in other ancestral bacterial phyla as well as in Archaea. Similarly, a large CSI of about 100 amino acids in RpoB homologs (between amino acids 919-1058) is present in various species belonging to Pseudomonadota, Bacteroidota, Chlorobiota, Chlamydiota, Planctomycetota and Aquificota. This CSI is absent in other ancestral bacterial phyla as well as in Archaea. In both cases one can infer that the groups lacking the CSI are ancestral.
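The out-group reasoning used above can be written as a short rule: whatever state the out-group shows is taken as ancestral, so the groups with the opposite state carry the derived change. The Python sketch below is an illustration under that assumption. The function name is hypothetical, the presence flags follow the Gyrase B example in the text, and "OtherPhylum" is a placeholder label standing in for the unnamed bacterial phyla that lack the CSI.

```python
def infer_polarity(presence, outgroup):
    """Infer whether an indel is an insert or a deletion from out-group states.

    presence: dict mapping a taxon name to True if the extra segment is present.
    outgroup: set of taxon names treated as the out-group.
    Returns (event, derived_taxa), or None if the out-group is empty or mixed.
    """
    out_states = {presence[name] for name in outgroup}
    if len(out_states) != 1:
        return None  # the out-group gives no consistent ancestral state
    ancestral_has_segment = out_states.pop()
    # If the ancestor lacked the segment, taxa that have it gained an insert;
    # if the ancestor had it, taxa that lack it suffered a deletion.
    event = "deletion" if ancestral_has_segment else "insert"
    derived = sorted(name for name, has in presence.items()
                     if name not in outgroup and has != ancestral_has_segment)
    return event, derived


# Presence flags following the Gyrase B example; "OtherPhylum" is a placeholder.
gyrase_b = {
    "Pseudomonadota": True,
    "Chlamydiota": True,
    "Planctomycetota": True,
    "Aquificota": True,
    "OtherPhylum": False,
    "Archaea": False,
}

print(infer_polarity(gyrase_b, {"Archaea"}))
```

With Archaea as the out-group, the sketch classifies the Gyrase B CSI as an insert in the four phyla that carry it, which matches the conclusion in the text that the groups lacking the CSI are ancestral.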


Evolutionary studies based on CSIs

A key issue in bacterial phylogeny is to understand how different bacterial species are related to each other and in what order they branched from a common ancestor. Currently, most phylogenetic trees are based on 16S rRNA or other genes/proteins. These trees are not always able to resolve key phylogenetic questions with a high degree of certainty. In recent years, however, the discovery and analysis of conserved signature indels (CSIs) in many universally distributed proteins have aided in this quest. The genetic events leading to them are postulated to have occurred at important evolutionary branch points, and their species distribution patterns provide valuable information regarding the branching order and interrelationships among different bacterial phyla.


Thermotogota

Recently, the phylogenetic relationships of the phylum Thermotogota were characterized using the CSI approach. Previously, no biochemical or molecular markers were known that could clearly distinguish the species of this phylum from all other bacteria. More than 60 CSIs that were specific for the entire Thermotogota phylum or its different subgroups were discovered. 18 CSIs are uniquely present in various Thermotogota species and provide molecular markers for the phylum. Additionally, there were many CSIs that were specific for various Thermotogota subgroups. 12 CSIs were specific for a clade consisting of various Thermotogota species except Tt. lettingae. 14 CSIs were specific for a clade consisting of the Fervidobacterium and Thermosipho genera, and 18 CSIs were specific for the genus Thermosipho. Lastly, 16 CSIs were reported that were shared by some or all Thermotogota species and by some species from other taxa such as Archaea, Aquificota, Bacillota, Pseudomonadota, Deinococcota, Fusobacteriota, Dictyoglomota, Chloroflexota, and eukaryotes. The shared presence of some of these CSIs could be due to lateral gene transfer (LGT) between these groups. However, the number of CSIs that are commonly shared with other taxa is much smaller than the number specific for Thermotogota, and the shared CSIs do not exhibit any specific pattern. Hence they have no significant effect on the distinction of Thermotogota.


Archaea

Mesophilic Thermoproteota were recently placed into a new phylum of Archaea called the Nitrososphaerota (formerly Thaumarchaeota). However, there are very few molecular markers that can distinguish this group of archaea from the phylum Thermoproteota (formerly Crenarchaeota). A detailed phylogenetic study using the CSI approach was conducted to distinguish these phyla in molecular terms. 6 CSIs were uniquely found in various Nitrososphaerota, namely Cenarchaeum symbiosum, Nitrosopumilus maritimus and a number of uncultured marine Thermoproteota. 3 CSIs were found that were commonly shared between species belonging to Nitrososphaerota and Thermoproteota. Additionally, a number of CSIs were found that are specific for different orders of Thermoproteota: 3 CSIs for Sulfolobales, 5 CSIs for Thermoproteales, and 2 CSIs common to Sulfolobales and Desulfurococcales. The signatures described provide novel means for distinguishing Thermoproteota and Nitrososphaerota. They could also be used as a tool for the classification and identification of related species.


Pasteurellales

The members of the order Pasteurellales are currently distinguished mainly by their position in the branching of the 16S rRNA tree. There are currently very few molecular markers known that can distinguish members of this order from other bacteria. A CSI approach was recently used to elucidate the phylogenetic relationships between the species in this order; more than 40 CSIs were discovered that were uniquely shared by all or most of the species. Two major clades are formed within the Pasteurellales. Clade I, encompassing Aggregatibacter, Pasteurella, Actinobacillus succinogenes, Mannheimia succiniciproducens, Haemophilus influenzae and Haemophilus somnus, was supported by 13 CSIs. Clade II, encompassing Actinobacillus pleuropneumoniae, Actinobacillus minor, Haemophilus ducreyi, Mannheimia haemolytica and Haemophilus parasuis, was supported by 9 CSIs. Based on these results, it was proposed that Pasteurellales be divided from its current single family into two families. Additionally, the signatures described would provide novel means of identifying undiscovered Pasteurellales species.


Gammaproteobacteria

The class Gammaproteobacteria forms one of the largest groups of bacteria. It is currently distinguished from other bacteria solely by 16S rRNA-based phylogenetic trees. No molecular characteristics unique to the class or its different subgroups were known. A detailed CSI-based study was conducted to better understand the phylogeny of this class. First, a phylogenetic tree based on concatenated sequences of a number of universally distributed proteins was created. The branching order of the different orders of the class Gammaproteobacteria (from the most recent to the earliest diverging) was: Enterobacteriales > Pasteurellales > Vibrionales, Aeromonadales > Alteromonadales > Oceanospirillales, Pseudomonadales > Chromatiales, Legionellales, Methylococcales, Xanthomonadales, Cardiobacteriales, Thiotrichales. Additionally, 4 CSIs were discovered that were unique to most species of the class Gammaproteobacteria. A 2 aa deletion in AICAR transformylase was uniquely shared by all gammaproteobacteria except for Francisella tularensis. A 4 aa deletion in the RNA polymerase β-subunit and a 1 aa deletion in ribosomal protein L16 were found uniquely in various species belonging to the orders Enterobacteriales, Pasteurellales, Vibrionales, Aeromonadales and Alteromonadales, but were not found in other gammaproteobacteria. Lastly, a 2 aa deletion in leucyl-tRNA synthetase was commonly present in the above orders of the class Gammaproteobacteria and in some members of the order Oceanospirillales. Another CSI-based study identified 4 CSIs that are exclusive to the order Xanthomonadales. Taken together, these two findings indicate that Xanthomonadales is a monophyletic group that is ancestral to other Gammaproteobacteria, forming an independent subdivision and one of the deepest-branching lineages within the Gammaproteobacteria clade.


Fungi

The exact phylogenetic relationship between plants, animals and fungi is not well understood. A small CSI-based study was conducted to elucidate this relationship. Four CSIs were used to place animals and fungi together as a monophyletic group that excludes plants. These CSIs were found in two essential cellular proteins, elongation factor 1 and enolase. Traditionally, however, this specific relationship between fungi and animals has not been supported.

