Conserved signature inserts and deletions (CSIs) in protein sequences provide an important category of molecular markers for understanding phylogenetic relationships.
CSIs, brought about by rare genetic changes, provide useful phylogenetic markers that are generally of defined size and are flanked on both sides by conserved regions to ensure their reliability. While indels can be arbitrary inserts or deletions, CSIs are defined as only those protein indels that are present within conserved regions of the protein.
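The definition above can be illustrated with a minimal sketch. It assumes a small, already aligned set of protein sequences with `-` marking gaps; the function names, the flank width and the conservation thresholds are hypothetical choices for illustration, not values taken from the CSI literature.

```python
def column_is_conserved(column, min_identity=0.8):
    """Treat a gap-free column as conserved if one residue dominates it."""
    if "-" in column:
        return False
    top = max(set(column), key=column.count)
    return column.count(top) / len(column) >= min_identity


def find_candidate_csis(alignment, flank=5, min_conserved=4):
    """Return half-open (start, end) column ranges of candidate CSIs.

    alignment maps sequence names to aligned sequences of equal length.
    flank and min_conserved are illustrative thresholds (assumptions).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    columns = ["".join(s[i] for s in seqs) for i in range(length)]
    candidates = []
    i = 0
    while i < length:
        if "-" not in columns[i]:
            i += 1
            continue
        start = i
        while i < length and "-" in columns[i]:
            i += 1
        end = i
        # Defined size: each sequence is fully gapped or gap-free in the block.
        blocks = [s[start:end] for s in seqs]
        defined = all(b == "-" * (end - start) or "-" not in b for b in blocks)
        # Both flanks must be full width and mostly conserved.
        left = columns[max(0, start - flank):start]
        right = columns[end:end + flank]
        flanked = (
            len(left) == flank and len(right) == flank
            and sum(map(column_is_conserved, left)) >= min_conserved
            and sum(map(column_is_conserved, right)) >= min_conserved
        )
        if defined and flanked:
            candidates.append((start, end))
    return candidates


# Toy alignment with made-up sequences: a 3 aa insert in X1 and X2.
example_alignment = {
    "X1": "MKVLAGDERTLLP",
    "X2": "MKVLAGDERTLLP",
    "A1": "MKVLA---RTLLP",
    "B1": "MKVIA---RTLLP",
}
print(find_candidate_csis(example_alignment))
```

An indel whose flanking columns vary freely is rejected by this check, which mirrors the distinction drawn above between arbitrary indels and CSIs.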
The CSIs that are restricted to a particular clade or group of species generally provide good phylogenetic markers of common evolutionary descent.
Due to the rarity and highly specific nature of such changes, it is less likely that they could arise independently by either convergent or parallel evolution (i.e. homoplasy), and they are therefore likely to represent synapomorphy. Other confounding factors such as differences in evolutionary rates at different sites or among different species also generally do not affect the interpretation of a CSI.
By determining the presence or absence of CSIs in an out-group species, one can infer whether the ancestral form of the CSI was an insert or deletion and this can be used to develop a rooted phylogenetic relationship among organisms.
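The out-group reasoning can be sketched as follows. This is a simplified illustration that assumes the state of each species (longer or shorter form at the CSI position) has already been scored; the function name and the species labels are hypothetical.

```python
def infer_indel_polarity(has_longer_form, outgroup):
    """Infer whether a CSI arose as an insertion or a deletion.

    has_longer_form maps species name -> True if its sequence carries the
    extra residues at the CSI position, False if it has the shorter form.
    outgroup lists the species treated as the out-group.
    """
    states = {has_longer_form[name] for name in outgroup}
    if len(states) != 1:
        # Out-group species disagree, or no out-group was supplied.
        return "ambiguous"
    ancestral_is_long = states.pop()
    # The out-group state is taken as ancestral; the other state is derived.
    return "deletion" if ancestral_is_long else "insertion"


# Hypothetical scoring: the out-group species lacks the extra residues.
scores = {"X1": True, "X2": True, "A1": False, "Out1": False}
print(infer_indel_polarity(scores, ["Out1"]))
```

If the out-group carries the shorter form, the species with the longer form are inferred to share a derived insertion; the reverse case implies a derived deletion.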
CSIs are discovered by looking for shared changes in a phylogenetic tree constructed from protein sequences. Most CSIs that have been identified have been found to have high predictive value upon addition of new sequences, retaining the specificity for the originally identified clades of species. They can be used to identify both known and even previously unknown species belonging to these groups in different environments.
Compared to tree branching orders, which can vary among methods, specific CSIs make for more concrete circumscriptions that are computationally cheaper to apply.
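As a rough illustration of why applying a known CSI is cheap, the sketch below checks whether new sequences carry a signature insert. It assumes each query has already been aligned to a reference alignment in which the CSI occupies known columns; the function names and the column coordinates are hypothetical.

```python
def has_signature_insert(aligned_query, csi_start, csi_end):
    """True if the query has residues in every column of the CSI.

    csi_start and csi_end are half-open column indices in the reference
    alignment. Any gap in the region counts as lacking the full insert.
    """
    region = aligned_query[csi_start:csi_end]
    return len(region) == csi_end - csi_start and "-" not in region


def screen_queries(aligned_queries, csi_start, csi_end):
    """Map each query name to whether it carries the signature insert."""
    return {
        name: has_signature_insert(seq, csi_start, csi_end)
        for name, seq in aligned_queries.items()
    }


# Made-up query sequences aligned to a 13-column reference.
queries = {
    "new_isolate_1": "MKVLAGDERTLLP",
    "new_isolate_2": "MKVLA---RTLLP",
}
print(screen_queries(queries, 5, 8))
```

Once the alignment exists, the check is a single pass over a few columns per sequence, with no tree reconstruction involved.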
Types
Group-specific

Group-specific CSIs are commonly shared by different species belonging to a particular taxon (e.g. genus, family, class, order, phylum) but they are not present in other groups. These CSIs were most likely introduced in an ancestor of the group of species before the members of the taxon diverged. They provide molecular means for distinguishing members of a particular taxon from all other organisms.
Figure 1 shows an example of a 5 aa CSI found in all species belonging to the taxon X. This is a distinctive characteristic of this taxon as it is not found in any other species. This signature was likely introduced in a common ancestor of the species from this taxon. Similarly, other group-specific signatures (not shown) could be shared by either A1 and A2 or B1 and B2, etc., or even by X1 and X2 or by X3 and X4, etc. The groups A, B, C, D and X in this diagram could correspond to various bacterial or eukaryotic phyla.
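The group-specific criterion described for taxon X can be written as a short check. This is a sketch under the assumption that CSI presence has already been scored for each species; the function name is hypothetical and the species labels follow the X, A, B naming used for the figure.

```python
def is_group_specific(has_csi, taxon_of, target_taxon):
    """True if every sampled member of target_taxon carries the CSI
    and no sampled species outside it does."""
    members = [sp for sp in has_csi if taxon_of[sp] == target_taxon]
    others = [sp for sp in has_csi if taxon_of[sp] != target_taxon]
    return (
        bool(members)
        and all(has_csi[sp] for sp in members)
        and not any(has_csi[sp] for sp in others)
    )


# Illustrative scoring only; not data from any study.
has_csi = {"X1": True, "X2": True, "X3": True, "A1": False, "B1": False}
taxon_of = {"X1": "X", "X2": "X", "X3": "X", "A1": "A", "B1": "B"}
print(is_group_specific(has_csi, taxon_of, "X"))
```

The check is strict: a single carrier outside the taxon, or a single member lacking the CSI, makes it fail. Published analyses often tolerate isolated exceptions, which this sketch does not model.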
Group-specific CSIs have been used in the past to determine the phylogenetic relationship of a number of bacterial phyla and subgroups within them. For example, a 3 amino acid insert was uniquely shared by members of the phylum Thermotogota (formerly Thermotogae) in the essential 50S ribosomal protein L7/L12, within a highly conserved region (amino acids 82-124). This insert is not present in any other bacterial species and could be used to distinguish members of Thermotogota from all other bacteria. Group-specific CSIs were also used to characterize subgroups within Thermotogota.
Multi-group or mainline
Mainline CSIs are those in which a conserved insert or deletion is shared by several major phyla, but absent from other phyla.
Figure 2 shows an example of a 5 aa CSI found in a conserved region that is commonly present in the species belonging to phyla X, Y and Z, but is absent in other phyla (A, B and C). This signature indicates a specific relationship of taxa X, Y and Z and also A, B and C. Based upon the presence or absence of such an indel in out-group species (viz. Archaea), it can be inferred whether the indel is an insert or a deletion, and which of the two groups (A, B, C or X, Y, Z) is ancestral.
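The difference between a group-specific and a mainline CSI can be expressed as a small classifier over phyla. This is an illustrative sketch only: the labels returned, the function name and the rule that a phylum must be uniform in its CSI state are assumptions made for the example.

```python
def classify_csi(has_csi, phylum_of):
    """Classify a CSI by how many sampled phyla carry it.

    Returns (label, carrier_phyla). A phylum whose sampled species
    disagree about the CSI makes the result "mixed".
    """
    flags_by_phylum = {}
    for species, present in has_csi.items():
        flags_by_phylum.setdefault(phylum_of[species], []).append(present)
    if any(len(set(flags)) > 1 for flags in flags_by_phylum.values()):
        return "mixed", set()
    carriers = {p for p, flags in flags_by_phylum.items() if flags[0]}
    n = len(carriers)
    if n == 0 or n == len(flags_by_phylum):
        # Present everywhere or nowhere: no grouping information.
        return "uninformative", carriers
    return ("group-specific" if n == 1 else "mainline"), carriers


# Illustrative scoring following the X, Y, Z versus A, B, C example.
presence = {"X1": True, "Y1": True, "Z1": True,
            "A1": False, "B1": False, "C1": False}
phylum = {sp: sp[0] for sp in presence}
print(classify_csi(presence, phylum))
```

For the scoring shown, the carriers are phyla X, Y and Z, so the indel is labelled as mainline.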
Mainline CSIs have been used in the past to determine the phylogenetic relationship of a number of bacterial phyla. A large CSI of about 150-180 amino acids within a conserved region of Gyrase B (between amino acids 529-751) is commonly shared between various Pseudomonadota, Chlamydiota, Planctomycetota and Aquificota species. This CSI is absent in other ancestral bacterial phyla as well as Archaea.
Similarly, a large CSI of about 100 amino acids in RpoB homologs (between amino acids 919-1058) is present in various species belonging to Pseudomonadota, Bacteroidota, Chlorobiota, Chlamydiota, Planctomycetota, and Aquificota. This CSI is absent in other ancestral bacterial phyla as well as Archaea.
In both cases one can infer that the groups lacking the CSI are ancestral.
Evolutionary studies based on CSIs

A key issue in bacterial phylogeny is to understand how different bacterial species are related to each other and their branching order from a common ancestor. Currently most phylogenetic trees are based on 16S rRNA or other genes/proteins. These trees are not always able to resolve key phylogenetic questions with a high degree of certainty.
However, in recent years the discovery and analysis of conserved indels (CSIs) in many universally distributed proteins have aided in this quest. The genetic events leading to them are postulated to have occurred at important evolutionary branch points, and their species distribution patterns provide valuable information regarding the branching order and interrelationships among different bacterial phyla.
Thermotogota
Recently the phylogenetic relationship of the group Thermotogota was characterized based on the CSI approach. Previously no biochemical or molecular markers were known that could clearly distinguish the species of this phylum from all other bacteria. More than 60 CSIs that were specific for the entire Thermotogota phylum or its different subgroups were discovered. Of these, 18 CSIs are uniquely present in various Thermotogota species and provide molecular markers for the phylum. Additionally, there were many CSIs that were specific for various Thermotogota subgroups: 12 CSIs were specific for a clade consisting of various Thermotogota species except ''Tt. lettingae'', 14 CSIs were specific for a clade consisting of the ''Fervidobacterium'' and ''Thermosipho'' genera, and 18 CSIs were specific for the genus ''Thermosipho''.
Lastly, 16 CSIs were reported that were shared by some or all Thermotogota species and by some species from other taxa such as Archaea, Aquificota, Bacillota, Pseudomonadota, Deinococcota, Fusobacteriota, Dictyoglomota, Chloroflexota, and eukaryotes. The shared presence of some of these CSIs could be due to lateral gene transfer (LGT) between these groups. However, the number of CSIs that are commonly shared with other taxa is much smaller than the number that are specific for Thermotogota, and they do not exhibit any specific pattern. Hence they have no significant effect on the distinction of Thermotogota.
Archaea
Mesophilic Thermoproteota were recently placed into a new phylum of Archaea called the Nitrososphaerota (formerly Thaumarchaeota). However, there are very few molecular markers that can distinguish this group of archaea from the phylum Thermoproteota (formerly Crenarchaeota). A detailed phylogenetic study using the CSI approach was conducted to distinguish these phyla in molecular terms. Six CSIs were uniquely found in various Nitrososphaerota, namely ''Cenarchaeum symbiosum'', ''Nitrosopumilus maritimus'' and a number of uncultured marine Thermoproteota. Three CSIs were found that were commonly shared between species belonging to Nitrososphaerota and Thermoproteota. Additionally, a number of CSIs were found that are specific for different orders of Thermoproteota: 3 CSIs for Sulfolobales, 5 CSIs for Thermoproteales, and 2 CSIs common to Sulfolobales and Desulfurococcales. The signatures described provide novel means for distinguishing Thermoproteota and Nitrososphaerota; additionally, they could be used as a tool for the classification and identification of related species.
Pasteurellales
The members of the order Pasteurellales are currently distinguished mainly based on their position in the branching of the 16S rRNA tree. There are currently very few molecular markers known that can distinguish members of this order from other bacteria. A CSI approach was recently used to elucidate the phylogenetic relationships between the species in this order; more than 40 CSIs were discovered that were uniquely shared by all or most of the species. Two major clades are formed within the Pasteurellales. Clade I, encompassing ''Aggregatibacter'', ''Pasteurella'', ''Actinobacillus succinogenes'', ''Mannheimia succiniciproducens'', ''Haemophilus influenzae'' and ''Haemophilus somnus'', was supported by 13 CSIs. Clade II, encompassing ''Actinobacillus pleuropneumoniae'', ''Actinobacillus minor'', ''Haemophilus ducreyi'', ''Mannheimia haemolytica'' and ''Haemophilus parasuis'', was supported by 9 CSIs. Based on these results, it was proposed that Pasteurellales be divided from its current single family into two families. Additionally, the signatures described would provide novel means of identifying undiscovered Pasteurellales species.
Gammaproteobacteria
The class Gammaproteobacteria forms one of the largest groups of bacteria. It is currently distinguished from other bacteria solely by 16S rRNA-based phylogenetic trees. No molecular characteristics unique to the class or its different subgroups are known. A detailed CSI-based study was conducted to better understand the phylogeny of this class. Firstly, a phylogenetic tree based on concatenated sequences of a number of universally distributed proteins was created. The branching order of the different
orders of the class Gammaproteobacteria (from most recent to the earliest diverging) was: Enterobacteriales > Pasteurellales > Vibrionales, Aeromonadales > Alteromonadales > Oceanospirillales, Pseudomonadales > Chromatiales, Legionellales, Methylococcales, Xanthomonadales, Cardiobacteriales, Thiotrichales. Additionally, 4 CSIs were discovered that were unique to most species of the class Gammaproteobacteria. A 2 aa deletion in
AICAR transformylase was uniquely shared by all gammaproteobacteria except for ''Francisella tularensis''. A 4 aa deletion in RNA polymerase β-subunit and a 1 aa deletion in ribosomal protein L16 were found uniquely in various species belonging to the orders Enterobacteriales, Pasteurellales, Vibrionales, Aeromonadales and Alteromonadales, but were not found in other gammaproteobacteria. Lastly, a 2 aa deletion in leucyl-tRNA synthetase was commonly present in the above orders of the class Gammaproteobacteria and in some members of the order Oceanospirillales.
Another CSI-based study has also identified 4 CSIs that are exclusive to the order Xanthomonadales. Taken together, these two facts show that Xanthomonadales is a monophyletic group that is ancestral to other Gammaproteobacteria, which further shows that Xanthomonadales is an independent subdivision and constitutes one of the deepest-branching lineages within the Gammaproteobacteria clade.
Complications
Horizontal gene transfer
The relative rarity and specificity of CSIs do not preclude the possibility of horizontal gene transfer in the genes they are found in. In fact, CSIs have provided evidence for some cases of horizontal gene transfer.
Convergent evolution
Although CSIs are much less likely to evolve convergently than other shared characteristics, it remains possible for them to arise via convergent evolution. One example is the 51 aa insertion in the SecA translocase shared between the thermophilic Thermotogales and Aquificales. Full phylogenetic comparison suggests that the shared insertion evolved convergently rather than through lateral transfer.
See also
* Molecular phylogenetics