Phylogenomics is the intersection of the fields of evolution and genomics. The term has been used in multiple ways to refer to analysis that involves genome data and evolutionary reconstructions. It is a group of techniques within the larger fields of phylogenetics and genomics. Phylogenomics draws information by comparing entire genomes, or at least large portions of genomes. Phylogenetics, by contrast, compares and analyzes the sequences of single genes, or a small number of genes, as well as many other types of data. Four major areas fall under phylogenomics:
* Prediction of gene function
* Establishment and clarification of evolutionary relationships
* Gene family evolution
* Prediction and retracing of lateral gene transfer
The ultimate goal of phylogenomics is to reconstruct the evolutionary history of species through their genomes. This history is usually inferred from a series of genomes by using a genome evolution model and standard statistical inference methods (e.g. Bayesian inference or maximum likelihood estimation).
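As a minimal illustration of pairing an evolution model with maximum likelihood estimation, the sketch below estimates the evolutionary distance between two aligned DNA sequences under the Jukes-Cantor (JC69) substitution model, for which the maximum likelihood estimate has a closed form. The choice of JC69 and the function name `jc69_distance` are assumptions made for this example; real phylogenomic analyses use richer models and dedicated software.

```python
import math


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Maximum likelihood estimate of substitutions per site under JC69.

    Both sequences must come from the same alignment (equal length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Use only columns where both sequences carry an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")

    # Observed proportion of differing sites.
    p = sum(a != b for a, b in pairs) / len(pairs)

    # Under JC69 the expected proportion of differences saturates at 0.75;
    # at or beyond that point the estimate is unbounded.
    if p >= 0.75:
        return math.inf

    # Closed-form ML estimate: d = -(3/4) * ln(1 - 4p/3).
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jc69_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance is slightly larger than the raw proportion of differences because the model accounts for multiple substitutions at the same site.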
Prediction of gene function
When Jonathan Eisen originally coined the term ''phylogenomics'', it applied to the prediction of gene function. Before the use of phylogenomic techniques, gene function was predicted primarily by comparing the gene sequence with the sequences of genes with known functions. When several genes with similar sequences but differing functions are involved, this method alone is ineffective in determining function. A specific example is presented in the paper "Gastronomic Delights: A movable feast".
Gene predictions based on sequence similarity alone had been used to predict that ''Helicobacter pylori'' can repair mismatched DNA. This prediction was based on the fact that this organism has a gene whose sequence is highly similar to genes from other species in the "MutS" gene family, which includes many genes known to be involved in mismatch repair. However, Eisen noted that ''H. pylori'' lacks other genes thought to be essential for this function (specifically, members of the MutL family). Eisen suggested a solution to this apparent discrepancy: phylogenetic trees of genes in the MutS family revealed that the gene found in ''H. pylori'' was not in the same subfamily as those known to be involved in mismatch repair.
Furthermore, he suggested that this "phylogenomic" approach could be used as a general method for predicting the functions of genes. This approach was formally described in 1998. For reviews of this aspect of phylogenomics see Brown D, Sjölander K. Functional classification using phylogenomic inference.
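The idea of assigning function by subfamily membership rather than by overall similarity can be sketched as follows. This is a toy illustration, not the published method: the tree is a hand-written nested tuple, and the gene names, the annotation labels, and the function `predict_function` are all hypothetical.

```python
def leaves(node):
    """Return the leaf names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def predict_function(tree, query, annotations):
    """Predict the function of `query` from its closest annotated relatives.

    Walks from the root towards the query and remembers the annotations found
    in the smallest clade that contains both the query and at least one
    annotated gene. Returns that function if it is unanimous, else None.
    """
    if query not in leaves(tree):
        raise ValueError("query gene is not in the tree")

    smallest_annotated = None
    node = tree
    while not isinstance(node, str):
        functions = {annotations[g] for g in leaves(node) if g in annotations}
        if functions:
            smallest_annotated = functions
        # Descend into the child clade that contains the query.
        node = next(child for child in node if query in leaves(child))

    if smallest_annotated is not None and len(smallest_annotated) == 1:
        return next(iter(smallest_annotated))
    return None


if __name__ == "__main__":
    # Hypothetical gene family with two subfamilies.
    gene_tree = (
        (("sub1_a", "sub1_b"), "sub1_c"),
        (("sub2_a", "query"), "sub2_b"),
    )
    known = {
        "sub1_a": "mismatch repair",
        "sub1_b": "mismatch repair",
        "sub2_a": "other (hypothetical)",
    }
    print(predict_function(gene_tree, "query", known))
```

In this toy tree the query resembles the whole family, but it is placed in the second subfamily, so the prediction follows the annotation of that subfamily rather than the more common annotation of the first.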
Prediction and retracing lateral gene transfer
Traditional phylogenetic techniques have difficulty distinguishing genes that are similar because of lateral gene transfer from those that are similar because the organisms shared an ancestor. By comparing large numbers of genes or entire genomes among many species, it is possible to identify transferred genes, since these sequences behave differently from what is expected given the taxonomy of the organism. Using these methods, researchers were able to identify over 2,000 metabolic enzymes obtained by various eukaryotic parasites through lateral gene transfer.
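One simple way to picture "behaving differently from what is expected" is to compare the groupings in a gene tree with those in a reference species tree. The sketch below lists the clades of a gene tree that are absent from the species tree. It is an illustration under stated assumptions: trees are hand-written nested tuples, the taxon names are placeholders, and the functions `clades` and `conflicting_clades` are hypothetical. Conflict of this kind only flags candidates, since tree reconstruction error and other processes can also produce it.

```python
def clades(tree):
    """Return the set of clades (as frozensets of leaf names) in a rooted tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def conflicting_clades(gene_tree, species_tree):
    """Clades present in the gene tree but absent from the species tree."""
    return clades(gene_tree) - clades(species_tree)


if __name__ == "__main__":
    species_tree = ((("A", "B"), "C"), ("D", "E"))
    # In this gene tree the copy from taxon D groups with taxon A.
    gene_tree = ((("A", "D"), "C"), ("B", "E"))
    for clade in sorted(conflicting_clades(gene_tree, species_tree), key=sorted):
        print(sorted(clade))
```

Applied across thousands of gene families, a screen like this would only be a first filter; candidates would still need to be checked with statistical tests and better-supported trees.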
Gene family evolution
The comparison of complete gene sets for a group of organisms allows the identification of events in gene evolution such as gene duplication or gene deletion. Often, such events are evolutionarily relevant. For example, multiple duplications of genes encoding degradative enzymes of certain families are a common adaptation in microbes to new nutrient sources. Conversely, loss of genes is important in reductive evolution, such as in intracellular parasites or symbionts.
Whole genome duplication events, which potentially duplicate all the genes in a genome at once, are drastic evolutionary events with great relevance in the evolution of many clades, and whose signal can be traced with phylogenomic methods.
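A first step in comparing complete gene sets is often a simple tally of how many members of each gene family every genome carries. The sketch below compares family copy numbers between a reference genome and a target genome. The gene identifiers, the family names, and the function `copy_number_changes` are hypothetical, and a difference relative to one reference does not by itself show in which lineage the duplication or loss occurred; that requires placing the events on a tree.

```python
from collections import Counter


def copy_number_changes(reference, target):
    """Compare gene family copy numbers between two genomes.

    `reference` and `target` map gene identifiers to gene family names.
    Returns a dict mapping each family that differs to a short label.
    """
    ref_counts = Counter(reference.values())
    tgt_counts = Counter(target.values())

    changes = {}
    for family in sorted(set(ref_counts) | set(tgt_counts)):
        r, t = ref_counts[family], tgt_counts[family]
        if t > r:
            changes[family] = "expansion"
        elif t == 0:
            # The family occurs in the reference but not in the target.
            changes[family] = "loss"
        elif t < r:
            changes[family] = "contraction"
    return changes


if __name__ == "__main__":
    free_living = {"g1": "hydrolase", "g2": "transporter", "g3": "flagellin"}
    specialist = {"h1": "hydrolase", "h2": "hydrolase", "h3": "hydrolase",
                  "h4": "transporter"}
    print(copy_number_changes(free_living, specialist))
```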
Establishment of evolutionary relationships
Traditional single-gene studies are effective in establishing phylogenetic trees among closely related organisms, but have drawbacks when comparing more distantly related organisms or microorganisms. This is because of lateral gene transfer, convergence, and varying rates of evolution for different genes. By using entire genomes in these comparisons, the anomalies created by these factors are overwhelmed by the pattern of evolution indicated by the majority of the data. Using this method, it is theoretically possible to create fully resolved phylogenetic trees, and timing constraints can be recovered more accurately.
However, in practice this is not always the case. When data are insufficient, different methods applied to the same data set can sometimes support different trees.
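The intuition that the majority of the data outweighs anomalous genes can be sketched with a majority-rule summary of gene trees: a grouping is kept only if more than half of the gene trees contain it. This is an illustrative simplification rather than a description of any particular study; the nested-tuple trees, the taxon names, and the functions `clades` and `majority_clades` are assumptions made for the example.

```python
from collections import Counter


def clades(tree):
    """Return the set of clades (as frozensets of leaf names) in a rooted tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def majority_clades(gene_trees, threshold=0.5):
    """Return clades found in more than `threshold` of the trees, with support."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(clades(tree))
    total = len(gene_trees)
    return {c: n / total for c, n in counts.items() if n / total > threshold}


if __name__ == "__main__":
    gene_trees = [
        (("A", "B"), ("C", "D")),
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),  # an anomalous gene, e.g. a transferred one
    ]
    for clade, support in majority_clades(gene_trees).items():
        print(sorted(clade), round(support, 2))
```

The clade containing every taxon is always reported with full support, since it is the root of every tree.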
Notable results of phylogenomics (in the sense of massive multigene phylogenies):
* Using 135 genes from 65 different species of photosynthetic organisms, it has been discovered that most of the photosynthetic eukaryotes are linked and possibly share a single ancestor. These included plants, alveolates, rhizarians, haptophytes and cryptomonads. This has been referred to as the Plants+HC+SAR megagroup. This study concatenates these genes together in what is called a "supermatrix" approach.
* The root of the bacterial tree of life and the extent of horizontal gene transfer were determined by tracing the evolution of 11,272 gene families. This is a "supertree" approach.
* The root of the archaeal tree of life was determined using a 45-protein supermatrix analysis and a 3242-protein supertree analysis. The 31,236 gene families in archaea were then placed on the tree to determine what the ancestral archaea may have had.
* Using 120 proteins from bacteria or 53 proteins from archaea (supermatrix), the Genome Taxonomy Database generates a taxonomy of all bacteria and archaea with high-quality sequenced genomes.
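The "supermatrix" approach mentioned in the list above joins the alignments of many genes end to end, so that each taxon is represented by one long row. A minimal sketch is given below, assuming each gene alignment is a dict from taxon name to aligned sequence and that taxa missing from a gene are padded with gap characters. The function name `concatenate` and the toy data are hypothetical.

```python
def concatenate(alignments, missing="-"):
    """Concatenate per-gene alignments into a single supermatrix.

    `alignments` is a list of dicts mapping taxon name to aligned sequence.
    Taxa absent from a gene are filled with `missing` characters.
    """
    taxa = sorted(set().union(*alignments))
    rows = {taxon: [] for taxon in taxa}

    for alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment must have a single width")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, missing * width))

    return {taxon: "".join(parts) for taxon, parts in rows.items()}


if __name__ == "__main__":
    gene1 = {"A": "ACGT", "B": "ACGA"}
    gene2 = {"A": "GG", "C": "GT"}
    for taxon, row in concatenate([gene1, gene2]).items():
        print(taxon, row)
```

A supertree approach differs in that a tree is first inferred for each gene and the trees, rather than the alignments, are then combined.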
Databases
* PhylomeDB
See also
* ''Archaeopteryx'' (phylogenomics software)
* Microbial phylogenetics
* Phylogenetics
* Sequence alignment
* Supertree
Further reading
* (compares RAxML/ExaML, PhyML, IQ-TREE, and FastTree)