Molecular phylogenetics is the branch of phylogeny that analyzes genetic, hereditary molecular differences, predominantly in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to determine the processes by which diversity among species has been achieved. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography. Molecular phylogenetics and molecular evolution are closely linked. Molecular evolution is the process of selective changes (mutations) at a molecular level (genes, proteins, etc.) throughout various branches in the tree of life. Molecular phylogenetics infers the evolutionary relationships that arise from molecular evolution and expresses them as a phylogenetic tree.


History

The theoretical frameworks for molecular systematics were laid in the 1960s in the works of Emile Zuckerkandl, Emanuel Margoliash, Linus Pauling, and Walter M. Fitch. Applications of molecular systematics were pioneered by Charles G. Sibley (birds), Herbert C. Dessauer (herpetology), and Morris Goodman (primates), followed by Allan C. Wilson, Robert K. Selander, and John C. Avise (who studied various groups). Work with protein electrophoresis began around 1956. Although the results were not quantitative and did not initially improve on morphological classification, they provided tantalizing hints that long-held notions of the classification of birds, for example, needed substantial revision. From 1974 to 1986, DNA-DNA hybridization was the dominant technique used to measure genetic difference.


Theoretical background

Early attempts at molecular systematics were also termed chemotaxonomy and made use of proteins, enzymes, carbohydrates, and other molecules that were separated and characterized using techniques such as chromatography. These have largely been replaced by DNA sequencing, which produces the exact sequences of nucleotides or ''bases'' in DNA or RNA segments extracted using different techniques. In general, sequence data are considered superior for evolutionary studies, since the actions of evolution are ultimately reflected in the genetic sequences. Sequencing the entire DNA of an organism (its genome) is still a longer and more expensive process than determining the sequence of a defined area of a particular chromosome, which is quite feasible. Typical molecular systematic analyses require the sequencing of around 1000 base pairs.

At any location within such a sequence, the bases found in a given position may vary between organisms. The particular sequence found in a given organism is referred to as its haplotype. In principle, since there are four base types, a sequence of 1000 base pairs could have 4^1000 distinct haplotypes. However, for organisms within a particular species or in a group of related species, it has been found empirically that only a minority of sites show any variation at all, and most of the variations that are found are correlated, so that the number of distinct haplotypes that are found is relatively small.

In a molecular systematic analysis, the haplotypes are determined for a defined area of genetic material. A substantial sample of individuals of the target species or other taxon is used, although many current studies are based on single individuals. Haplotypes of individuals of closely related, yet different, taxa are also determined. Finally, haplotypes from a smaller number of individuals from a definitely different taxon are determined: these are referred to as an outgroup.

The base sequences for the haplotypes are then compared. In the simplest case, the difference between two haplotypes is assessed by counting the number of locations where they have different bases: this is referred to as the number of ''substitutions''. Other kinds of differences between haplotypes can also occur, for example the ''insertion'' of a section of nucleic acid in one haplotype that is not present in another. The difference between organisms is usually re-expressed as a ''percentage divergence'', by dividing the number of substitutions by the number of base pairs analysed: the hope is that this measure will be independent of the location and length of the section of DNA that is sequenced.

An older and superseded approach was to determine the divergences between the genotypes of individuals by DNA-DNA hybridization. The advantage claimed for using hybridization rather than gene sequencing was that it was based on the entire genotype, rather than on particular sections of DNA. Modern sequence comparison techniques overcome this objection by the use of multiple sequences.

Once the divergences between all pairs of samples have been determined, the resulting triangular matrix of differences is submitted to some form of statistical cluster analysis, and the resulting dendrogram is examined to see whether the samples cluster in the way that would be expected from current ideas about the taxonomy of the group. Any group of haplotypes that are all more similar to one another than any of them is to any other haplotype may be said to constitute a clade, which may be represented visually, as in the accompanying figure. Statistical techniques such as bootstrapping and jackknifing help in providing reliability estimates for the positions of haplotypes within the evolutionary trees.
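
The substitution count and percentage divergence described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the haplotypes are invented toy sequences, they are assumed to be already aligned to equal length, and insertions or deletions are not handled. The function names `percent_divergence` and `divergence_matrix` are hypothetical and do not come from any particular package.

```python
from itertools import combinations


def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two haplotypes differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Count the substitutions: positions where the two bases differ.
    substitutions = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * substitutions / len(seq_a)


def divergence_matrix(haplotypes: dict[str, str]) -> dict[tuple[str, str], float]:
    """Triangular matrix of pairwise divergences, keyed by (name, name)."""
    return {
        (x, y): percent_divergence(haplotypes[x], haplotypes[y])
        for x, y in combinations(haplotypes, 2)
    }


# Hypothetical toy haplotypes of 10 bases each (not real data).
haplotypes = {
    "ingroup_1": "ACGTACGTAC",
    "ingroup_2": "ACGTACGTAT",
    "sister": "ACGAACGTAT",
    "outgroup": "TCGAACCTGT",
}

matrix = divergence_matrix(haplotypes)
for pair, value in matrix.items():
    print(pair, f"{value:.0f}%")
```

In this toy data the two ingroup haplotypes differ at one site in ten, while the outgroup differs from either of them at four or five sites, which is the kind of pattern a cluster analysis of the matrix would pick up.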


Techniques and applications

Every living organism contains deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. In general, closely related organisms have a high degree of similarity in the molecular structure of these substances, while the molecules of distantly related organisms often show a pattern of dissimilarity. Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations over time and, assuming a constant rate of mutation, provide a molecular clock for dating divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable evolution of various organisms. With the invention of Sanger sequencing in 1977, it became possible to isolate and identify these molecular structures. High-throughput sequencing may also be used to obtain the transcriptome of an organism, allowing inference of phylogenetic relationships using transcriptomic data. The most common approach is the comparison of homologous sequences for genes using sequence alignment techniques to identify similarity. Another application of molecular phylogeny is in DNA barcoding, wherein the species of an individual organism is identified using small sections of mitochondrial DNA or chloroplast DNA. Another application of the techniques that make this possible can be seen in the very limited field of human genetics, such as the ever-more-popular use of genetic testing to determine a child's paternity, as well as the emergence of a new branch of criminal forensics focused on evidence known as genetic fingerprinting.
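
The molecular-clock reasoning mentioned in this section can be made concrete with a small Python sketch. It assumes a strict clock, meaning one constant substitution rate shared by both lineages, so that the distance between two sequences accumulates along both branches and the time since divergence is the distance divided by twice the rate. The function name `divergence_time` and the numbers used are hypothetical, chosen only for illustration.

```python
def divergence_time(distance: float, rate_per_lineage: float) -> float:
    """Time since two lineages split, assuming a strict molecular clock.

    distance: substitutions per site separating the two sequences.
    rate_per_lineage: substitutions per site per year along ONE lineage.
    Differences build up on both branches, hence the factor of 2.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    if distance < 0:
        raise ValueError("distance cannot be negative")
    return distance / (2.0 * rate_per_lineage)


# Hypothetical values: 2% divergence, 1e-8 substitutions per site per year.
years = divergence_time(distance=0.02, rate_per_lineage=1e-8)
print(f"estimated divergence: about {years:,.0f} years ago")
```

With these assumed numbers the estimate is about one million years. If the rate is not constant across lineages, the estimate is unreliable, which is why the constant-rate assumption is stated explicitly.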


Molecular phylogenetic analysis

There are several methods available for performing a molecular phylogenetic analysis. One method, a comprehensive step-by-step protocol for constructing a phylogenetic tree that covers DNA/amino acid contiguous sequence assembly, multiple sequence alignment, model testing (finding the best-fitting substitution model), and phylogeny reconstruction using maximum likelihood and Bayesian inference, is available at Nature Protocols. Another molecular phylogenetic analysis technique has been described by Pevsner and is summarized below (Pevsner, 2015).

A phylogenetic analysis typically consists of five major steps. The first stage comprises sequence acquisition. The second consists of performing a multiple sequence alignment, which is the fundamental basis of constructing a phylogenetic tree. The third stage involves choosing among models of DNA and amino acid substitution. Several models exist; a few examples are the Hamming distance, the Jukes and Cantor one-parameter model, and the Kimura two-parameter model (see Models of DNA evolution). The normalized Hamming distance gives the observed degree of divergence, and the Jukes-Cantor correction converts it into an estimate that accounts for multiple substitutions at the same site. The fourth stage consists of tree building, using distance-based or character-based methods. Common tree-building methods include the unweighted pair group method using arithmetic mean (UPGMA) and neighbor joining, which are distance-based methods; maximum parsimony, which is a character-based method; and maximum likelihood estimation and Bayesian inference, which are character-based, model-based methods. UPGMA is a simple method; however, it is less accurate than the neighbor-joining approach. The last step comprises evaluating the trees. This assessment of accuracy covers consistency, efficiency, and robustness.

MEGA (molecular evolutionary genetics analysis) is analysis software that is user-friendly and free to download and use. It supports both distance-based and character-based tree-building methods. MEGA also offers several options, such as heuristic approaches and bootstrapping.
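
The substitution models named above can be sketched in Python as follows. This is an illustrative sketch, not the implementation used by MEGA or any other package: it assumes two pre-aligned DNA sequences of equal length with no gaps or ambiguous bases, and the function names and toy sequences are hypothetical. The formulas are the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), and the Kimura two-parameter distance, d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where p is the proportion of differing sites, P the proportion of transitions, and Q the proportion of transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def p_distance(a: str, b: str) -> float:
    """Normalized Hamming distance: fraction of sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from p-distance."""
    if p >= 0.75:
        raise ValueError("sequences are saturated; correction undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(a: str, b: str) -> float:
    """Kimura two-parameter distance (transitions vs. transversions)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    P = transitions / len(a)
    Q = transversions / len(a)
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences are saturated; distance undefined")
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# Hypothetical toy sequences: 20 sites, 2 transitions and 1 transversion.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "GCGTACGTACATACGTACGA"

p = p_distance(seq_a, seq_b)
print(f"p-distance:   {p:.4f}")
print(f"Jukes-Cantor: {jukes_cantor(p):.4f}")
print(f"Kimura 2P:    {kimura_2p(seq_a, seq_b):.4f}")
```

Both corrected distances come out larger than the raw p-distance, because the models assume that some sites have changed more than once and that those extra changes are hidden from a simple count.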
Bootstrapping is an approach commonly used to measure the robustness of the topology of a phylogenetic tree: it reports the percentage of replicates in which each clade is recovered. In general, a value greater than 70% is considered significant. The flow chart accompanying this section shows the order of the five stages of Pevsner's molecular phylogenetic analysis technique described above.
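
The idea behind bootstrap support can be sketched in Python. The sketch below is a deliberate simplification and not how MEGA or any other program implements it: instead of rebuilding a full tree for every replicate, it only asks whether a chosen pair of taxa remains the closest pair when alignment columns are resampled with replacement. The function names, the toy alignment, and the tie-breaking rule (the first pair in sorted order wins) are all assumptions made for illustration.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment: dict[str, str]) -> tuple[str, str]:
    """The pair of taxa with the smallest p-distance (a stand-in for a clade)."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


def bootstrap_support(alignment, pair, replicates=1000, seed=0) -> float:
    """Percentage of resampled alignments in which `pair` is the closest pair."""
    rng = random.Random(seed)
    names = sorted(alignment)
    length = len(alignment[names[0]])
    target = tuple(sorted(pair))
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement (one pseudo-replicate).
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(alignment[n][c] for c in cols) for n in names}
        if closest_pair(resampled) == target:
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical toy alignment (not real data).
alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGAACGT",
    "C": "ACTTACGAACGTTCGAACCT",
}
support = bootstrap_support(alignment, ("A", "B"))
print(f"support for grouping A with B: {support:.1f}%")
```

A real analysis would rebuild the whole tree for each replicate and report support for every clade, but the resampling step is the same.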


Limitations

Molecular systematics is an essentially cladistic approach: it assumes that classification must correspond to phylogenetic descent, and that all valid taxa must be monophyletic. This is a limitation when attempting to determine the optimal tree(s), which often involves bisecting and reconnecting portions of the phylogenetic tree(s). The discovery of extensive horizontal gene transfer among organisms is a significant complication for molecular systematics, because it indicates that different genes within the same organism can have different phylogenies. In addition, molecular phylogenies are sensitive to the assumptions and models that go into making them. First, sequences must be aligned; then issues such as long-branch attraction, saturation, and taxon sampling must be addressed. As a result, strikingly different results can be obtained by applying different models to the same dataset. Moreover, as previously mentioned, UPGMA is a simple approach in which the tree is always rooted. The algorithm assumes a constant molecular clock for the sequences in the tree. This is a limitation: if substitution rates differ among lineages, the resulting tree may be incorrect.
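
The UPGMA limitation described above can be demonstrated with a small Python sketch. It is a minimal illustration, not a reference implementation: the function name `upgma`, the nested-tuple output, and the distance matrix are all hypothetical. The distances are constructed to be exactly additive on an assumed true tree that groups A with B and C with D, with an internal branch of length 1 and terminal branches A = 1, B = 5, C = 1, D = 6, so that B and D evolve much faster than A and C.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    dist maps frozenset({x, y}) to the distance between x and y.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for other in clusters:
            # Size-weighted average equals the mean over all leaf pairs.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical additive distances from the assumed true tree ((A,B),(C,D)).
raw = {
    ("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 8,
    ("B", "C"): 7, ("B", "D"): 12, ("C", "D"): 7,
}
dist = {frozenset(pair): float(value) for pair, value in raw.items()}

tree = upgma(["A", "B", "C", "D"], dist)
print(tree)
```

Because A and C are the slowest-evolving taxa, they have the smallest pairwise distance, and UPGMA joins them first even though the distances were generated from a tree in which A groups with B. Methods that do not assume a constant clock, such as neighbor joining, are designed to cope with this situation.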


See also

*Computational phylogenetics
*Microbial phylogenetics
*Molecular clock
*Molecular evolution
*PhyloCode
*Phylogenetic nomenclature


External links


*NCBI – Systematics and Molecular Phylogenetics
*MEGA Software
*Molecular phylogenetics from ''Encyclopædia Britannica''