Long-branch Attraction

In phylogenetics, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related. LBA arises when the amount of molecular or morphological change accumulated within a lineage is sufficient to cause that lineage to appear similar (and thus closely related) to another long-branched lineage, solely because both have undergone a large amount of change, rather than because they are related by descent. Such bias is more common when the overall divergence of some taxa results in long branches within a phylogeny. Long branches are often attracted to the base of a phylogenetic tree, because the lineage included to represent an outgroup is often also long-branched. The frequency of true LBA is unclear and often debated, and some authors view it as untestable and therefore irrelevant to empirical phylogenetic inference. Although often viewed as a failing of parsimony-based methodology, LBA could in principle result from a variety of scenarios and be inferred under multiple analytical paradigms.


Causes

LBA was first recognized as problematic when analyzing discrete morphological character sets under parsimony criteria; however, maximum likelihood analyses of DNA or protein sequences are also susceptible. A simple hypothetical example can be found in Felsenstein (1978), where it is demonstrated that for certain unknown "true" trees, some methods can show a bias toward grouping long branches, ultimately resulting in the inference of a false sister relationship. Often this is because convergent evolution of one or more characters included in the analysis has occurred in multiple taxa. Although they were derived independently, these shared traits can be misinterpreted in the analysis as being shared due to common ancestry.

In phylogenetic and clustering analyses, LBA is a result of the way clustering algorithms work: terminals or taxa with many autapomorphies (character states unique to a single branch) may by chance exhibit the same states as those on another branch (homoplasy). A phylogenetic analysis will group these taxa together as a clade unless other synapomorphies outweigh the homoplastic features and group the true sister taxa together. These problems may be minimized by using methods that correct for multiple substitutions at the same site, by adding taxa related to those with the long branches (which contributes additional true synapomorphies to the data), or by using alternative, slower-evolving traits (e.g. more conservative gene regions).
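One familiar correction for multiple substitutions at the same site is the Jukes-Cantor (1969) distance, d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. The sketch below is only an illustration of the idea; the article does not name a specific correction, and the function name `jc69_distance` is chosen here for the example.

```python
import math


def jc69_distance(p_observed: float) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site).

    p_observed is the raw proportion of sites that differ between two
    aligned nucleotide sequences.
    """
    if not 0.0 <= p_observed < 0.75:
        # At p >= 0.75 the sequences are saturated and the formula is undefined.
        raise ValueError("JC69 correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_observed)


if __name__ == "__main__":
    # The correction grows faster than p: hidden changes pile up on long branches.
    for p in (0.05, 0.30, 0.60):
        print(f"observed p = {p:.2f}, corrected d = {jc69_distance(p):.3f}")
```

The gap between the observed and corrected values widens as p grows, which is why uncorrected comparisons understate the amount of change on long branches.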


Results

The result of LBA in evolutionary analyses is that rapidly evolving lineages may be inferred to be sister taxa, regardless of their true relationships. For example, in DNA sequence-based analyses, the problem arises when sequences from two (or more) lineages evolve rapidly. There are only four possible nucleotides, and when DNA substitution rates are high, the probability that two lineages will evolve the same nucleotide at the same site increases. When this happens, a phylogenetic analysis may erroneously interpret this homoplasy as a synapomorphy (i.e., as having evolved once in the common ancestor of the two lineages).

The opposite effect may also be observed: if two (or more) branches exhibit particularly slow evolution among a wider, fast-evolving group, those branches may be misinterpreted as closely related. As such, "long branch attraction" can in some ways be better expressed as "branch length attraction". However, it is typically long branches that exhibit attraction.

The recognition of long-branch attraction implies that there is some other evidence suggesting that the phylogeny is incorrect. For example, two different sources of data (e.g. molecular and morphological), or even different methods or partition schemes, might support different placements for the long-branched groups. Hennig's Auxiliary Principle suggests that synapomorphies should be viewed as de facto evidence of grouping unless there is specific contrary evidence (Hennig, 1966; Schuh and Brower, 2009).

A simple and effective method for determining whether long branch attraction is affecting tree topology is the SAW method, named for Siddall and Whiting. If long branch attraction is suspected between a pair of taxa (A and B), simply remove taxon A ("saw" off the branch) and re-run the analysis. Then remove B and replace A, running the analysis again. If either of the taxa appears at a different branch point in the absence of the other, there is evidence of long branch attraction. Since long branches cannot attract one another when only one is in the analysis, consistent taxon placement between treatments would indicate that long branch attraction is not a problem.
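The SAW procedure can be sketched as a comparison of tree topologies, assuming some tree-inference routine is available. Everything named here is hypothetical: `infer_splits` stands in for whatever inference program is used and is assumed to return one side of each internal branch of the unrooted tree, and `toy_inference` is a stand-in that merely mimics an analysis suffering from attraction between A and B. As a simplification, the sketch flags any change in the remaining topology rather than tracking one taxon's branch point.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Set

Split = FrozenSet[str]
# Hypothetical interface: takes the taxa to analyse and returns one side of
# every internal branch (split) of the inferred unrooted tree.
InferFn = Callable[[FrozenSet[str]], Iterable[Iterable[str]]]


def canonical(side: Iterable[str], taxa: FrozenSet[str]) -> Optional[Split]:
    """Restrict a split to `taxa`; return None if it becomes uninformative."""
    inside = frozenset(side) & taxa
    outside = taxa - inside
    if len(inside) < 2 or len(outside) < 2:
        return None
    # Always report the side that holds the alphabetically first taxon.
    return inside if min(taxa) in inside else outside


def restricted_splits(splits: Iterable[Iterable[str]], taxa: FrozenSet[str]) -> Set[Split]:
    """Informative splits that remain once the tree is pruned to `taxa`."""
    found = {canonical(s, taxa) for s in splits}
    found.discard(None)
    return found


def saw_test(taxa: Iterable[str], a: str, b: str, infer_splits: InferFn) -> dict:
    """For each of a and b, report whether removing it changes the rest of the tree."""
    full = frozenset(taxa)
    full_splits = list(infer_splits(full))
    changed = {}
    for removed in (a, b):
        kept = full - {removed}
        expected = restricted_splits(full_splits, kept)         # full tree, pruned
        observed = restricted_splits(infer_splits(kept), kept)  # re-run analysis
        changed[removed] = expected != observed
    return changed


def toy_inference(taxa: FrozenSet[str]):
    """Stand-in for a real analysis: A and B attract only when both are present."""
    if {"A", "B"} <= taxa:
        return [{"A", "B"}, {"C", "D"}]
    return [s for s in ({"A", "C"}, {"B", "D"}) if s <= taxa]


if __name__ == "__main__":
    print(saw_test(["A", "B", "C", "D", "E"], "A", "B", toy_inference))
```

A result of `True` for either removal means the remaining taxa were placed differently once the partner was sawn off, which is the signal the SAW method looks for.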


Example

Assume for simplicity that we are considering a single binary character (it can be either + or –) distributed on the unrooted "true tree" shown in the figure, with branch lengths proportional to the amount of character state change. Because the evolutionary distance from B to D is small, we assume that in the vast majority of cases B and D exhibit the same character state. Here, we assume that they are both + (+ and – are assigned arbitrarily, and swapping them is only a matter of definition). If this is the case, there are four remaining possibilities. A and C can both be +, in which case all taxa are the same and all the trees have the same length. A can be + and C can be –, in which case only one taxon differs and we cannot learn anything, as all trees have the same length. Similarly, A can be – and C can be +. The only remaining possibility is that A and C are both –. In this case, however, the character groups A with C, or equivalently B with D (one character state is ancestral, the other is derived, and the ancestral state does not define a group).

As a consequence, when we have a "true tree" of this type, the more data we collect (i.e. the more characters we study), the more of them are homoplastic and support the wrong tree. Of course, when dealing with empirical data in phylogenetic studies of actual organisms, we never know the topology of the true tree, and the more parsimonious grouping, (AC) or (BD), might well be the correct hypothesis.
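The argument can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions not stated in the text: the true tree is taken to pair A with B and C with D, characters evolve under a symmetric two-state model, and the change probabilities (0.4 on the long branches leading to A and C, 0.05 on all other branches) are arbitrary. For four taxa and binary characters, the most parsimonious tree is the one supported by the most two-versus-two character patterns, so counting those patterns is enough.

```python
import random


def flip(state: int, p_change: float, rng: random.Random) -> int:
    """Return the state at the end of a branch with change probability p_change."""
    return state ^ 1 if rng.random() < p_change else state


def simulate_site(p_long: float, p_short: float, rng: random.Random):
    """One binary character on the assumed true tree ((A,B),(C,D))."""
    u = rng.randint(0, 1)          # internal node joining A and B
    v = flip(u, p_short, rng)      # internal node joining C and D
    a = flip(u, p_long, rng)       # long branch
    b = flip(u, p_short, rng)      # short branch
    c = flip(v, p_long, rng)       # long branch
    d = flip(v, p_short, rng)      # short branch
    return a, b, c, d


def parsimony_choice(n_sites: int, p_long: float, p_short: float, seed: int = 0):
    """Count informative patterns and return the tree parsimony would prefer."""
    rng = random.Random(seed)
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_sites):
        a, b, c, d = simulate_site(p_long, p_short, rng)
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return max(support, key=support.get), support


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        best, counts = parsimony_choice(n, p_long=0.4, p_short=0.05)
        print(n, best, counts)
```

Under these settings the pattern uniting A and C is expected at roughly 14% of sites, against about 4% for the pattern supporting the true grouping, so adding characters strengthens the wrong answer. Setting `p_long` equal to `p_short` removes the effect.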


Long branch repulsion

While likelihood-based estimates are relatively more resistant to long branch attraction, they may fail in the opposite way: when two closely related taxa have long branches, they may be incorrectly separated. This is long branch repulsion (LBR).


Avoidance

Non-parsimony methods such as Bayesian inference and maximum likelihood tend to reduce the occurrence of LBA, but do not eliminate it fully. (Of the two, Bayesian inference is the more prone to LBA.) Specifically, they still struggle with cases of compositional heterogeneity among taxa and sites, which invalidate the assumptions of basic substitution models. This can be avoided by using a mixture model or the posterior mean site frequency (PMSF) approximation, which take these possibilities into account. Amino-acid recoding and data filtering with compositional tests can also help, as can excluding problematic portions of the data such as fast-evolving sites. Excluding certain taxa from the analysis, either the long-branched ones themselves or some regular taxa, also occasionally helps, though adding taxa tends to help in more cases. Adding data from taxa related to the long-branched taxon can break up the branch into smaller, more manageable pieces. Many more methods are useful in detecting LBA. Worked examples of detection and avoidance can be found in Bergsten (2005).
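As an illustration of amino-acid recoding, the sketch below collapses the 20 amino acids into six classes using the commonly used Dayhoff-6 grouping. The article does not say which recoding scheme is meant, so the choice of Dayhoff-6 is an assumption, as are the function name and the decision to map gaps and ambiguous residues to "-".

```python
# Dayhoff-6 classes: residues within a class exchange relatively freely,
# so merging them discards much of the saturated, composition-driven signal.
DAYHOFF6_GROUPS = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}

# Flatten the grouping into a per-residue lookup table.
RECODE = {aa: code for group, code in DAYHOFF6_GROUPS.items() for aa in group}


def dayhoff6_recode(sequence: str, unknown: str = "-") -> str:
    """Recode a protein sequence into six-state characters.

    Anything outside the 20 standard residues (gaps, X, etc.) becomes `unknown`.
    """
    return "".join(RECODE.get(aa, unknown) for aa in sequence.upper())


if __name__ == "__main__":
    print(dayhoff6_recode("MKTAYC-X"))
```

The recoded alignment would then be analysed with a model suited to six-state data; that step depends on the inference program and is not shown.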


Evaluation of methods

The resistance of a method to LBA and LBR is tested empirically using challenging real or simulated data. With real data one cannot be entirely sure of the ground truth, but the data are guaranteed to be naturalistic. With simulated data one can specify a "true" shape of the tree and a model of evolution (ideally one that resembles natural evolution).

Some real data known to be challenging include:
* The Leebens-Mack et al. 2005 angiosperm data set, in which protein and nucleotide data produced different results
* The Brinkmann et al. 2005 data set, containing slow-evolving eukaryotes, archaea, and one fast-evolving microsporidian
* The "nematode" and "platyhelminth" data sets in Lartillot et al. 2007
* The Brown et al. 2013 data set, which may or may not recover an "Obazoa"

On the simulation side, a classic program is Seq-gen. The page for Pro-cov lists a number of later variants of Seq-gen that represent different kinds of heterotachy.


References

* Felsenstein, J. (2004): ''Inferring Phylogenies''. Sinauer Associates, Sunderland, MA.
* Hennig, W. (1966): ''Phylogenetic Systematics''. University of Illinois Press, Urbana, IL.
* Schuh, R. T. and Brower, A. V. Z. (2009): ''Biological Systematics: Principles and Applications'' (2nd edn.). Cornell University Press, Ithaca, NY.
* Grishin, Nick V. "Long Branch Attraction." Butterflies of America, 17 Aug. 2009. Web. 15 Sept. 2014.