
In phylogenetics, reconciliation is an approach to connect the history of two or more coevolving biological entities. The general idea of reconciliation is that a phylogenetic tree representing the evolution of an entity (e.g. homologous genes or symbionts) can be drawn within another phylogenetic tree representing an encompassing entity (respectively, species, hosts) to reveal their interdependence and the evolutionary events that have marked their shared history. The development of reconciliation approaches started in the 1980s, mainly to depict the coevolution of a gene and a genome, and of a host and a symbiont, which can be mutualist, commensalist or parasitic. It has also been used, for example, to detect horizontal gene transfer or to understand the dynamics of genome evolution.
Phylogenetic reconciliation can account for a diversity of evolutionary trajectories of what makes life's history, intertwined with each other at all scales that can be considered, from molecules to populations or cultures. A recent avatar of the importance of interactions between levels of organization is the holobiont concept, where a macro-organism is seen as a complex partnership of diverse species. Modeling the evolution of such complex entities is one of the challenging and exciting directions of current research on reconciliation.
Phylogenetic trees as nested structures

Phylogenetic trees are intertwined at all levels of organization, integrating conflicts and dependencies within and between levels. Macro-organism populations migrate between continents, their microbe symbionts switch between populations, the genes of their symbionts transfer between microbe species, and domains are exchanged between genes.
This list of organization levels is not representative or exhaustive, but gives a view of levels where reconciliation methods have been used.
As a generic method, reconciliation could take into account numerous other levels. For instance, it could consider the syntenic organization of genes, the interacting history of transposable elements and species, or the evolution of a protein complex across species.
The scale of evolutionary events considered can range from population events such as geographical diversification to nucleotide-level ones inside genes,[Felsenstein J (2004) Inferring Phylogenies. Oxford University Press] including for instance chromosome-level events inside genomes such as whole genome duplication.
Phylogenies have been used for representing the diversification of life at many levels of organization: macro-organisms, their cells throughout development, micro-organisms through marker genes, chromosomes, proteins, protein domains, and can also be helpful to understand the evolution of human culture elements such as languages or fairy tales.
At each of these levels, phylogenetic trees describe different stories made of specific diversification events, which may or may not be shared among levels. Yet because they are structurally nested (similar to matryoshka dolls) or functionally dependent, the evolution at a particular level is bound to those at other levels.
Phylogenetic reconciliation is the identification of the links between levels through the comparison of at least two associated trees. Originally developed for two trees, reconciliations for more than two levels have been recently constructed (see section Explicit modeling of three or more levels). As such, reconciliation provides evolutionary scenarios that reveal conflict and cooperation among evolving entities. These links may be unintuitive: for instance, genes present in the same genome may show uncorrelated evolutionary histories while some genes present in the genome of a symbiont may show a strong coevolution signal with the host phylogeny. Hence, reconciliation can be a useful tool to understand the constraints and evolutionary strategies underlying the assemblage that forms a holobiont.
Because all levels essentially deal with the same object, a phylogenetic tree, the same models of reconciliation (in particular those based on duplication-transfer-loss events, which are central to this article) can be transposed, with slight modifications, to any pair of connected levels: an "inner", "lower", or "associate" entity (e.g. gene, symbiont species, population) evolves inside an "upper", or "host" one (respectively species, host, or geographical area).
The upper and lower entities are partially bound to the same history, leading to similarities in their phylogenetic trees, but the associations can change over time, become more or less strict or switch to other partners.
History
The principle of phylogenetic reconciliation was introduced in 1979 to account for differences between gene-level and species-level phylogenies. In a parsimonious setting, two evolutionary events, gene duplication and gene loss, were invoked to explain the discrepancies between a gene tree and a species tree. This early work also described a score on gene trees, given the species tree and an aligned sequence, based on the number of gene duplications, losses, and nucleotide replacements needed for the evolution of the aligned sequence, an approach still central today with new models of reconciliation and phylogeny inference.
The term ''reconciliation'' was used by Wayne Maddison in 1997,[Wayne P. Maddison (1997) Gene Trees in Species Trees, Systematic Biology, 46(3) 523-536.] as a reverse concept of "phylogenetic discord" resulting from gene level evolutionary events.
Reconciliation was then developed jointly for the coevolution of host and symbiont and the geographic diversification of species. In both settings, it was important to model a horizontal event that implied parallel branches of the host tree: host switch for host and symbiont, and species dispersion from one area to another in biogeography. Unlike for genes and genomes, the coevolution of host and symbiont and the explanation of species diversification by geography are not always the null hypothesis. A visual depiction of the two phylogenies in a tanglegram can help assess such coevolution, although it has no obvious statistical interpretation.
Character methods, such as Brooks Parsimony Analysis, were proposed to test coevolution and reconstruct scenarios of coevolution. In these methods, one of the trees is forgotten except for its leaves, which are then used as a character evolving on the second tree.
First models for reconciliation, taking explicitly into account the two topologies and using a mechanistic event-based approach, were proposed for host and symbiont and biogeography. Debates followed, as the methods were not yet completely sound but integrated useful information in a new framework.
Costs for each event and a dynamic programming technique considering all pairs of host and symbiont nodes were then introduced into a host and symbiont approach, both of which still underlie most of the current reconciliation methods for host and symbiont as well as for species and genes.
Reconciliation returned to the framework it was introduced in, gene and species. After character models were considered for horizontal gene transfer, a new reconciliation model, following and improving the dynamic programming approach presented for host and symbiont, effectively introduced horizontal gene transfer to gene and species reconciliation on top of the duplication and loss model.
The progressive development of phylogenetic reconciliation was thus possible through exchanges between multiple research communities studying phylogenies at the host and symbiont, gene and species, or biogeography levels. This story and its modern developments have been reviewed several times, generally focusing on specific pairs of levels, with a few exceptions. New developments start to bring the different frameworks together with new integrative models.
Pocket gophers and their chewing lice: a classical example
Pocket gophers (Geomyidae) and their chewing lice (Trichodectidae) form a well-studied system of host and symbiont coevolution. The phylogeny of host and symbiont and the matching of the leaves of their trees are depicted on the left. For the host, O. stands for ''Orthogeomys'', G. for ''Geomys'' and T. for ''Thomomys''; for the symbiont, G. stands for ''Geomydoecus'' and T. for ''Thomomydoecus''.
Reconciling the two trees means giving a scenario with evolutionary events and matching on the ancestral nodes depicting the coevolution of the two trees. The events considered in this system are the events of the DTL model: duplication, transfer (or host switch), loss, and cospeciation, the null event of coevolution.
Two scenarios were proposed in two studies, using two different frameworks which could be deemed as pre-dynamic programming DTL reconciliation.
In modern DTL reconciliation frameworks, costs are assigned to events. The two scenarios were then shown to correspond to maximum parsimony reconciliations with different cost assignments.
Scenario A uses 6 cospeciations, 2 duplications, 3 losses and 2 host switches to reconcile the two trees, while scenario B uses 5 cospeciations, 3 duplications, 3 losses and 2 host switches. The cost of a scenario is the sum of the costs of its events. For instance, with a cost of 0 for cospeciation, 2 for duplication, 1 for loss and 3 for host switch, scenario A has a cost of $6 \times 0 + 2 \times 2 + 3 \times 1 + 2 \times 3 = 13$ and scenario B of $5 \times 0 + 3 \times 2 + 3 \times 1 + 2 \times 3 = 15$, and so according to the parsimony principle, scenario A would be deemed more likely (scenario A stays more likely as long as the cost of cospeciation is less than the cost of duplication).
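The cost comparison can be reproduced with a few lines of Python. This is only an illustrative sketch: the event counts and the example costs come from the text, while the dictionary layout and the function name `scenario_cost` are choices made here.

```python
# Event counts for the two gopher/louse scenarios, as given in the text.
SCENARIOS = {
    "A": {"cospeciation": 6, "duplication": 2, "loss": 3, "host_switch": 2},
    "B": {"cospeciation": 5, "duplication": 3, "loss": 3, "host_switch": 2},
}

# Example cost assignment from the text (one possible choice among many).
COSTS = {"cospeciation": 0, "duplication": 2, "loss": 1, "host_switch": 3}


def scenario_cost(events, costs):
    """Sum of (number of events of each kind) x (cost of that kind)."""
    return sum(costs[kind] * count for kind, count in events.items())


for name, events in SCENARIOS.items():
    print(name, scenario_cost(events, COSTS))
```

Scenarios A and B differ by one cospeciation against one duplication, so the ranking flips as soon as cospeciation is made more expensive than duplication, for instance with a cospeciation cost of 3.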
Development of Phylogenetic Reconciliation Models
Models and methods used today in phylogeny are the result of several decades of research, made progressively more complex, driven by the nature of the data and the quest for biological realism on one side, and the limits and progress of mathematical and algorithmic methods on the other.
Pre-reconciliation models: characters on trees
Character methods can be used when there is no tree available for one of the levels, but only values for a character at the leaves of a phylogenetic tree for the other level. A model defines the events of character value change, their rate, probabilities or costs. For instance, the character can be the presence of a host on a symbiont tree, the geographical region on a species tree, the number of genes on a genome tree, or nucleotides in a sequence.
Such methods thus aim at reconstructing ancestral characters at internal nodes of the tree.
Although these methods have produced results on genome evolution, the utility of a second tree appears with very simple examples. If a symbiont has recently acquired the ability to spread in a group of species and thus it is present in most of them, character methods will wrongly indicate that the common ancestor of the hosts already had the symbiont. In contrast, a comparison of the symbiont and host trees would show discrepancies revealing horizontal transfers.
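This limitation can be illustrated with a minimal sketch. It uses Fitch's small-parsimony algorithm, chosen here only as a simple stand-in for a character method (the text does not name a specific one). The host tree, the leaf names and the presence/absence data are invented for the example.

```python
def fitch(tree, leaf_state):
    """Fitch small parsimony on a binary tree given as nested tuples.

    Returns (set of most parsimonious states at the root of `tree`,
             minimum number of state changes in that subtree).
    """
    if not isinstance(tree, tuple):          # leaf: the state is observed
        return {leaf_state[tree]}, 0
    left_states, left_changes = fitch(tree[0], leaf_state)
    right_states, right_changes = fitch(tree[1], leaf_state)
    shared = left_states & right_states
    if shared:                               # children agree: no extra change
        return shared, left_changes + right_changes
    # children disagree: keep both options and count one change
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical host tree; 1 = symbiont present, 0 = symbiont absent.
hosts = (("H1", ("H2", "H3")), ("H4", "H5"))
presence = {"H1": 1, "H2": 1, "H3": 0, "H4": 1, "H5": 1}

root_states, changes = fitch(hosts, presence)
print(root_states, changes)
```

With these data the root is reconstructed as carrying the symbiont, with a single change (a loss in H3), even if the symbiont actually spread recently and horizontally among the hosts. Only the symbiont tree could reveal that alternative.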
The origins of reconciliation: the Duplication Loss model and the Lowest Common Ancestor mapping
Duplication and loss were invoked first to explain the presence of multiple copies of a gene in a genome or its absence in certain species. It is possible with those two events to reconcile any two trees, i.e. to map the nodes and branches of the lower and upper trees, or equivalently to give a list of evolutionary events explaining the discrepancies between the upper tree and the lower tree.
A most parsimonious Duplication and Loss (DL) reconciliation is computed through the Lowest Common Ancestor (LCA) mapping: proceeding from the leaves to the root, each internal node is mapped to the lowest common ancestor of the mapping of its two children.
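The LCA mapping can be sketched in Python as follows. The nested-tuple tree encoding, the function name `lca_reconcile` and the toy data are assumptions made for this illustration. A lower node is flagged as a duplication when it is mapped to the same upper node as one of its children, which is the usual criterion in the DL model; losses are not counted in this sketch.

```python
def lca_reconcile(lower, upper, leaf_map):
    """Map every node of `lower` to a node of `upper` by LCA mapping.

    Trees are nested tuples: a leaf is a string, an internal node is
    a pair (left, right). `leaf_map` sends lower leaves to upper leaves.
    Returns (mapping, list of lower nodes inferred as duplications).
    """
    parent, depth = {}, {}

    def index(node, par, d):
        # Record parent and depth of every upper node.
        parent[node], depth[node] = par, d
        if isinstance(node, tuple):
            for child in node:
                index(child, node, d + 1)

    index(upper, None, 0)

    def lca(x, y):
        # Move the deeper node up until both meet.
        while x != y:
            if depth[x] >= depth[y]:
                x = parent[x]
            else:
                y = parent[y]
        return x

    mapping, duplications = {}, []

    def visit(node):
        # Postorder: children are mapped before their parent.
        if not isinstance(node, tuple):
            mapping[node] = leaf_map[node]
            return
        left, right = node
        visit(left)
        visit(right)
        mapping[node] = lca(mapping[left], mapping[right])
        if mapping[node] in (mapping[left], mapping[right]):
            duplications.append(node)

    visit(lower)
    return mapping, duplications


# Toy example: two copies of the gene in species A.
species = (("A", "B"), "C")
gene = (("a1", "b1"), ("a2", "c1"))
leaves = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
mapping, dups = lca_reconcile(gene, species, leaves)
print(mapping[gene] == species, dups == [gene])
```

In the toy example the root of the gene tree and its right child are both mapped to the species root, so the gene root is interpreted as a duplication that predates the first speciation.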
A Markovian model for reconciliation
The LCA mapping in the DL model follows a parsimony principle: no event should be invoked if it is not necessary. However, the use of this principle is debated, and it is commonly admitted that it is more accurate in molecular evolution to fit a probabilistic model such as a random walk, which does not necessarily produce parsimonious scenarios.
A birth and death Markovian model is one such model; it can generate a lower tree "inside" a fixed upper one from root to leaves.
Statistical inference provides a framework to find most likely scenarios, and in that case, a maximum likelihood reconciliation of two trees is also a parsimonious one. In addition, it is possible with such a framework to sample scenarios, or integrate over several possible scenarios in order to test different hypotheses, for example to explore the space of lower trees. Moreover, probabilistic models can be integrated into larger models, as probabilities simply multiply when assuming independence, for instance combining sequence evolution and DL reconciliation.
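As a hedged illustration of how probabilities multiply, the joint probability of a sequence alignment and a lower tree given an upper tree can be written as a product. The symbols $A$ (alignment), $G$ (lower tree, e.g. a gene tree) and $S$ (upper tree, e.g. a species tree) are notation introduced here, and the factorization assumes that the sequences depend on $S$ only through $G$:

```latex
P(A, G \mid S) = P(A \mid G) \, P(G \mid S)
```

Under this assumption, $P(A \mid G)$ would be given by a sequence evolution model and $P(G \mid S)$ by the DL reconciliation model.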
Introducing horizontal transfer
Host switch, i.e. inheritance of a symbiont from a kin lineage, is a crucial event in the evolution of parasitic or symbiotic relationships between species. This horizontal transfer also models migration events in biogeography and became of interest for the reconciliation of gene and species trees when it appeared that many discrepancies could not simply be explained by duplication and loss and that horizontal gene transfer (HGT) was a major process in the evolution of micro-organisms.
This switching, or horizontal transfer, pattern can also model admixture or introgression.
It is considered in character methods, without information from the symbiont phylogeny.
On top of the DL model, horizontal transfer enables new and very different reconciliation scenarios.
The simple yet powerful dynamic programming approach
The LCA reconciliation method yields a unique solution, which has been shown to be optimal for the problem of minimizing the weighted number of events, whatever the relative weights of duplication and loss.[Chauve C and El-Mabrouk N (2009) New Perspectives on Gene Family Evolution: Losses in Reconciliation and a Link with Supertrees. RECOMB 5541:46-58] In contrast, with Duplication, horizontal Transfer and Loss (DTL), there can be several equally parsimonious reconciliations. For instance, a succession of duplications and losses can be replaced by a single transfer. One of the first ideas to define a computational problem and approach a resolution was, in a host/symbiont framework, to maximize the number of co-speciations with a heuristic algorithm.
Another solution is to give relative costs to the events and find a scenario that minimizes the sum of the costs of its events.
In the probabilistic model frameworks, the equivalent task consists of assigning rates or probabilities to events and searching for maximum likelihood scenarios, or sampling scenarios according to their likelihood. All these problems are solved with a dynamic programming approach. This dynamic programming method involves traversing the two trees in postorder.
Proceeding from the leaves and then going up in the two trees, for each couple of internal nodes (one for each tree), the cost of a most parsimonious DTL reconciliation is computed.
In a parsimony framework, the cost $c(l,u)$ of reconciling a lower subtree rooted at $l$ with an upper subtree rooted at $u$ is initialized for the leaves with their matching:
$$c(l,u) = \begin{cases} 0 & \text{if leaf } l \text{ is matched to leaf } u, \\ +\infty & \text{if the two leaves are not matched.} \end{cases}$$
And then inductively, denoting $l', l''$ the children of $l$, $u', u''$ the children of $u$, and $c^S, c^D, c^T, c^L$ the costs associated with speciation, duplication, horizontal transfer and loss, respectively (with $c^S$ often fixed to 0),
$$c(l,u) = \min \begin{cases} c^S + \min\{c(l',u') + c(l'',u''),\; c(l',u'') + c(l'',u')\} \\ c^S + c^L + \min\{c(l,u'),\; c(l,u'')\} \\ c^D + c(l',u) + c(l'',u) \\ c^T + \min\{c(l',u) + \min_{v} c(l'',v),\; c(l'',u) + \min_{v} c(l',v)\} \end{cases}$$
The costs $\min_{v} c(l',v)$ and $\min_{v} c(l'',v)$, because they do not depend on $u$, can be computed once for all $u$, hence achieving quadratic complexity to compute $c(l,u)$ for all couples of $l$ and $u$. The cost of losses only appears in association with other events because in parsimony, a loss can always be associated with the preceding event in the tree.
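The dynamic programming scheme can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reference implementation: trees are nested tuples with unique leaf names, transfers may land on any upper node (no time constraint is checked), the transfer-loss (TL) event is not included, and the names `dtl_min_cost`, `c_s`, `c_d`, `c_t`, `c_l` as well as the default cost values are choices made here.

```python
import math


def postorder(tree):
    """Yield every subtree of a nested-tuple tree, children before parents."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from postorder(child)
    yield tree


def dtl_min_cost(lower, upper, leaf_map, c_s=0, c_d=2, c_t=3, c_l=1):
    """Minimum DTL cost of reconciling `lower` inside `upper` (no TL event)."""
    upper_nodes = list(postorder(upper))
    cost = {}  # cost[l, u]: best cost with lower node l placed at upper node u
    best = {}  # best[l]: min over u of cost[l, u], shared by all transfers
    for l in postorder(lower):
        for u in upper_nodes:
            options = []
            l_leaf = not isinstance(l, tuple)
            u_leaf = not isinstance(u, tuple)
            if l_leaf and u_leaf:
                # Initialization: leaves must respect the given matching.
                options.append(0 if leaf_map[l] == u else math.inf)
            if not u_leaf:
                u1, u2 = u
                # Speciation of u, with l lost in one of the two children.
                options.append(c_s + c_l + min(cost[l, u1], cost[l, u2]))
            if not l_leaf:
                l1, l2 = l
                if not u_leaf:
                    # Cospeciation: children of l follow children of u.
                    options.append(c_s + min(cost[l1, u1] + cost[l2, u2],
                                             cost[l1, u2] + cost[l2, u1]))
                # Duplication: both children of l stay in u.
                options.append(c_d + cost[l1, u] + cost[l2, u])
                # Transfer: one child stays in u, the other goes anywhere.
                options.append(c_t + min(cost[l1, u] + best[l2],
                                         cost[l2, u] + best[l1]))
            cost[l, u] = min(options)
        best[l] = min(cost[l, u] for u in upper_nodes)
    # The root of the lower tree may be placed at any upper node.
    return best[lower]


upper = (("A", "B"), "C")
leaf_map = {"a": "A", "b": "B", "c": "C"}
print(dtl_min_cost((("a", "b"), "c"), upper, leaf_map))  # congruent trees
print(dtl_min_cost((("a", "c"), "b"), upper, leaf_map))  # conflicting trees
```

With the default costs, the congruent pair needs no event besides cospeciations, while the conflicting pair is best explained by a single transfer, which is cheaper than one duplication followed by three losses. The loop visits each pair of nodes once and each visit takes constant time thanks to `best`, which is where the quadratic complexity comes from.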
The induction behind the use of dynamic programming is based on always progressing in the trees toward the roots. However some combinations of events that can happen consecutively can make this induction ill-defined.
One such combination consists of a transfer followed immediately by a loss in the donor lineage (TL). Restricting the use of this TL event repairs the induction.[Doyon J, Scornavacca C, Gorbunov K, Szöllősi G, Ranwez V et al. (2010) An Efficient Algorithm for Gene/Species Trees Parsimonious Reconciliation with Losses, Duplications and Transfers. RECOMB-CG] With unlimited use, it is necessary to use or add other known methods for solving systems of equations, such as fixed point methods or numerical solving of differential equations. In 2016, only two out of seven of the most commonly used parsimony reconciliation programs handled TL events, although their consideration can drastically change the result of a reconciliation.
Unlike LCA mapping, DTL reconciliation typically yields several scenarios of minimal cost, in some cases an exponential number. The strength of the dynamic programming approach is that it makes it possible to compute a minimum cost of coevolution of the input upper and lower trees in quadratic time, and to get a most parsimonious scenario through backtracking.
It can also be transposed to a probabilistic framework to compute the likelihood of coevolution and get a most likely reconciliation, replacing costs with rates, minimums by sums and sums by products.
Moreover, through multiple backtracks, the approach is suitable for enumerating all parsimonious solutions or for sampling scenarios, optimal and sub-optimal, according to their likelihood.
Estimation of event costs and rates
Dynamic programming per se is only a partial solution and does not solve several problems raised by reconciliation.
Defining a most parsimonious DTL reconciliation requires assigning costs to the different kinds of events (D, T and L). Different cost assignments can yield different reconciliation scenarios, so there is a need for a way to choose those costs.
There is a diversity of approaches to do so. CoRe-PA explores in a recursive manner the space of cost vectors, searching for a good matching with the event frequencies in reconciliations.
ALE uses the same idea in a probabilistic framework to estimate the event rates by maximum likelihood.
Alternatively, COALA is a preprocess using approximate Bayesian computation with sequential Monte Carlo: simulation and statistical rejection or acceptance of parameters with successive refinement.
In the parsimony framework, it is also possible to divide the space of possible event costs into areas of costs which lead to the same Pareto optimal solution.
Pareto optimal reconciliations are such that no other reconciliation has a strictly inferior cost for one type of event (duplication, transfer or loss), and less or equal for the others.
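The Pareto criterion can be made concrete with a short sketch that filters a list of event-count vectors. The vectors below are invented for illustration, and the function name `pareto_front` is a choice made here. Each vector is read as (duplications, transfers, losses).

```python
def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and strictly better once."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(vectors):
    """Keep the vectors that no other vector dominates."""
    return [v for v in vectors
            if not any(dominates(w, v) for w in vectors)]


# Hypothetical (duplications, transfers, losses) counts of four scenarios.
candidates = [(2, 2, 3), (3, 2, 3), (0, 4, 1), (5, 0, 9)]
print(pareto_front(candidates))
```

Here (3, 2, 3) is discarded because (2, 2, 3) uses fewer duplications and no more transfers or losses. The three remaining vectors are incomparable, so each of them can be optimal for some choice of event costs.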
It is possible as well to rely on external considerations in order to choose the event costs. For example, the software Angst chooses the costs that minimize the variation of genome size, in number of genes, between parent and children species.
The problem of temporal feasibility
The dynamic programming method works for dated (internal nodes are totally ordered) or undated upper trees. However, with undated trees, there is a temporal feasibility issue. Indeed, a horizontal transfer implies that the donor and the receiver are contemporaneous, therefore implying a time constraint on the tree. In consequence, two horizontal transfers may be incompatible, because they imply contradicting time constraints.
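A minimal sketch of such a conflict is given below. It assumes that each branch of the upper tree is named by the node at its lower end, and that a transfer between two branches requires the top of each branch to be older than the bottom of the other. The resulting ordering constraints are combined with the tree order, and a cycle means that no dating satisfies them all. This checks only a necessary condition: constraints coming from the lower tree are ignored. The function name and the toy tree are assumptions made here; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import CycleError, TopologicalSorter


def temporally_feasible(parent, transfers):
    """Check that the transfers imply no contradictory time constraints.

    `parent` maps each upper node to its parent (None for the root).
    `transfers` is a list of (donor_branch, recipient_branch) pairs,
    where a branch is named by the node at its lower end.
    """
    # older_than[x] = set of nodes that must be strictly older than x.
    older_than = {node: set() for node in parent}
    for child, par in parent.items():
        if par is not None:
            older_than[child].add(par)
    for donor, recipient in transfers:
        # The two branches must overlap in time.
        if parent[donor] is not None:
            older_than[recipient].add(parent[donor])
        if parent[recipient] is not None:
            older_than[donor].add(parent[recipient])
    try:
        # A total order exists if and only if the constraints are acyclic.
        tuple(TopologicalSorter(older_than).static_order())
        return True
    except CycleError:
        return False


# Toy upper tree: R -> (U, V), U -> (A, B), V -> (C, D).
parent = {"R": None, "U": "R", "V": "R",
          "A": "U", "B": "U", "C": "V", "D": "V"}
print(temporally_feasible(parent, [("U", "C")]))              # V before U
print(temporally_feasible(parent, [("V", "A")]))              # U before V
print(temporally_feasible(parent, [("U", "C"), ("V", "A")]))  # contradiction
```

Each transfer is feasible on its own, but the first one forces node V to be older than node U and the second one forces the opposite, so the pair cannot occur in the same scenario.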
The dynamic programming approach cannot easily check for such incompatibilities. If the upper tree is undated, finding a temporally feasible most parsimonious reconciliation is NP-hard. It is fixed parameter tractable, which means that there are algorithms whose running time is exponential only in the number of transfers in the output scenarios and polynomial in the size of the trees.
Some solutions involve integer linear programming or branch and bound exploration.
If the upper tree is dated, then there is no incompatibility issue because horizontal transfers can be constrained to never go backward in time. Finding a coherent optimal reconciliation is then solved in polynomial time, or with a speed-up in RASCAL, by testing only a fraction of node mappings.
Most software taking undated trees does not check temporal feasibility. Exceptions are Jane, which explores the space of total orders via a genetic algorithm, and Notung and Eucalypt, which search the set of optimal solutions for time-consistent ones in a post-processing step.
Other methods work as supplementary layers to reconciliations, correcting reconciliations or returning a subset of feasible transfers, which can be used to date a species tree.
Expanding phylogenies: Transfers from the dead
In phylogenetics in general, it is important to keep in mind that the extant and ancestral species that are represented in any phylogeny are only a sparse sample of the species that currently exist or ever have existed. This is why one can safely assume that all transfers that can be detected using phylogenetic methods have originated in lineages that are, strictly speaking, absent from a studied phylogeny. Accounting for extinct or unsampled biodiversity in phylogenetic studies can give a better understanding of these processes. Originally, DTL reconciliation methods did not recognize this phenomenon and only allowed for transfer between contemporaneous branches of the tree, hence ignoring most plausible solutions. However, methods working on undated upper trees can be seen as implicitly handling the unknown diversity by allowing transfers "to the future" from the point of view of one phylogeny, that is, the donor is more ancient than the recipient. A transfer to the future can be translated into a speciation to unknown species, followed by a transfer from unknown species.
ALE in its dated version explicitly takes the unknown diversity into account by adding a Moran process of speciation/extinction of species to the dated birth/death model of gene evolution.
Transfers from the dead are also handled in a parsimonious setting by Tera and ecceTERA, showing that considering these transfers improves the capacity to reconstruct gene trees using reconciliation, as well as with a more explicit model and, in a probabilistic setting, in ALE undated.
The specificity of biogeography: a tree like structure for the "evolution" of areas
In biogeography, some applications of reconciliation approaches consider as an upper tree an area cladogram with defined ancestral nodes. For instance, the root can be Pangaea and the nodes contemporary continents. Sometimes, internal nodes are not ancestral areas but the unions of the areas of their children, to account for the possibility of species evolving along the lower tree to inhabit one or several areas. In this case, the evolutionary events are migration, where one species colonizes a new area, and allopatric speciation, or vicariance, which is equivalent to co-speciation in host/symbiont comparisons.
Even though this approach does not always give a tree (if the unions AB and BC of leaves A, B, C exist, a child can have several parents), and this structure is not associated with time (it is possible for a species to go from A to AB by migration, as well as from AB to A by extinction), reconciliation methods, with events and dynamic programming, can infer evolutionary scenarios between the upper geographical structure and the lower species tree. DIVA [Ronquist F (1997) Phylogenetic approaches in coevolution and biogeography. Zoologica Scripta 26:313–322] and Lagrange are two reconciliation models constructing such a tree-like structure and then applying reconciliation, the first with a parsimony principle, the second in a probabilistic framework. Additionally, BioGeoBEARS is a biogeography inference package that reimplements the DIVA and Lagrange models and allows for new options, like distance-dependent transfers, and includes a discussion of statistical model selection.
Graphical output
With two trees and multiple evolutionary events linking them to represent, viewing reconciled trees is challenging, but it is necessary in order to make reconciliation studies more accessible. Some reconciliation software annotates the evolutionary events on the lower trees, while other programs, and specific packages in DL or DTL, trace the lower tree embedded in the upper one. One difficulty in this regard is the variety of output formats used by the different reconciliation programs. A common standard, recphyloxml, has been established and endorsed by part of the community, and a viewer is available that can display reconciliations in multi-level systems.
Addressing Additional Practical Considerations
Applying DTL reconciliation to biological data raises several problems related to uncertainty and confidence levels of input and output. Concerning the output, the uncertainty of the answer calls for an exploration of the whole solution space. Concerning the input, phylogenetic reconciliation has to handle uncertainties in the resolution or rooting of the upper or lower trees, or even to propose roots or resolutions according to their confidence.
Exploring the space of reconciliations
Dynamic programming makes it possible to sample reconciliations, uniformly among optimal ones or according to their likelihood.
It is also possible to enumerate them in time proportional to the number of solutions, a number which can quickly become intractable (even only for optimal ones). Finding and presenting structure among the multitude of possible reconciliations has been at the center of recent methodological developments, especially for methods aimed at hosts and symbionts. Several works have focused on representing a set of reconciliations in a compact way, from a uniform sample of optimal ones or by constructing a graph summarizing the optimal solutions. This can be achieved by giving support values to specific events based on all optimal (or suboptimal) reconciliations, or with the use of a consensus reconciled tree. In a DL model, it is possible to define a median reconciliation, based on shared events, and to compute it in polynomial time. [Huber K, Moulton V, Sagot M-F, Sinaimeri B (2018) Geometric medians in reconciliation spaces of phylogenetic trees. Information Processing Letters 136:96–101]
EMPRess can group similar reconciliations through clustering, with all pairwise distances between reconciliations computable in polynomial time (independently of the number of most parsimonious reconciliations). With the same aim, Capybara defines equivalence classes among reconciliations, efficiently computes representatives for all classes, and outputs with linear delay a given number of reconciliations (first optimal ones, then suboptimal ones).
The space of most parsimonious reconciliations can be expanded or reduced by increasing or decreasing the allowed horizontal transfer distance, which is easily done by dynamic programming.
Inferring phylogenetic trees with reconciliation
Reconciliation and input uncertainty
Reconciliation works with two fixed trees, a lower and an upper, both assumed correct and rooted. However, those trees are not first-hand data.
The most frequently used data for phylogenetics consist of aligned nucleotide or protein sequences.
Extracting DNA, sequencing, assembling and annotating genomes, recognizing homology relationships among genes and producing multiple alignments for phylogenetic reconstruction are all complex processes where errors can ultimately affect the reconstructed tree.
Any topology or rooting error can be misinterpreted and cause systematic bias. For instance, in DL reconciliations, errors on the lower tree bias the reconciliation toward more duplication events closer to the root and more losses closer to the leaves.
On the other hand, reconciliation, as a macro evolutionary model, can work as a supplementary layer to the micro evolutionary model of sequence evolution, resolving polytomies (nodes with more than two children) or rooting trees, or be intertwined with it through integrative models in order to get better phylogenies.
Most works in this direction focus on gene/species reconciliations. Nevertheless, some first steps have been made in host/symbiont settings, such as considering unrooted symbiont trees or dealing with polytomies in Jane.
Exploring the space of lower trees with reconciliation
Reconciliation can easily take unrooted lower trees as input, which is a frequently used feature because trees inferred from molecular data are typically unrooted. It is possible to test all possible roots, or to use a carefully designed triple traversal of the unrooted tree that does so without additional time complexity. In a duplication-loss model, the roots minimizing the cost are found close to one another, forming a "plateau", a property which does not generalize to DTL.
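Testing all possible roots can be sketched naively as follows. The sketch assumes a binary unrooted lower tree given as an adjacency dictionary, an upper tree given as nested tuples, and a score that counts only duplications through an LCA mapping (losses are ignored for brevity). All names and the toy data are hypothetical, and this is the exhaustive approach, not the triple traversal.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple, Union

Tree = Union[str, tuple]   # a leaf name, or a tuple of subtrees


def clades(species_tree: Tree) -> List[FrozenSet[str]]:
    """List every clade of the species tree as a frozenset of species names."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(child) for child in node))
        found.append(clade)
        return clade

    walk(species_tree)
    return found


def lca(species: FrozenSet[str], all_clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest clade of the species tree that contains the given species."""
    return min((c for c in all_clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, all_clades, species_of) -> Tuple[FrozenSet[str], int]:
    """Return (clade the root maps to, number of duplications) under the LCA mapping."""
    if isinstance(gene_tree, str):
        return frozenset([species_of[gene_tree]]), 0
    left, right = gene_tree
    map_left, dup_left = count_duplications(left, all_clades, species_of)
    map_right, dup_right = count_duplications(right, all_clades, species_of)
    mapped = lca(map_left | map_right, all_clades)
    # A node is a duplication when it maps to the same clade as one of its children.
    is_duplication = mapped in (map_left, map_right)
    return mapped, dup_left + dup_right + int(is_duplication)


def rootings(adjacency: Dict[str, List[str]]) -> Iterator[Tuple[Tuple[str, str], Tree]]:
    """Yield (edge, rooted tree) for every edge of a binary unrooted tree."""
    def subtree(node: str, came_from: str) -> Tree:
        children = [n for n in adjacency[node] if n != came_from]
        if not children:
            return node
        return tuple(subtree(child, node) for child in children)

    seen = set()
    for u in adjacency:
        for v in adjacency[u]:
            if (v, u) not in seen:
                seen.add((u, v))
                yield (u, v), (subtree(u, v), subtree(v, u))


# Toy data: four species, one gene per species, internal gene nodes x and y.
species_tree = (("A", "B"), ("C", "D"))
species_of = {"a1": "A", "b1": "B", "c1": "C", "d1": "D"}
unrooted = {"a1": ["x"], "b1": ["x"], "x": ["a1", "b1", "y"],
            "y": ["x", "c1", "d1"], "c1": ["y"], "d1": ["y"]}

all_clades = clades(species_tree)
scores = {edge: count_duplications(rooted, all_clades, species_of)[1]
          for edge, rooted in rootings(unrooted)}
best_edge = min(scores, key=scores.get)
print(best_edge, scores[best_edge])
```

On this toy input, only the rooting on the internal edge between x and y needs no duplication, while every other rooting needs one.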
Reconciliation can also take as input non-binary trees, that is, trees with internal nodes with more than two children. Such trees can be obtained for example by contracting branches with low statistical support. Inferring a binary tree from a non-binary tree according to reconciliation scores is solved in DL with efficient methods. [Lafond M and Noutahi E and El-Mabrouk N (2016) Efficient Non-Binary Gene Tree Resolution with Weighted Reconciliation Cost. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016) 14:1–14:12] In DTL, the problem is NP-hard. Heuristics and exact fixed-parameter tractable algorithms are possible solutions.
Another way to handle uncertainty in lower trees is to take as input a sample of alternative lower trees instead of a single one. For example, in the paper that gave reconciliation its name, it was proposed to consider all most likely lower trees and to choose among them the best one according to DL cost, a principle also used by TreeFix-DTL.
The sample of lower trees can similarly reflect their likelihood according to the aligned sequences, as obtained from Bayesian Markov chain Monte Carlo methods as implemented for example in Phylobayes. AngST, ALE and ecceTERA use "amalgamation", an extension of the DTL dynamic programming that is able to efficiently traverse a set of alternative lower trees instead of a single tree.
A local search in the space of lower trees guided by a joint likelihood, on the one hand from multiple sequence alignments and on the other hand from reconciliation with the upper tree, is achieved in Phyldog with a DL model and in GeneRax with DTL. In a DL model with sequence evolution and a relaxed molecular clock, the lower tree space can be explored with an MCMC. MowgliNNI can modify the input gene tree at poorly supported nodes to improve the DTL score, while TreeSolve resolves the multifurcations created by collapsing poorly supported nodes.
Finally, integrative models, which mix sequence evolution and reconciliation, can compute a joint likelihood via dynamic programming (for both reconciliation and gene sequence evolution), or use Markov chain Monte Carlo to include a molecular clock and estimate branch lengths, in a DL model, with a relaxed molecular clock, or in a DTL model. These models have been applied in gene/species frameworks, but not yet in host/symbiont or biogeography contexts.
Inferring upper trees using reconciliation
Inferring an upper tree from a set of lower trees is a long-standing question related to the supertree problem. It is particularly interesting in the case of gene/species reconciliation where many (typically thousands of) gene trees are available from complete genome sequences. Supertree methods attempt to assemble a species tree based on sets of trees which may differ in terms of contemporary species sets and topology, but usually without consideration for the biological process explaining these differences. However, some supertree approaches are statistically consistent for the reconstruction of the species tree if the gene trees are simulated under a DL model. This means that if the number of input lower trees generated from the true upper tree via the DL model grows toward infinity, given that there are no additional errors, the output upper tree converges almost surely to the true one. This has been shown in the case of a quartet distance, and with a generalised Robinson Foulds multicopy distance, with better running time but assuming gene trees do not contain bipartitions contradicting the species tree, which seems rare under a DL model.
Reconciliation can also be used for the inference of upper trees. This is a computationally hard problem: already resolving polytomies in a non-binary upper tree with a binary lower one (minimizing a DL reconciliation score) is NP-hard. In particular, reconstructing the species tree giving the best DL cost for several gene trees is NP-hard and 2-approximable. [Ma B and Li M and Zhang L (2000) From Gene Trees to Species Trees. SIAM J. Comput., may, 729–752 24] It is called the Gene Duplication problem or more generally Gene Tree parsimony. The problem was seen as a way to detect paralogy to get better species tree reconstruction. It is NP-hard, with interesting results on the problem complexity and the behaviour of the model with different input size, structure and ILS presence. Multiple solutions exist, with ILP or heuristics, and with the possibility of a deep coalescence score.
ODTL takes as input gene trees and searches for a maximum likelihood species tree according to a DTL model, with a hill-climbing search. The approach produces a species tree with internal nodes ordered in time, ensuring time compatibility for the scenarios of transfer among lower trees.
Addressing a more general problem, Phyldog searches for the maximum likelihood species tree, gene trees and DL parameters from multiple family alignments via multiple rounds of local search. It thus performs the exploration of both upper and lower trees at the same time. MixTreEM presents a faster solution.
Limits of the two-level DTL model
A limit to dynamic programming: non independent evolution of children lineages
The dynamic programming framework, like usual birth and death models, works under the hypothesis of independent evolution of children lineages in the lower tree. However, this hypothesis does not hold if the model is complemented with several other documented evolutionary events, such as horizontal transfer with replacement of a homologous gene in the recipient lineage, or gene conversion. Horizontal transfer with replacement is usually modeled by a rearrangement of the upper tree, called Subtree Prune and Regraft (SPR). Reconciling under SPR is NP-hard, even in dated trees, and fixed-parameter tractable regarding the output size. [Bordewich M and Semple C (2005) On the Computational Complexity of the Rooted Subtree Prune and Regraft Distance. Annals of Combinatorics 8: 409–423]
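A single SPR move on a rooted tree can be sketched as follows. The sketch assumes trees are nested tuples with unique leaf names, so that a subtree can be identified by its value. The functions `prune`, `regraft` and `spr` are hypothetical helpers written for illustration, and they perform one rearrangement only, not the NP-hard search for a shortest sequence of moves.

```python
from typing import Optional, Union

Tree = Union[str, tuple]   # a leaf name, or a tuple of subtrees


def prune(tree: Tree, moved: Tree) -> Optional[Tree]:
    """Remove the subtree equal to `moved` and suppress the unary node left behind."""
    if tree == moved:
        return None
    if isinstance(tree, str):
        return tree
    kept = [p for p in (prune(child, moved) for child in tree) if p is not None]
    return kept[0] if len(kept) == 1 else tuple(kept)


def regraft(tree: Tree, moved: Tree, onto: Tree) -> Tree:
    """Attach `moved` as the sibling of the subtree equal to `onto`."""
    if tree == onto:
        return (tree, moved)
    if isinstance(tree, str):
        return tree
    return tuple(regraft(child, moved, onto) for child in tree)


def spr(tree: Tree, moved: Tree, onto: Tree) -> Tree:
    """One Subtree Prune and Regraft move: detach `moved`, reattach it above `onto`."""
    remainder = prune(tree, moved)
    if remainder is None:
        raise ValueError("cannot prune the whole tree")
    result = regraft(remainder, moved, onto)
    if result == remainder:
        raise ValueError("regraft position not found outside the pruned subtree")
    return result


tree = ((("A", "B"), "C"), "D")
print(spr(tree, "A", "D"))          # leaf A moves next to D
print(spr(tree, ("A", "B"), "D"))   # clade (A, B) moves next to D
```

Read as a replacing transfer, moving A next to D would correspond to a copy from the lineage of D taking the place of the original copy in A.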
Another way to model and infer replacing horizontal transfers is through maximum agreement forest, where branches are cut in the lower and upper trees in order to get two identical (or statistically indistinguishable) upper and lower forests. The problem is NP-hard,[Hein J, Jiang T, Wang L, Zhang K (1996) On the complexity of comparing evolutionary trees. Discrete Applied Mathematics 71:153--169] but several approximations have been proposed.
Replacing transfers can be considered on top of the DL model. [Kordi M (2019) Inferring Microbial Gene Family Evolution Using Duplication-Transfer-Loss Reconciliation: Algorithms and Complexity. Doctoral Dissertations. 2101.] In the same vein, gene conversion can be seen as a "replacing duplication". In this latter case, a polynomial algorithm which does not use dynamic programming and is an extension of the LCA method can find all optimal solutions, including gene conversions.
Integrating population levels: failure to diverge and Incomplete Lineage Sorting
In host/symbiont frameworks, a single symbiont species is sometimes associated with several host species. This means that while a speciation or diversification has been observed in the host, the populations are indistinguishable in the symbiont.
This is handled for example by additional polytomies in the symbiont tree, possibly leading to intractable inference problems, because polytomies need to be resolved.
It is also modeled by an additional evolutionary event, "failure to diverge" (Jane, Amocoala [Urbini L (2017) Models and algorithms to study the common evolutionary history of hosts and symbionts. Doctoral thesis, Université de Lyon]).
Failure to diverge can be a way to allow "free" host switch in a population, that is, a flow of symbionts between closely related hosts.
Following that vision, host switch allowed only between close hosts is considered in Eucalypt. This idea of horizontal flow between close populations can also be applied to gene/species frameworks, with a definition of species based on a gradient of gene flow between populations.
Failure to diverge is one way of introducing population dynamics in reconciliation, a framework mainly adapted to the multi-species level, where populations are supposed to be well differentiated. There are other population phenomena that limit this framework, one of them being deep coalescence of lineages, leading to Incomplete Lineage Sorting (ILS), which is not handled by the DTL model. The multispecies coalescent is a classical model of allele evolution along a species tree, with birth of alleles and sorting of alleles at speciations, that takes into account population sizes and naturally encompasses ILS. [Rannala B and Edwards S and Leaché A and Yang Z (2020) Phylogenetics in the Genomic Era, 3.3:1–3.3:21]
In a reconciliation context, several attempts have been made in order to account for ILS without the complex integration of a population model. For example, ILS can be seen as a possible evolutionary pattern for the gene tree. In that case, children lineages are not independent of one another, leading to intractability results. ILS alone can be handled with LCA, but ILS + DL reconciliation is NP-hard, even without transfers.
Notung handles ILS by collapsing short branches of the species tree into polytomies and allowing ILS as a free diversification of gene trees on those polytomies. ecceTERA bounds the maximum size of connected parts of the species tree where ILS can happen, proposing a fixed-parameter tractable algorithm in that parameter.
ILS and DL can be considered on an upper network instead of a tree. This models in particular introgression, with the possibility to estimate model parameters. [Du P and Ogilvie H A and Nakhleh L (2019) Unifying Gene Duplication, Loss, and Coalescence on Phylogenetic Networks. Bioinformatics Research and Applications. ISBRA 2019. Lecture Notes in Computer Science, vol 11490. Springer, Cham.]
More integrative reconciliation models accounting for ILS have been proposed, including both DL and multispecies coalescent, with DLCoal. It is a probabilistic model with a parsimony translation, proposing two sequential LCA-type heuristics handled via an intermediate locus tree between gene and species.
However, outside of the gene/species reconciliation framework, ILS seems never to have been considered in host/symbiont or biogeography settings, for no particular reason.
Cophylogeny with more than two levels
A striking aspect of reconciliation is the common methodology handling different levels of organization: it is used for comparing domain and protein trees, gene and species trees, hosts and symbiont trees, population and geographic trees. However, now that scientists tend to consider that multi-level models of biological functioning bring a novel and game changing view of organisms and their environment, the question is how to use reconciliation to bring phylogenetics to this holobiont era.
Coevolution of entities at different scales of evolution is at the basis of the holobiont idea: macro-organisms, micro-organisms and their genes all have a different history bound to a common functioning in a single ecosystem. Biological systems like the entanglement of host, symbionts and their genes imply functional and evolutionary dependencies between more than two levels.
Examples of multi level systems with complex evolutionary inter-dependencies
Genes coevolving beyond genome boundaries
The holobiont concept stresses the possibility of genes from different genomes to cooperate and coevolve. For instance, certain genes in a symbiont genome may provide a function to its host, like the production of a vital compound absent from available feeding sources. An iconic example is the case of blood-feeding or sap-feeding insects, which often depend on one or several bacterial symbionts to thrive on a resource that is abundant in sugar, but lacks essential amino acids or vitamins. Another example is the association of Fabaceae with nitrogen-fixing bacteria. The compound beneficial to the host is typically produced by a set of genes encoded in the symbiont genome, which, throughout evolution, may be transferred to other symbionts, and/or in and out of the host genome. Reconciliation methods have the potential to reveal evolutionary links between portions of genomes from different species. A search for coevolving genes beyond the boundaries of the genomes in which they are encoded would highlight the basis for the association of organisms in the holobiont.
Horizontal gene transfer routes depend on multiple levels
In intracellular mutualistic symbiont insect systems, multiple occurrences of horizontal gene transfers have been identified, whether from host to symbiont, symbiont to host or symbiont to symbiont.
Transfers of endosymbiont genes involved in nutrition pathways beneficial to the insect host have been shown to occur preferentially if the donor and recipient lineages share the same host. This is also the case in insects with bacterial symbionts providing defensive proteins or in obligate leaf nodule bacterial symbionts associated with plants. In the human host, gene transfer has been shown to occur preferentially among symbionts hosted in the same organs.
A review of horizontal gene transfers in host/symbiont systems stresses the importance of supporting HGTs with multiple evidence. Notably it is argued that transfers should be considered better supported when involving symbionts sharing a habitat, a geographical area, or the same host. One should, however, keep in mind that most of the diversity of hosts and symbionts is unknown and that transfers may have occurred in unsampled closely related species, hosts or symbionts.
The idea that gene transfer in symbionts is constrained by the host can also be used to investigate the host's phylogenetic history. For instance, based on phylogeographical studies, it is now accepted that the bacterium ''Helicobacter pylori'' has been associated with human populations since the origins of the human species. An analysis of the genomes of ''Helicobacter pylori'' in Europe suggests that they result from a recombination between African and Asian ''Helicobacter pylori''. This strongly implies early contacts between the corresponding human populations.
Similarly, an analysis of HGTs in coronaviruses from different mammalian species using reconciliation methods has revealed frequent contact between viral lineages, which can be interpreted as frequent host switches.
Cultural evolution
The evolution of elements of human culture, for instance languages and folktales, in association with human population genetics, has been studied using concepts from phylogenetics. Although reconciliation has never been used in this framework, some of these studies encompass multiple levels of organization, each represented by a tree or the evolution of a character, with a focus on the coevolution of these levels.
Language trees can be compared with population trees in order to reveal vertically transmitted folktales, via a character model on this language tree. Variants in each folktale's family, languages, genetic diversity, populations and geography can be compared two by two, to link folktale diversification with languages on one side and with geography on the other side. Just as symbionts sharing a host favours HGTs in genetics, linguistic barriers can prevent the transmission of folktales or language elements.
Investigating three-level systems using two-level reconciliation
Multi level reconciliation is not as developed as two-level reconciliation. One way to approach the evolutionary dependencies between more than two levels of organization is to try to use available standard two-level methods to give a first insight into a biological system's complexity.
File:Phylogenetic reconciliation segmental events.svg, Multiple gene lineages can undergo joint events like segmental duplication, transfer or loss, or even whole genome duplication.
File:Phylogenetic reconciliation network.svg, Reconciliation can help identify highways of transfers and hybridizations.
File:Phylogenetic reconciliation coupled reconciliations.svg, Three levels can be reconciled either sequentially (the intermediate within the upper, before adding the lower) or jointly, by searching for a most parsimonious scenario for the two reconciliations together.
File:Phylogenetic reconciliation multilevel comparison.svg, With more than two levels, the reconciliation of the lower and intermediate levels can be compared to the reconciliation of the lower and upper.
Multi-gene events: implicit consideration of an intermediate level
At the gene/species tree level, one typically deals with many different gene trees. In this case, the hypothesis that different gene families evolve independently is made implicitly. However, this need not be the case. For instance, duplication, transfer and loss can occur for segments of a genome spanning an arbitrary number of contiguous genes. It is possible to consider such multi-gene events using an intermediate guide for lower trees inside the upper one. For instance, one can compute the joint likelihood of multiple gene tree reconciliations with a dated species tree with duplication, loss and whole genome duplication, or work in a parsimonious setting, and one definition of the problem is NP-hard.[Michael R. Fellows, Michael T. Hallett, and Ulrike Stege. 1998. On the Multiple Gene Duplication Problem. In Proceedings of the 9th International Symposium on Algorithms and Computation (ISAAC '98). Springer-Verlag, Berlin, Heidelberg, 347–356.] Similarly, the DL framework can be enriched with duplication and loss of chromosome segments instead of a single gene. However, DL reconciliation becomes intractable with that new possibility.
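The effect of dropping the independence assumption can be illustrated with a toy count. The sketch below is not a published algorithm: it assumes that each inferred duplication is already labelled with a species-tree branch and with the position of the duplicated gene on an ancestral chromosome (both hypothetical inputs), and it greedily merges duplications of contiguous genes on the same branch into a single segmental event.

```python
from itertools import groupby


def count_segmental_events(duplications):
    """Count events when contiguous duplications on one branch are merged.

    duplications: iterable of (branch, position) pairs, where `position`
    is the (hypothetical) integer rank of the gene on the chromosome.
    """
    events = 0
    # Sorting puts duplications of the same branch next to each other,
    # ordered by position; set() drops exact repeats.
    ordered = sorted(set(duplications))
    for _, group in groupby(ordered, key=lambda d: d[0]):
        positions = [pos for _, pos in group]
        events += 1  # first segment on this branch
        for prev, cur in zip(positions, positions[1:]):
            if cur != prev + 1:  # a gap starts a new segment
                events += 1
    return events


dups = [("A", 1), ("A", 2), ("A", 3), ("A", 7), ("B", 2)]
print(count_segmental_events(dups))  # 3 segmental events instead of 5 single-gene ones
```

Under these assumptions, five single-gene duplications are explained by three events, which is why a scenario that is optimal gene by gene is not necessarily optimal once multi-gene events are allowed.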
The link between two consecutive genes can also be modeled as an evolving character, subject to gain, loss, origination, breakage, duplication and transfer. The evolution of this link appears as an additional level to species and gene trees, partly constrained by the gene/species tree reconciliation, partly evolving on its own, according to genome organization. It thus models synteny, or proximity between genes. At another scale, it can also model the evolution of two domains belonging to a protein.
The detection of "highways of transfers", the preferential acquisition of groups of genes from a specific donor, is another example of non-independence of gene histories. Similarly, multi-gene transfers can be detected. This has also led to methodological developments such as reconciliations using phylogenetic networks, seen as trees augmented with transfer edges, which can be used to constrain transfers in a DTL model. Networks can also be used to model introgression and incomplete lineage sorting.
Detecting coevolution in multiple pairs of levels
A central question in understanding the evolution of a holobiont is which levels coevolve with each other, for instance among host species, host genes, symbionts and symbiont genes. It is possible to approach the multiple inter-dependencies between all levels of evolution by multiple pairwise comparisons of two evolving entities.
Reconciliation of host and symbiont on one side and geography and symbiont on the other can also help to identify patterns of diversification of host and symbiont that reflect either coevolution or patterns that can be explained by a common geographical diversification.
Similarly, a study used reconciliation methods to differentiate the effects of diet evolution and phylogenetic inertia on the composition of mammalian gut microbiomes. By reconstructing ancestral diets and microbiome composition on a mammalian phylogeny, the study revealed that both effects contribute but at different time scales.
Explicit modeling of three or more levels
In a model of a multi-level system such as host/symbiont/genes, horizontal gene transfers should be more likely between two symbionts of the same host. This is invisible to a two-level gene tree/species tree or host/symbiont reconciliation: in some cases, looking at any combination of two levels can miss an evolutionary scenario that is the most likely only when the information from the three trees is considered together.
To address this limitation of standard two-level reconciliations in systems involving inter-dependencies at multiple levels, a methodological effort has been undertaken in the last decade to construct and use multi-level models. This requires the identification of at least one "intermediate" level between the upper and the lower one.
File:Phylogenetic reconciliation upper aware.svg, Two levels can be reconciled with the constraint of an upper one, for instance host and symbiont with geography.
File:Phylogenetic reconciliation intermediate character.svg, Characters can evolve on reconciled phylogenies, like gene synteny on a gene/species reconciliation.
File:Phylogenetic reconciliation inter intra transfer.svg, Transfers can depend on the upper level: they are more likely between two intermediate entities that belong to the same upper one.
File:Phylogenetic reconciliation intermediate inference.svg, With three-level reconciliation models, the intermediate tree can be inferred from the lower and upper trees.
Pre-reconciliation: characters onto reconciled trees
A first step towards integrated three-level models is to consider phylogenetic trees at two levels and another level represented only with characters at the leaves of one of the trees.
For instance, a reconciliation of host and symbiont phylogenies can be informed by geographic data. Ancestral geographic locations of host and symbiont species obtained through a character inference method can then be used to constrain the host/symbiont reconciliation: ancestral hosts and symbionts can only be associated if they belong to the same geographical location.
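The geographic constraint can be sketched as a simple filter. The example below assumes that a character inference method has already assigned a set of possible locations to every ancestral and extant host and symbiont; the node names and the dictionary layout are hypothetical, and the function only lists which symbiont/host associations remain admissible for a later reconciliation step.

```python
def allowed_associations(host_locations, symbiont_locations):
    """Return the (symbiont, host) pairs that share at least one location.

    Both arguments map a node name to the set of locations inferred
    for that node (assumed to be precomputed elsewhere).
    """
    return {
        (symbiont, host)
        for symbiont, s_locs in symbiont_locations.items()
        for host, h_locs in host_locations.items()
        if s_locs & h_locs  # non-empty intersection: same location possible
    }


hosts = {"H1": {"Africa"}, "H2": {"Asia"}, "H12": {"Africa", "Asia"}}
symbionts = {"S1": {"Africa"}, "S2": {"Asia"}}
pairs = allowed_associations(hosts, symbionts)
# S1 can be placed in H1 or H12, but not in H2.
print(sorted(pairs))
```

A reconciliation method would then restrict its search to these pairs, which is one simple way to encode the rule that associated ancestors must share a location.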
At another scale, the evolution at the sub-gene level can be approached with a character method. Here, parts of genes (e.g. the sequences coding for protein domains) are reconciled according to a DL model with a species tree, and the genes they belong to are treated as characters of these parts. Ancestral genes are then reconstructed a posteriori via merges and splits of gene parts.
Two-level reconciliations informed by a third level
As pointed out by several of the studies mentioned above, an upper level can inform a reconciliation between an intermediate and lower one, notably for horizontal transfers.
Three-level models can take into account these assumptions to guide reconciliations between an intermediate tree and lower levels with the knowledge of an upper tree. The model can for example give higher likelihoods to reconciliation scenarios where horizontal gene transfers happen between entities sharing the same habitat. This has been achieved for the first time with DTL gene/species reconciliations nested with a DTL gene domain and gene reconciliation. Different costs for inter and intra transfers depend on whether or not transfers happen between genes of the same genome.
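The distinction between intra and inter transfers can be sketched as a cost function. This is an illustrative sketch only: the cost values, the `genome_of` mapping and the representation of transfers as (donor gene, recipient gene) pairs are assumptions, not the parameters of any specific tool.

```python
def transfer_score(transfers, genome_of, intra_cost=1.0, inter_cost=3.0):
    """Sum transfer costs, charging less when both genes share a genome.

    transfers: iterable of (donor_gene, recipient_gene) pairs.
    genome_of: dict mapping each gene to the genome that contains it.
    """
    total = 0.0
    for donor, recipient in transfers:
        same_genome = genome_of[donor] == genome_of[recipient]
        total += intra_cost if same_genome else inter_cost
    return total


genome_of = {"g1": "A", "g2": "A", "g3": "B"}
# One intra-genome transfer (g1 -> g2) and one inter-genome transfer (g1 -> g3).
print(transfer_score([("g1", "g2"), ("g1", "g3")], genome_of))  # 4.0
```

With a lower intra cost, scenarios that keep transfers within a genome are preferred, which mirrors the idea that the upper level informs the lower reconciliation.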
Note that this model explicitly considers three levels and three trees, but does not yet define a real three-level reconciliation, with a likelihood or score associated. It relies on a sequential operation, where the second reconciliation is informed by the result of the first one.
The reconciliation problem in multi-level models
The next step is to define the score of a reconciliation consisting of three nested trees and to compute, given the three trees, three-level reconciliations according to their score. This has been achieved with a species/gene/domain system, where genes evolve within the species tree with a DL model and domains evolve within the gene/species system with a DTL model, forbidding domain transfers between genes of two different species. Inference involves candidate scenarios with joint scores. Computing the minimum score scenario is NP-hard, but dynamic programming or integer linear programming can offer heuristics. Variations of the problem considering multiple domains are available, and so is a simulation framework.
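As a rough illustration of what a joint score could look like, the sketch below evaluates one given candidate scenario. It does not search for the minimum, which is the NP-hard part. The event labels ("D", "L", "T"), the single shared cost table and the `species_of` mapping are all assumptions made for the example; a scenario with a domain transfer between genes of different species receives an infinite score, reflecting the constraint described above.

```python
import math
from collections import Counter


def joint_score(gene_events, domain_events, domain_transfers, species_of, costs):
    """Score one three-level scenario as the sum of its event costs.

    gene_events: list of "D"/"L" labels from the gene/species level.
    domain_events: list of "D"/"L" labels from the domain/gene level.
    domain_transfers: list of (donor_gene, recipient_gene) pairs.
    species_of: dict mapping each gene to its species.
    costs: dict with a cost for "D", "L" and "T" (hypothetical values).
    """
    for donor, recipient in domain_transfers:
        if species_of[donor] != species_of[recipient]:
            return math.inf  # forbidden: transfer across species
    counts = Counter(gene_events) + Counter(domain_events)
    counts["T"] += len(domain_transfers)
    return sum(costs[event] * n for event, n in counts.items())


species_of = {"g1": "X", "g2": "X", "g3": "Y"}
costs = {"D": 2, "L": 1, "T": 3}
print(joint_score(["D", "L", "L"], ["D"], [("g1", "g2")], species_of, costs))  # 9
print(joint_score(["D"], [], [("g1", "g3")], species_of, costs))  # inf
```

An optimization method would compare such scores across candidate scenarios; the sketch only makes the scoring rule concrete.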
Inferring the intermediate tree using models of 3-level lower/intermediate/upper reconciliation
Just as two-level reconciliation can be used to improve lower or upper phylogenies, or to help construct them from aligned sequences, joint reconciliation models can be used in the same manner.
In this vein, a model coupling gene/species DL, domain/gene DL and gene sequence evolution in a Bayesian framework improves the reconstruction of gene trees.
Software
Multiple pieces of software have been developed to implement the various models of reconciliation. The following table does not aim for exhaustiveness but presents a number of software tools aimed at reconciling trees to infer reconciliation scenarios or for related usage, such as correcting or inferring trees, or testing coevolution.
The levels of interest section details the levels for which the software was implemented, even though it is entirely possible, for instance, to use software made for species and gene reconciliation to reconcile hosts and symbionts. Parsimony or probability is the underlying model that is used for the reconciliation.