Viral phylodynamics is defined as the study of how epidemiological, immunological, and evolutionary processes act and potentially interact to shape viral phylogenies.
Since the coining of the term in 2004, research on viral phylodynamics has focused on transmission dynamics in an effort to shed light on how these dynamics impact viral genetic variation. Transmission dynamics can be considered at the level of cells within an infected host, individual hosts within a population, or entire populations of hosts.
Many viruses, especially RNA viruses, rapidly accumulate genetic variation because of short generation times and high mutation rates.
Patterns of viral genetic variation are therefore heavily influenced by how quickly transmission occurs and by which entities transmit to one another.
Patterns of viral genetic variation will also be affected by selection acting on viral phenotypes.
Although viruses can differ with respect to many phenotypes, phylodynamic studies have to date tended to focus on a limited number of viral phenotypes.
These include virulence phenotypes, phenotypes associated with viral transmissibility, cell or tissue tropism phenotypes, and antigenic phenotypes that can facilitate escape from host immunity.
Because transmission dynamics and selection affect viral genetic variation, viral phylogenies can be used to investigate important epidemiological, immunological, and evolutionary processes, such as epidemic spread, spatio-temporal dynamics including metapopulation dynamics, zoonotic transmission, tissue tropism, and antigenic drift.
The quantitative investigation of these processes through the consideration of viral phylogenies is the central aim of viral phylodynamics.
Sources of phylodynamic variation
In coining the term ''phylodynamics'', Grenfell and coauthors postulated that viral phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Their study showcased three features of viral phylogenies, which may serve as rules of thumb for identifying important epidemiological, immunological, and evolutionary processes influencing patterns of viral genetic variation.
; The relative lengths of internal versus external branches will be affected by changes in viral population size over time
: Rapid expansion of a virus in a population will be reflected by a "star-like" tree, in which external branches are long relative to internal branches. Star-like trees arise because viruses are more likely to share a recent common ancestor when the population is small, and a growing population has an increasingly smaller population size towards the past. Compared to a phylogeny of an expanding virus, a phylogeny of a viral population that stays constant in size will have external branches that are shorter relative to branches on the interior of the tree. The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (exponential growth). The phylogeny of hepatitis B virus instead reflects a viral population that has remained roughly constant in size. Similarly, trees reconstructed from viral sequences isolated from chronically infected individuals can be used to gauge changes in viral population sizes within a host.
; The clustering of taxa on a viral phylogeny will be affected by host population structure
: Viruses within similar hosts, such as hosts that reside in the same geographic region, are expected to be more closely related genetically if transmission occurs more commonly between them. The phylogenies of measles and rabies virus illustrate viruses with spatially structured host populations. These phylogenies stand in contrast to the phylogeny of human influenza, which does not appear to exhibit strong spatial structure over extended periods of time. Clustering of taxa, when it occurs, is not necessarily observed at all scales, and a population that appears structured at some scale may appear panmictic at another scale, for example at a smaller spatial scale. While spatial structure is the most commonly observed population structure in phylodynamic analyses, viruses may also show nonrandom admixture by host attributes such as age, race, and risk behavior. This is because viral transmission can preferentially occur between hosts sharing any of these attributes.
; Tree balance will be affected by selection, most notably immune escape
: The effect of directional selection on the shape of a viral phylogeny is exemplified by contrasting the trees of influenza virus and HIV's surface proteins. The ladder-like phylogeny of influenza virus A/H3N2's hemagglutinin protein bears the hallmarks of strong directional selection, driven by immune escape (imbalanced tree). In contrast, a more balanced phylogeny may occur when a virus is not subject to strong immune selection or another source of directional selection. An example of this is the phylogeny of the HIV envelope protein inferred from sequences isolated from different individuals in a population (balanced tree). Phylogenies of the HIV envelope protein from chronically infected hosts, however, resemble influenza's ladder-like tree. This highlights that the processes affecting viral genetic variation can differ across scales. Indeed, contrasting patterns of viral genetic variation within and between hosts have been an active topic in phylodynamic research since the field's inception.
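Tree balance can be quantified in several ways. As a minimal sketch, the Python below computes the Colless imbalance index, a standard summary statistic from the phylogenetics literature rather than one named in the text above, for trees written as nested tuples. The tip labels and both example trees are hypothetical.

```python
def count_tips(tree):
    """Number of tips below a node; any non-tuple value is treated as a tip."""
    if not isinstance(tree, tuple):
        return 1
    return sum(count_tips(child) for child in tree)


def colless_index(tree):
    """Sum over internal nodes of |tips on the left - tips on the right|.

    Assumes a strictly bifurcating tree encoded as nested 2-tuples.
    """
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    here = abs(count_tips(left) - count_tips(right))
    return here + colless_index(left) + colless_index(right)


# Hypothetical four-tip trees: one ladder-like (imbalanced), one balanced.
ladder = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))

print("ladder:", colless_index(ladder))
print("balanced:", colless_index(balanced))
```

A larger index indicates a more ladder-like tree. The index alone does not identify the process responsible, since imbalance can arise from selection or from sequential bottlenecks.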
Although these three phylogenetic features are useful rules of thumb to identify epidemiological, immunological, and evolutionary processes that might be impacting viral genetic variation, there is growing recognition that the mapping between process and phylogenetic pattern can be many-to-one. For instance, although ladder-like trees could reflect the presence of directional selection, ladder-like trees could also reflect sequential genetic bottlenecks that might occur with rapid spatial spread, as in the case of rabies virus. Because of this many-to-one mapping between process and phylogenetic pattern, research in the field of viral phylodynamics has sought to develop and apply quantitative methods to effectively infer process from reconstructed viral phylogenies (see Methods). The consideration of other data sources (e.g., incidence patterns) may aid in distinguishing between competing phylodynamic hypotheses. Combining disparate sources of data for phylodynamic analysis remains a major challenge in the field and is an active area of research.
Applications
Viral origins
Phylodynamic models may aid in dating epidemic and pandemic origins.
The rapid rate of evolution in viruses allows molecular clock models to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus.
With the rate of evolution measured in real units of time, it is possible to infer the date of the most recent common ancestor (MRCA) for a set of viral sequences.
The age of the MRCA of these isolates is a lower bound; the common ancestor of the entire virus population must have existed earlier than the MRCA of the virus sample.
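One simple way to see how sampling dates calibrate a clock is a root-to-tip regression: genetic distance from the root is regressed on sampling date, the slope estimates the substitution rate, and the date at which the fitted line reaches zero estimates the date of the MRCA. The sketch below assumes a strict clock and a known root, and it is a much cruder stand-in for the molecular clock models referred to above. The function name and all data values are made up for illustration.

```python
def fit_clock(dates, distances):
    """Least-squares fit of distance = rate * (date - t_root).

    Returns (rate, t_root), where t_root is the date at which the fitted
    root-to-tip distance is zero.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    t_root = -intercept / rate
    return rate, t_root


# Made-up sampling dates (years) and root-to-tip distances (substitutions/site).
dates = [2000.0, 2003.0, 2006.0, 2009.0]
distances = [0.010, 0.013, 0.016, 0.019]

rate, t_root = fit_clock(dates, distances)
print("rate per site per year:", rate)
print("estimated MRCA date:", t_root)
```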
In April 2009, genetic analysis of 11 sequences of swine-origin H1N1 influenza suggested that the common ancestor existed at or before 12 January 2009.
This finding aided in making an early estimate of the basic reproduction number $R_0$ of the pandemic. Similarly, genetic analysis of sequences isolated from within an individual can be used to determine the individual's infection time.
Viral spread
Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means.
For example, assessment of $R_0$ from surveillance data requires careful control of the variation of the reporting rate and the intensity of surveillance.
Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of $R_0$.
Such approaches have been used to estimate $R_0$ in hepatitis C virus and HIV.
Additionally, differential transmission between groups, be they geographic-, age-, or risk-related, is very difficult to assess from surveillance data alone.
Phylogeographic models have the possibility of more directly revealing these otherwise hidden transmission patterns.
Phylodynamic approaches have mapped the geographic movement of the human influenza virus and quantified the epidemic spread of rabies virus in North American raccoons.
However, nonrepresentative sampling may bias inferences of both $R_0$ and migration patterns.
Phylodynamic approaches have also been used to better understand viral transmission dynamics and spread within infected hosts. For example, phylodynamic studies have been used to infer the rate of viral growth within infected hosts and to argue for the occurrence of viral compartmentalization in hepatitis C infection.
Viral control efforts
Phylodynamic approaches can also be useful in ascertaining the effectiveness of viral control efforts, particularly for diseases with low reporting rates. For example, the genetic diversity of the DNA-based hepatitis B virus declined in the Netherlands in the late 1990s, following the initiation of a vaccination program. This correlation was used to argue that vaccination was effective at reducing the prevalence of infection, although alternative explanations are possible.
Viral control efforts can also impact the rate at which virus populations evolve, thereby influencing phylogenetic patterns. Phylodynamic approaches that quantify how evolutionary rates change over time can therefore provide insight into the effectiveness of control strategies. For example, an application to HIV sequences within infected hosts showed that viral substitution rates dropped to effectively zero following the initiation of antiretroviral drug therapy. This decrease in substitution rates was interpreted as an effective cessation of viral replication following the commencement of treatment, and would be expected to lead to lower viral loads. This finding is especially encouraging because lower substitution rates are associated with slower progression to AIDS in treatment-naive patients.
Antiviral treatment also creates selective pressure for the evolution of drug resistance in virus populations, and can thereby affect patterns of genetic diversity. Commonly, there is a fitness trade-off between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals. Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of oseltamivir resistance in influenza A/H1N1.
Methods
Most often, the goal of phylodynamic analyses is to make inferences of epidemiological processes from viral phylogenies.
Thus, most phylodynamic analyses begin with the reconstruction of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of substitution rates and the time of the MRCA using a molecular clock model.
For viruses, Bayesian phylogenetic methods are popular because of the ability to fit complex demographic scenarios while integrating out phylogenetic uncertainty.
Traditional evolutionary approaches directly utilize methods from computational phylogenetics and population genetics to assess hypotheses of selection and population structure without direct regard for epidemiological models.
For example,
* the magnitude of selection can be measured by comparing the rate of nonsynonymous substitution to the rate of synonymous substitution (dN/dS);
* the population structure of the host population may be examined by calculation of F-statistics; and
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as Tajima's D.
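As an illustration of one of these standard statistics, the sketch below computes Tajima's D from a small alignment using the usual textbook constants. It assumes equal-length sequences with no gaps or ambiguity codes, and the four toy sequences are invented for the example.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, S, D) for a list of equal-length aligned sequences."""
    n = len(seqs)
    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    if seg_sites == 0:
        # D is undefined without variation; 0.0 is only a placeholder.
        return pi, seg_sites, 0.0
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi - seg_sites / a1) / sqrt(variance)
    return pi, seg_sites, d


# Invented toy alignment in which every variant is a singleton.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
pi, seg_sites, d = tajimas_d(toy)
print("pi:", pi, "S:", seg_sites, "D:", d)
```

An excess of rare variants, as in this toy alignment, pushes D below zero, a pattern commonly read as consistent with population growth or purifying selection.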
However, such analyses were not designed with epidemiological inference in mind and it may be difficult to extrapolate from standard statistics to desired epidemiological quantities.
In an effort to bridge the gap between traditional evolutionary approaches and epidemiological models, several analytical methods have been developed to specifically address problems related to phylodynamics.
These methods are based on coalescent theory, birth-death models, and simulation, and are used to more directly relate epidemiological parameters to observed viral sequences.
Coalescent theory and phylodynamics
Effective population size
The coalescent is a mathematical model that describes the ancestry of a sample of nonrecombining gene copies.
In modeling the coalescent process, time is usually considered to flow backwards from the present.
In a selectively neutral population of constant size $N$ and nonoverlapping generations (the Wright-Fisher model), the expected time for a sample of two gene copies to ''coalesce'' (i.e., find a common ancestor) is $N$ generations.
More generally, the waiting time for two members of a sample of $n$ gene copies to share a common ancestor is exponentially distributed, with rate
: $\lambda_n = \binom{n}{2} \frac{1}{N}$.
This time interval is labeled $T_n$, and at its end there are $n-1$ extant lineages remaining. These remaining lineages will coalesce at the rates $\lambda_{n-1}, \ldots, \lambda_2$ after intervals $T_{n-1}, \ldots, T_2$.
This process can be simulated by drawing exponential random variables with rates $\{\lambda_{n-i}\}_{i=0,\ldots,n-2}$ until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval $T_i$.
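The simulation procedure just described can be sketched in a few lines of Python. This is a minimal illustration that assumes a haploid, constant-size, neutral population in which $k$ lineages coalesce at rate $\binom{k}{2}/N$ per generation. The function name, the Newick-like string labels, and the parameter values are illustrative choices.

```python
import random
from math import comb


def simulate_coalescent(n, pop_size, rng):
    """Simulate one neutral, constant-size coalescent genealogy.

    Returns (intervals, merges): intervals[j] is the waiting time in
    generations while n - j lineages remain, and merges[j] is the pair of
    lineage labels joined at the end of that interval.
    """
    lineages = [str(i) for i in range(n)]
    intervals, merges = [], []
    while len(lineages) > 1:
        k = len(lineages)
        rate = comb(k, 2) / pop_size            # coalescent rate for k lineages
        intervals.append(rng.expovariate(rate))  # exponential waiting time
        a, b = rng.sample(lineages, 2)           # pair chosen uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(f"({a},{b})")
        merges.append((a, b))
    return intervals, merges


rng = random.Random(1)  # fixed seed so the sketch is repeatable
intervals, merges = simulate_coalescent(n=5, pop_size=1000, rng=rng)
print("TMRCA (generations):", sum(intervals))
print("merge order:", merges)
```

The sum of the simulated intervals is the time to the MRCA of the sample for that replicate. Averaging over many replicates approximates its expectation.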
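The expected waiting time to find the MRCA of the sample is the sum of the expected values of the internode intervals,
: $\mathrm{E}[\mathrm{TMRCA}] = \mathrm{E}[T_n] + \mathrm{E}[T_{n-1}] + \cdots + \mathrm{E}[T_2] = \frac{1}{\lambda_n} + \frac{1}{\lambda_{n-1}} + \cdots + \frac{1}{\lambda_2} = 2N\left(1 - \frac{1}{n}\right)$.
Two corollaries are:
* The time to the MRCA (TMRCA) of a sample is not unbounded in the sample size: $\lim_{n \to \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N$.
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is $O(1/n)$.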
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. estimated the TMRCA for 74 HIV-1 subtype-B genetic sequences collected in North America to be 1968.
Assuming a constant population size, we expect the time back to 1968 to represent $1 - 1/74 \approx 99\%$ of the TMRCA of the North American virus population.
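The arithmetic behind this example can be checked numerically. The sketch below assumes the constant-size coalescent, in which $k$ lineages coalesce at rate $\binom{k}{2}/N$, sums the expected interval lengths, and compares the result with the limiting value $2N$. The population size used is arbitrary because the fraction does not depend on it.

```python
from math import comb


def expected_tmrca(n, pop_size):
    """Sum of expected interval lengths 1/lambda_k for k = 2..n."""
    return sum(pop_size / comb(k, 2) for k in range(2, n + 1))


def fraction_of_limit(n, pop_size=1.0):
    """Expected sample TMRCA as a fraction of its large-sample limit 2N."""
    return expected_tmrca(n, pop_size) / (2.0 * pop_size)


# Sample size from the HIV-1 subtype B example; N is arbitrary here.
print(fraction_of_limit(74))
print(1 - 1 / 74)
```

Both printed values agree, which reflects the telescoping sum $\sum_{k=2}^{n} 1/\binom{k}{2} = 2(1 - 1/n)$.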
If the population size $N(t)$ changes over time, the coalescent rate $\lambda_n(t)$ will also be a function of time.
Donnelly and Tavaré derived this rate for a time-varying population size under the assumption of constant birth rates:
: $\lambda_n(t) = \binom{n}{2} \frac{1}{N(t)}$.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: $t \rightarrow \int_{0}^{t} \frac{d\tau}{N(\tau)}$.
Very early in an epidemic, the virus population may be growing exponentially at rate $r$, so that $t$ units of time in the past, the population will have size $N(t) = N_0 e^{-rt}$.
In this case, the rate of coalescence becomes
: $\lambda_n(t) = \binom{n}{2} \frac{1}{N_0 e^{-rt}}$.
This rate is small close to when the sample was collected ($t = 0$), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree. This is why rapidly growing populations yield trees with long tip branches.
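The time rescaling can be made concrete for exponential growth. Assuming $N(t) = N_0 e^{-rt}$ with $t$ measured backwards from sampling, the rescaled time is $s = \int_0^t d\tau / N(\tau) = (e^{rt} - 1)/(r N_0)$, which inverts to $t = \ln(1 + r N_0 s)/r$. The sketch below draws coalescent waiting times on the rescaled axis, where $k$ lineages coalesce at rate $\binom{k}{2}$, and maps them back to real time. Function names and parameter values are illustrative.

```python
import random
from math import comb, exp, log


def coalescent_rate(k, t, n0, r):
    """Coalescent rate for k lineages at time t before sampling,
    assuming N(t) = n0 * exp(-r * t)."""
    return comb(k, 2) / (n0 * exp(-r * t))


def rescaled_to_real_time(s, n0, r):
    """Invert s = (exp(r * t) - 1) / (r * n0) to recover t."""
    return log(1.0 + r * n0 * s) / r


def simulate_node_times(n, n0, r, rng):
    """Coalescence times (before sampling) under exponential growth."""
    s = 0.0
    times = []
    for k in range(n, 1, -1):
        s += rng.expovariate(comb(k, 2))  # waiting time on the rescaled axis
        times.append(rescaled_to_real_time(s, n0, r))
    return times


rng = random.Random(2)  # fixed seed so the sketch is repeatable
times = simulate_node_times(n=10, n0=1000.0, r=0.5, rng=rng)
print("node times before sampling:", [round(t, 2) for t in times])
```

Because the rate is lowest at $t = 0$ and rises towards the past, the simulated node times tend to cluster deep in the tree, which gives the long external branches described above.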
If the rate of exponential growth $r$ is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the serial interval $D$ for a particular pathogen to estimate the basic reproduction number, $R_0$.
The two may be linked by the following equation:[R M Anderson, R M May (1992) Infectious Diseases of Humans: Dynamics and Control. Oxford: Oxford University Press. 768 p.]
: $r = \frac{R_0 - 1}{D}$.
For example, one of the first estimates of $R_0$ was for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 hemagglutinin sequences in combination with prior data about the infectious period for influenza.
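As a small worked example, assuming the linear relation $r = (R_0 - 1)/D$ between the exponential growth rate $r$ and the duration of infection or serial interval $D$, the basic reproduction number follows as $R_0 = 1 + rD$. The numerical values below are hypothetical and are not estimates for any particular virus.

```python
def r0_from_growth_rate(r, d):
    """Basic reproduction number from r = (R0 - 1) / d, i.e. R0 = 1 + r * d.

    r: exponential growth rate per unit time.
    d: duration of infection (or serial interval) in the same time units.
    """
    return 1.0 + r * d


# Hypothetical inputs: growth rate of 0.1 per day, 5-day duration of infection.
print(r0_from_growth_rate(r=0.1, d=5.0))
```

Uncertainty in either input propagates linearly to $R_0$, so a poorly constrained duration of infection limits the precision of the estimate even when $r$ is well estimated.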
Compartmental models
Infectious disease epidemics are often characterized by highly nonlinear and rapid changes in the number of infected individuals and the effective population size of the virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection. Many mathematical models have been developed in the field of mathematical epidemiology to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the Susceptible-Infected-Recovered (SIR) system of differential equations, which describes the fractions of the population $S(t)$ susceptible, $I(t)$ infected, and $R(t)$ recovered as a function of time:
: $\frac{dS}{dt} = -\beta S I$,
: $\frac{dI}{dt} = \beta S I - \gamma I$, and
: $\frac{dR}{dt} = \gamma I$.
Here, $\beta$ is the per capita rate of transmission to susceptible hosts, and $\gamma$ is the rate at which infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is $f(t) = \beta S I$, which is analogous to the birth rate in classical population genetics models. The general formula for the rate of coalescence is:
: $\lambda_n(t) = \binom{n}{2} \frac{2 f(t)}{I(t)^2}$.
The ratio $2 \binom{n}{2} / I(t)^2$ can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: $\binom{n}{2} / \binom{I(t)}{2} \approx 2 \binom{n}{2} / I(t)^2$. Coalescent events will occur with this probability at the rate given by the incidence function $f(t)$.
For the simple SIR model, this yields
: $\lambda_n(t) = \binom{n}{2} \frac{2 \beta S(t)}{I(t)}$.
This expression is similar to the Kingman coalescent rate, but is damped by the fraction susceptible $S(t)$.
Early in an epidemic, $S(0) \approx 1$, so for the SIR model
: $\lambda_n(0) \approx \binom{n}{2} \frac{2 \beta}{I(0)}$.
This has the same mathematical form as the rate in the Kingman coalescent, substituting $N_e = \frac{I(t)}{2\beta}$. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic.
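To see how the coalescent rate tracks an epidemic, the sketch below integrates the SIR equations $dS/dt = -\beta S I$ and $dI/dt = \beta S I - \gamma I$ with a simple forward-Euler step and records $\binom{n}{2} \, 2 \beta S / I$ at each time point. The parameter values, step size, and sample size are hypothetical, and Euler integration is chosen for brevity rather than accuracy. Time runs forward from the start of the epidemic here, whereas the coalescent is usually written backwards from sampling.

```python
from math import comb


def sir_coalescent_rates(beta, gamma, s0, i0, n, dt, steps):
    """Forward-Euler SIR integration with the SIR coalescent rate.

    Returns a list of (time, S, I, rate) tuples, where
    rate = C(n, 2) * 2 * beta * S / I and S, I are population fractions.
    """
    s, i = s0, i0
    rows = []
    for step in range(steps):
        rate = comb(n, 2) * 2.0 * beta * s / i
        rows.append((step * dt, s, i, rate))
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
    return rows


# Hypothetical parameters: beta = 0.5, gamma = 0.2, one infection per 10,000 hosts.
rows = sir_coalescent_rates(beta=0.5, gamma=0.2, s0=1 - 1e-4, i0=1e-4,
                            n=10, dt=0.1, steps=500)
early_rate = rows[0][3]
early_approximation = comb(10, 2) * 2.0 * 0.5 / 1e-4  # S(0) ~ 1 approximation
print("rate at t = 0:", early_rate)
print("early-epidemic approximation:", early_approximation)
```

At the start of the epidemic the two printed values nearly coincide because almost all hosts are susceptible. They diverge as susceptibles are depleted.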
When a disease is no longer exponentially growing but has become endemic, the rate of lineage coalescence can also be derived for the epidemiological model governing the disease's transmission dynamics. This can be done by extending the Wright-Fisher model to allow for unequal offspring distributions. With a Wright-Fisher generation taking $\tau$ units of time, the rate of coalescence is given by:
: $\lambda_n = \binom{n}{2} \frac{1}{N_e \tau}$,
where the effective population size $N_e$ is the population size $N$ divided by the variance of the offspring distribution $\sigma^2$.[J Wakeley (2008) Coalescent Theory: an Introduction. USA: Roberts & Company] The generation time $\tau$ for an epidemiological model at equilibrium is given by the duration of infection and the population size $N$ is closely related to the equilibrium number of infected individuals. To derive the variance in the offspring distribution for a given epidemiological model, one can imagine that infected individuals can differ from one another in their infectivities, their contact rates, their durations of infection, or in other characteristics relating to their ability to transmit the virus with which they are infected. These differences can be acknowledged by assuming that the basic reproduction number is a random variable that varies across individuals in the population and that follows some continuous probability distribution. The mean and variance of these individual basic reproduction numbers,