In population genetics, the allele frequency spectrum, sometimes called the site frequency spectrum, is the distribution of the allele frequencies of a given set of loci (often SNPs) in a population or sample. Because the spectrum is usually computed from, or compared with, a sequenced sample of the population, it is a histogram whose size depends on the number of sequenced chromosomes. Each entry in the frequency spectrum records the total number of loci with the corresponding derived allele frequency. Loci contributing to the frequency spectrum are assumed to change in frequency independently of one another. Loci are also assumed to be biallelic (that is, with exactly two alleles present), although extensions to multiallelic frequency spectra exist. Many summary statistics of observed genetic variation are themselves summaries of the allele frequency spectrum, including estimates of \theta such as Watterson's \theta_W and Tajima's \theta_\pi , Tajima's D, Fay and Wu's H and the fixation index F_{ST} .
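As an illustration of how such statistics reduce to functions of the spectrum, the sketch below computes Watterson's \theta_W and Tajima's \theta_\pi from an unfolded spectrum, using their standard definitions. The function names are hypothetical, and the estimates are for the whole set of loci rather than per site.

```python
def watterson_theta(sfs):
    """Watterson's estimator: number of segregating sites S divided by
    the harmonic number a_n = sum_{i=1}^{n-1} 1/i.
    `sfs` is the unfolded spectrum (x_1, ..., x_{n-1})."""
    n = len(sfs) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a_n


def tajima_theta_pi(sfs):
    """Tajima's estimator: mean number of pairwise differences.
    A site with i derived copies differs in i * (n - i) of the
    n * (n - 1) / 2 possible pairs of chromosomes."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs


# Illustrative input: a spectrum for n = 6 chromosomes.
spectrum = [4, 2, 1, 0, 1]
theta_w = watterson_theta(spectrum)
theta_pi = tajima_theta_pi(spectrum)
```

The difference between the two estimates is the numerator of Tajima's D; the normalizing variance term is omitted here to keep the sketch short.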


Example

The allele frequency spectrum from a sample of n chromosomes is calculated by counting the number of sites with each derived allele count i , for 1 \leq i \leq n-1 . For example, consider a sample of n=6 chromosomes with eight observed variable sites. In the table for this example, a 1 indicates that the derived allele is observed at that site, while a 0 indicates that the ancestral allele is observed. The allele frequency spectrum can be written as the vector \mathbf{x} = (x_1,x_2,x_3,x_4,x_5) , where x_i is the number of observed sites at which the derived allele appears in i of the sampled chromosomes. In this example, the observed allele frequency spectrum is (4,2,1,0,1) : four sites carry a single copy of the derived allele, two sites carry two copies, one site carries three copies, no site carries four, and one site carries five.
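The counting procedure can be sketched in a few lines of Python. The 0/1 matrix below is hypothetical: it is not the table from the example, only one arrangement of six chromosomes and eight sites constructed to give the same spectrum. The function name is likewise an assumption made for illustration.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded spectrum (x_1, ..., x_{n-1}) from a 0/1 matrix.
    Rows are sampled chromosomes, columns are sites;
    1 = derived allele, 0 = ancestral allele."""
    n = len(haplotypes)
    # Derived allele count at each site (column sums).
    counts = [sum(site) for site in zip(*haplotypes)]
    sfs = [0] * (n - 1)
    for c in counts:
        if 1 <= c <= n - 1:  # skip sites that are not variable in the sample
            sfs[c - 1] += 1
    return sfs


# Hypothetical data: 6 chromosomes (rows) by 8 variable sites (columns).
HAPLOTYPES = [
    [1, 0, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(site_frequency_spectrum(HAPLOTYPES))  # [4, 2, 1, 0, 1]
```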


Calculation

The expected allele frequency spectrum may be calculated using either a coalescent or diffusion approach. The demographic history of a population and natural selection affect allele frequency dynamics, and these effects are reflected in the shape of the allele frequency spectrum. For the simple case of selectively neutral alleles segregating in a population that has reached demographic equilibrium (that is, without recent population size changes or gene flow), the expected allele frequency spectrum \mathbf{x} = (x_1,\ldots,x_{n-1}) for a sample of size n is given by

x_i = \theta \frac{1}{i},

where \theta is the population-scaled mutation rate ( \theta = 4N\mu for a diploid population of effective size N , or 2N\mu for a haploid one). Deviations from demographic equilibrium or neutrality will change the shape of the expected frequency spectrum.

Calculating the frequency spectrum from observed sequence data requires one to distinguish the ancestral and derived (mutant) alleles, often by comparison with an outgroup sequence. For example, in human population genetic studies, the homologous chimpanzee reference sequence is typically used to estimate the ancestral allele. Sometimes the ancestral allele cannot be determined, in which case the folded allele frequency spectrum may be calculated instead. The folded frequency spectrum stores the observed counts of the minor (less common) allele at each site. It can be calculated by binning together the i th and (n-i) th entries of the unfolded spectrum, where n is the number of sampled chromosomes.
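Both calculations in this section are short enough to sketch directly. The function names below are hypothetical. The folding routine assumes the input is an unfolded spectrum (x_1, ..., x_{n-1}) and, when n is even, keeps the middle entry as it is rather than adding it to itself.

```python
def expected_neutral_sfs(n, theta):
    """Expected unfolded spectrum under neutrality and demographic
    equilibrium: x_i = theta / i for i = 1, ..., n-1."""
    return [theta / i for i in range(1, n)]


def fold_sfs(sfs):
    """Fold an unfolded spectrum by merging entries i and n - i.
    Returns minor allele counts for i = 1, ..., floor(n / 2)."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        if i == j:
            folded.append(sfs[i - 1])  # middle class when n is even
        else:
            folded.append(sfs[i - 1] + sfs[j - 1])
    return folded


print(expected_neutral_sfs(5, 2.0))  # [2.0, 1.0, 0.666..., 0.5]
print(fold_sfs([4, 2, 1, 0, 1]))     # [5, 2, 1]
```

Folding discards the information about which allele is derived, so the folded spectrum for n = 6 has three entries instead of five.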


Multi-population allele frequency spectrum

The joint allele frequency spectrum (JAFS) is the joint distribution of allele frequencies across two or more related populations. The JAFS for d populations, with n_j sampled chromosomes in the j th population, is a d -dimensional histogram, in which each entry stores the total number of segregating sites at which the derived allele is observed with the corresponding frequency in each population. Each axis of the histogram corresponds to a population, and the index i along the j th axis runs over 0 \leq i \leq n_j .


Example

Suppose we sequence diploid individuals from two populations, 4 individuals from population 1 and 2 individuals from population 2, giving 8 and 4 sampled chromosomes respectively. The JAFS would be a 9\times5 matrix, indexed from zero. The [3,2] entry would record the number of observed polymorphic loci with derived allele frequency 3 in population 1 and frequency 2 in population 2. The [1,0] entry would record those loci with observed frequency 1 in population 1, and frequency 0 in population 2. The [8,3] entry would record those loci with the derived allele fixed in population 1 (seen in all chromosomes), and with frequency 3 in population 2.
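A minimal sketch of this two-population case follows. The data are hypothetical: three sites chosen so that each of the three entries discussed above receives exactly one locus. The function name is also an assumption, and both matrices are assumed to list the same sites in the same column order.

```python
def joint_sfs(pop1, pop2):
    """Two-population joint spectrum from two 0/1 haplotype matrices
    (rows = chromosomes, columns = the same sites in both matrices).
    Entry [i][j] counts sites with derived allele count i in
    population 1 and j in population 2."""
    n1, n2 = len(pop1), len(pop2)
    jafs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    counts1 = [sum(site) for site in zip(*pop1)]
    counts2 = [sum(site) for site in zip(*pop2)]
    for c1, c2 in zip(counts1, counts2):
        # Sites monomorphic across both samples would fall in the
        # corner entries [0][0] or [n1][n2]; none occur in this data.
        jafs[c1][c2] += 1
    return jafs


# Hypothetical data: 8 chromosomes from population 1, 4 from population 2.
POP1 = [[1, 1, 1], [1, 0, 1], [1, 0, 1], [0, 0, 1],
        [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1]]
POP2 = [[1, 0, 1], [1, 0, 1], [0, 0, 1], [0, 0, 0]]

jafs = joint_sfs(POP1, POP2)
print(len(jafs), len(jafs[0]))                # 9 5
print(jafs[3][2], jafs[1][0], jafs[8][3])     # 1 1 1
```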


Applications

The shape of the allele frequency spectrum is sensitive to demography, such as population size changes, migration, and substructure, as well as to natural selection. By comparing observed data summarized in a frequency spectrum to the expected frequency spectrum calculated under a given demographic and selection model, one can assess the goodness of fit of the model to the data, and use likelihood theory to estimate the best-fit parameters of the model.

For example, suppose a population experienced a recent period of exponential growth, n sequences were sampled from the population at the end of the growth, and the observed (data) allele frequency spectrum was calculated using putatively neutral variation. The demographic model would have parameters for the exponential growth rate \rho , the time T for which the growth occurred, and a reference population size N_0 , assuming that the population was at equilibrium when the growth began. The expected frequency spectrum for a given parameter set (\rho,T,N_0) can be obtained using either diffusion or coalescent theory, and compared to the data frequency spectrum. The best-fit parameters can be found using maximum likelihood.

This approach has been used to infer demographic and selection models for many species, including humans. For example, Marth et al. (2004) used the single-population allele frequency spectra for a group of Africans, Europeans, and Asians to show that population bottlenecks have occurred in the Asian and European demographic histories, but not in the African one. More recently, Gutenkunst et al. (2009) used the joint allele frequency spectrum for these same three populations to infer the time at which the populations diverged and the amount of subsequent ongoing migration between them (see out of Africa hypothesis). Additionally, these methods may be used to estimate patterns of selection from allele frequency data. For example, Boyko et al. (2008) inferred the distribution of fitness effects for newly arising mutations using human polymorphism data, while controlling for the effects of non-equilibrium demography.
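The likelihood comparison described in this section can be sketched for the simplest possible model. The example below assumes, for illustration only, that each spectrum entry is an independent Poisson count with mean given by the model, and it fits the single parameter \theta of the neutral equilibrium spectrum x_i = \theta / i by a grid search. It is not the method of any of the studies cited above, all names are hypothetical, and a model with growth parameters would need its expected spectrum computed by diffusion or coalescent methods, which are not shown.

```python
import math


def poisson_log_likelihood(observed, expected):
    """Log-likelihood of an observed spectrum, assuming independent
    Poisson counts with the model's expected values as means."""
    return sum(d * math.log(m) - m - math.lgamma(d + 1)
               for d, m in zip(observed, expected))


def neutral_expected(n, theta):
    """Expected spectrum x_i = theta / i under the neutral equilibrium model."""
    return [theta / i for i in range(1, n)]


def fit_theta(observed, grid):
    """Return the grid value of theta with the highest log-likelihood."""
    n = len(observed) + 1
    return max(grid, key=lambda t: poisson_log_likelihood(
        observed, neutral_expected(n, t)))


observed = [4, 2, 1, 0, 1]                    # illustrative data, n = 6
grid = [k / 100 for k in range(1, 1001)]      # theta from 0.01 to 10.00
best_theta = fit_theta(observed, grid)
```

For this one-parameter model the maximum has a closed form, the number of segregating sites divided by \sum_{i=1}^{n-1} 1/i , so the grid search should land next to Watterson's estimate (about 3.5 for these data).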

