In population genetics, the Watterson estimator is a method for describing the genetic diversity in a population. It was developed by Margaret Wu and G. A. Watterson in the 1970s. It estimates the "population mutation rate" \theta = 4N_e\mu (proportional to the product of the effective population size and the neutral mutation rate) by counting the number of polymorphic sites observed in a sample, where N_e is the effective population size and \mu is the per-generation mutation rate of the population of interest.

The assumptions made are that there is a sample of n haploid individuals from the population of interest, that there are infinitely many sites capable of varying (so that mutations never overlay or reverse one another), and that n \ll N_e. Because the number of segregating sites counted will increase with the number of sequences looked at, the correction factor a_n is used. The estimate of \theta, often denoted \widehat{\theta}_w, is
: \widehat{\theta}_w = \frac{K}{a_n},
where K is the number of segregating sites in the sample (an example of a segregating site would be a single-nucleotide polymorphism) and
: a_n = \sum_{i=1}^{n-1} \frac{1}{i}
is the (n-1)th harmonic number. This estimate is based on coalescent theory.

Watterson's estimator is commonly used for its simplicity. When its assumptions are met, the estimator is unbiased and the variance of the estimator decreases with increasing sample size or recombination rate. However, the estimator can be biased by population structure. For example, \widehat{\theta}_w is downwardly biased in an exponentially growing population. It can also be biased by violation of the infinite-sites mutational model; if multiple mutations can overwrite one another, Watterson's estimator will be biased downward. Comparing the value of Watterson's estimator to nucleotide diversity is the basis of Tajima's D, which allows inference of the evolutionary regime of a given locus.
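As a minimal sketch of the calculation described above, the following Python code counts the segregating sites K in a small set of aligned sequences and divides by the correction factor a_n. The function names (`harmonic_correction`, `count_segregating_sites`, `watterson_theta`) and the toy alignment are hypothetical, chosen only for illustration. The sketch assumes equal-length sequences with no gaps or ambiguous bases, and it returns an estimate for the whole locus rather than per site.

```python
from typing import Sequence


def harmonic_correction(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i, the (n-1)th harmonic number."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / i for i in range(1, n))


def count_segregating_sites(sequences: Sequence[str]) -> int:
    """Count alignment columns that contain more than one distinct character."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(sequences: Sequence[str]) -> float:
    """Estimate theta for the whole locus as K / a_n."""
    k = count_segregating_sites(sequences)
    return k / harmonic_correction(len(sequences))


# Toy alignment (made up for illustration): n = 4 sequences, 8 sites.
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTTCGA"]
theta_w = watterson_theta(sample)
print(f"K = {count_segregating_sites(sample)}, theta_w = {theta_w:.3f}")
```

In this toy alignment three columns vary, so K = 3, and with n = 4 the correction factor is a_4 = 1 + 1/2 + 1/3 = 11/6, giving an estimate of 18/11 ≈ 1.636 for the locus. Dividing by the sequence length would give a per-site value, if that is the quantity of interest.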


See also

* Tajima's D
* Coupon collector's problem
* Ewens sampling formula


References

* Watterson, G. A. (1975) "On the number of segregating sites in genetical models without recombination", ''Theoretical Population Biology'', 7 (2), 256–276. doi:10.1016/0040-5809(75)90020-9, PMID 1145509.
* McVean, Gil; Awadalla, Philip; Fearnhead, Paul (2002) "A Coalescent-Based Method for Detecting and Estimating Recombination From Gene Sequences", ''Genetics'', 160, 1231–1241.