HOME

TheInfoList



OR:

The fixation index (FST) is a measure of
population differentiation Human genetic variation is the genetic differences in and among populations. There may be multiple variants of any given gene in the human population (alleles), a situation called polymorphism. No two humans are genetically identical. Even m ...
due to
genetic structure Genetic structure refers to any pattern in the genetic makeup of individuals within a population. Genetic structure allows for information about an individual to be inferred from other members of the same population. In trivial terms, all popul ...
. It is frequently estimated from
genetic polymorphism A gene is said to be polymorphic if more than one allele occupies that gene's locus within a population. In addition to having more than one allele at a specific locus, each allele must also occur in the population at a rate of at least 1% to ge ...
data, such as
single-nucleotide polymorphism In genetics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently ...
s (SNP) or
microsatellite A microsatellite is a tract of repetitive DNA in which certain DNA motifs (ranging in length from one to six or more base pairs) are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organism's genome. ...
s. Developed as a special case of Wright's
F-statistics In population genetics, ''F''-statistics (also known as fixation indices) describe the statistically expected level of heterozygosity in a population; more specifically the expected degree of (usually) a reduction in heterozygosity when compared ...
, it is one of the most commonly used statistics in
population genetics Population genetics is a subfield of genetics that deals with genetic differences within and between populations, and is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as adaptation, speciation, and po ...
.


Definition

Two of the most commonly used definitions for FST at a given locus are based on 1) the variance of allele frequencies between populations, and on 2) the probability of
Identity by descent A DNA segment is identical by state (IBS) in two or more individuals if they have identical nucleotide sequences in this segment. An IBS segment is identical by descent (IBD) in two or more individuals if they have inherited it from a common a ...
. If \bar is the average frequency of an allele in the total population, \sigma^2_S is the variance in the frequency of the allele between different subpopulations, weighted by the sizes of the subpopulations, and \sigma^2_T is the variance of the allelic state in the total population, FST is defined as : F_ = \frac = \frac Wright's definition illustrates that FST measures the amount of genetic variance that can be explained by population structure. This can also be thought of as the fraction of total diversity that is not a consequence of the average diversity within subpopulations, where diversity is measured by the probability that two randomly selected alleles are different, namely 2p(1-p). If the allele frequency in the ith population is p_i and the relative size of the ith population is c_i, then : F_ = \frac = \frac Alternatively, : F_ = \frac where f_0 is the probability of identity by descent of two individuals given that the two individuals are in the same subpopulation, and \bar is the probability that two individuals from the total population are identical by descent. Using this definition, FST can be interpreted as measuring how much closer two individuals from the same subpopulation are, compared to the total population. If the
mutation rate In genetics, the mutation rate is the frequency of new mutations in a single gene or organism over time. Mutation rates are not constant and are not limited to a single type of mutation; there are many different types of mutations. Mutation rates ...
is small, this interpretation can be made more explicit by linking the probability of identity by descent to coalescent times: Let T0 and T denote the average time to coalescence for individuals from the same subpopulation and the total population, respectively. Then, : F_ \approx 1-\frac This formulation has the advantage that the expected time to coalescence can easily be estimated from genetic data, which led to the development of various estimators for FST.


Estimation

In practice, none of the quantities used for the definitions can be easily measured. As a consequence, various estimators have been proposed. A particularly simple estimator applicable to DNA sequence data is: : F_ = \frac where \pi_\text and \pi_\text represent the average number of pairwise differences between two individuals sampled from different sub-populations ( \pi_\text ) or from the same sub-population ( \pi_\text). The average pairwise difference within a population can be calculated as the sum of the pairwise differences divided by the number of pairs. However, this estimator is biased when sample sizes are small or if they vary between populations. Therefore, more elaborate methods are used to compute FST in practice. Two of the most widely used procedures are the estimator by Weir & Cockerham (1984), or performing an Analysis of molecular variance. A list of implementations is available at the end of this article.


Interpretation

This comparison of genetic variability within and between populations is frequently used in applied
population genetics Population genetics is a subfield of genetics that deals with genetic differences within and between populations, and is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as adaptation, speciation, and po ...
. The values range from 0 to 1. A zero value implies complete
panmixis Panmixia (or panmixis) means random mating. A panmictic population is one where all individuals are potential partners. This assumes that there are no mating restrictions, neither genetic nor behavioural, upon the population and that therefore all ...
; that is, that the two populations are interbreeding freely. A value of one implies that all genetic variation is explained by the population structure, and that the two populations do not share any genetic diversity. For idealized models such as Wright's finite island model, FST can be used to estimate migration rates. Under that model, the migration rate is : M = \frac \approx\frac\left (\frac -1 \right ) , where is the migration rate per generation, and \mu is the mutation rate per generation. The interpretation of FST can be difficult when the data analyzed are highly polymorphic. In this case, the probability of identity by descent is very low and FST can have an arbitrarily low upper bound, which might lead to misinterpretation of the data. Also, strictly speaking FST is not a
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects or points are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two counties over"). ...
in the mathematical sense, as it does not satisfy the
triangle inequality In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side. This statement permits the inclusion of degenerate triangles, but ...
. For populations of
plants Plants are predominantly photosynthetic eukaryotes of the kingdom Plantae. Historically, the plant kingdom encompassed all living things that were not animals, and included algae and fungi; however, all current definitions of Plantae exclude ...
which clearly belong to the same
species In biology, a species is the basic unit of classification and a taxonomic rank of an organism, as well as a unit of biodiversity. A species is often defined as the largest group of organisms in which any two individuals of the appropriat ...
, values of FST greater than 15% are considered "great" or "significant" differentiation, while values below 5% are considered "small" or "insignificant" differentiation. Values for
mammal Mammals () are a group of vertebrate animals constituting the class Mammalia (), characterized by the presence of mammary glands which in females produce milk for feeding (nursing) their young, a neocortex (a region of the brain), fur ...
populations between
subspecies In biological classification, subspecies is a rank below species, used for populations that live in different areas and vary in size, shape, or other physical characteristics ( morphology), but that can successfully interbreed. Not all specie ...
, or closely related species, typical values are of the order of 5% to 20%. FST between the
Eurasian Eurasia (, ) is the largest continental area on Earth, comprising all of Europe and Asia. Primarily in the Northern and Eastern Hemispheres, it spans from the British Isles and the Iberian Peninsula in the west to the Japanese archipela ...
and North American populations of the
gray wolf The wolf (''Canis lupus''; : wolves), also known as the gray wolf or grey wolf, is a large canine native to Eurasia and North America. More than thirty subspecies of ''Canis lupus'' have been recognized, and gray wolves, as popularly un ...
were reported at 9.9%, those between the
Red wolf The red wolf (''Canis rufus'') is a canine native to the southeastern United States. Its size is intermediate between the coyote (''Canis latrans'') and gray wolf (''Canis lupus''). The red wolf's taxonomic classification as being a separate s ...
and
Gray wolf The wolf (''Canis lupus''; : wolves), also known as the gray wolf or grey wolf, is a large canine native to Eurasia and North America. More than thirty subspecies of ''Canis lupus'' have been recognized, and gray wolves, as popularly un ...
populations at between 17% and 18%. The
Eastern wolf The eastern wolf (''Canis lycaon'' or ''Canis lupus lycaon'' or ''Canis rufus lycaon'') also known as the timber wolf, Algonquin wolf or eastern timber wolf, is a canine of debated taxonomy native to the Great Lakes region and southeastern Canad ...
, a recently recognized highly admixed "wolf-like species" has values of FST below 10% in comparison with both Eurasian (7.6%) and North American gray wolves (5.7%), with the Red wolf (8.5%), and an even lower value when paired with the
Coyote The coyote (''Canis latrans'') is a species of canine native to North America. It is smaller than its close relative, the wolf, and slightly smaller than the closely related eastern wolf and red wolf. It fills much of the same ecological nich ...
(4.5%).


FST in humans

FST values depend strongly on the choice of populations. Closely related ethnic groups, such as the
Danes Danes ( da, danskere, ) are a North Germanic ethnic group and nationality native to Denmark and a modern nation identified with the country of Denmark. This connection may be ancestral, legal, historical, or cultural. Danes generally regard t ...
vs. the
Dutch Dutch commonly refers to: * Something of, from, or related to the Netherlands * Dutch people () * Dutch language () Dutch may also refer to: Places * Dutch, West Virginia, a community in the United States * Pennsylvania Dutch Country People E ...
, or the Portuguese vs. the
Spaniards Spaniards, or Spanish people, are a Romance ethnic group native to Spain. Within Spain, there are a number of national and regional ethnic identities that reflect the country's complex history, including a number of different languages, both ...
show values significantly below 1%, indistinguishable from panmixia. Within Europe, the most divergent ethnic groups have been found to have values of the order of 7% ( Lapps vs.
Sardinians The Sardinians, or Sards ( sc, Sardos or ; Italian and Sassarese: ''Sardi''; Gallurese: ''Saldi''), are a Romance language-speaking ethnic group native to Sardinia, from which the western Mediterranean island and autonomous region of Italy de ...
). Larger values are found if highly divergent homogenous groups are compared: the highest such value found was at close to 46%, between
Mbuti The Mbuti people, or Bambuti, are one of several indigenous pygmy groups in the Congo region of Africa. Their languages are Central Sudanic languages and Bantu languages. Subgroups Bambuti are pygmy hunter-gatherers, and are one of the old ...
and
Papuans The indigenous peoples of West Papua in Indonesia and Papua New Guinea, commonly called Papuans, are Melanesians. There is genetic evidence for two major historical lineages in New Guinea and neighboring islands: a first wave from the Malay Arch ...
.


Autosomal genetic distances based on classical markers

In their study ''The History and Geography of Human Genes (1994)'', Cavalli-Sforza, Menozzi and Piazza provide some of the most detailed and comprehensive estimates of genetic distances between human populations, within and across continents. Their initial database contains 76,676 gene frequencies (using 120 blood polymorphisms), corresponding to 6,633 samples in different locations. By culling and pooling such samples, they restrict their analysis to 491 populations. They focus on ''aboriginal populations'' that were at their present location at the end of the 15th century when the great European migrations began. When studying genetic difference at the world level, the number is reduced to 42 representative populations, aggregating subpopulations characterized by a high level of genetic similarity. For these 42 populations, Cavalli-Sforza and coauthors report bilateral distances computed from 120 alleles. Among this set of 42 world populations, the greatest genetic distance observed is between Mbuti Pygmies and Papua New Guineans, where the Fst distance is 0.4573, while the smallest genetic distance (0.0021) is between the Danish and the English. When considering more disaggregated data for 26 European populations, the smallest genetic distance (0.0009) is between the Dutch and the Danes, and the largest (0.0667) is between the Lapps and the Sardinians. The mean genetic distance among the 861 available pairings of the 42 selected populations was found to be 0.1338.. A genetic distance of 0.1338 implies that kinship between unrelated individuals of the same ancestry relative to the world population is equivalent to kinship between half siblings in a randomly mating population. This also implies that if a human from a given ancestral population has a mixed half-sibling, that human is closer genetically to an unrelated individual of their ancestral population than to their mixed half-sibling.


Autosomal genetic distances based on SNPs

A 2012 study based on
International HapMap Project The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease ...
data estimated FST between the three major "continental" populations of
Europeans Europeans are the focus of European ethnology, the field of anthropology related to the various ethnic groups that reside in the states of Europe. Groups may be defined by common genetic ancestry, common language, or both. Pan and Pfeil (20 ...
(combined from Utah residents of Northern and Western European ancestry from the CEPH collection and Italians from Tuscany), East Asians (combining Han Chinese from Beijing, Chinese from metropolitan Denver and Japanese from Tokyo, Japan) and
Sub-Saharan Africans Sub-Saharan Africa is, geographically, the area and regions of the continent of Africa that lies south of the Sahara. These include West Africa, East Africa, Central Africa, and Southern Africa. Geopolitically, in addition to the African co ...
(combining
Luhya Luhya or Abaluyia may refer to: * Luhya people * Luhya language Luhya (; also Luyia, Luhia or Luhiya) is a Bantu language of western Kenya. Dialects The various Luhya tribes speak several related languages and dialects, though some of them ar ...
of Webuye, Kenya, Maasai of Kinyawa, Kenya and
Yoruba The Yoruba people (, , ) are a West African ethnic group that mainly inhabit parts of Nigeria, Benin, and Togo. The areas of these countries primarily inhabited by Yoruba are often collectively referred to as Yorubaland. The Yoruba constitute ...
of Ibadan, Nigeria). It reported a value close to 12% between continental populations, and values close to
panmixia Panmixia (or panmixis) means random mating. A panmictic population is one where all individuals are potential partners. This assumes that there are no mating restrictions, neither genetic nor behavioural, upon the population and that therefore all ...
(smaller than 1%) within continental populations.


Programs for calculating FST

* Arlequin * Fstat
SMOGD

diveRsity
(R package)
hierfstat
(R package)
FinePop
ref> (R package)



* DnaSP


Modules for calculating FST

*
BioPerl BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It has played an integral role in the Human Genome Project. Background BioPerl is an active open source software project sup ...

BioPython


References


Further reading

* Evolution and the Genetics of Populations Volume 2: the Theory of Gene Frequencies, pg 294–295, S. Wright, Univ. of Chicago Press, Chicago, 1969 * A haplotype map of the human genome, The International HapMap Consortium, Nature 2005


See also

*
Genetic distance Genetic distance is a measure of the genetic divergence between species or between populations within a species, whether the distance measures time from common ancestor or degree of differentiation. Populations with many similar alleles have s ...


External links


BioPerl - Bio::PopGen::PopStats
{{DEFAULTSORT:Fixation Index Population genetics Mathematical and theoretical biology