A haplotype (
haploid
Ploidy () is the number of complete sets of chromosomes in a cell (biology), cell, and hence the number of possible alleles for Autosome, autosomal and Pseudoautosomal region, pseudoautosomal genes. Here ''sets of chromosomes'' refers to the num ...
genotype
The genotype of an organism is its complete set of genetic material. Genotype can also be used to refer to the alleles or variants an individual carries in a particular gene or genetic location. The number of alleles an individual can have in a ...
) is a group of
allele
An allele is a variant of the sequence of nucleotides at a particular location, or Locus (genetics), locus, on a DNA molecule.
Alleles can differ at a single position through Single-nucleotide polymorphism, single nucleotide polymorphisms (SNP), ...
s in an
organism
An organism is any life, living thing that functions as an individual. Such a definition raises more problems than it solves, not least because the concept of an individual is also difficult. Many criteria, few of them widely accepted, have be ...
that are inherited together from a single parent.
[
Many organisms contain genetic material (]DNA
Deoxyribonucleic acid (; DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of al ...
) which is inherited from two parents. Normally these organisms have their DNA organized in two sets of pairwise similar chromosome
A chromosome is a package of DNA containing part or all of the genetic material of an organism. In most chromosomes, the very long thin DNA fibers are coated with nucleosome-forming packaging proteins; in eukaryotic cells, the most import ...
s. The offspring gets one chromosome in each pair from each parent. A set of pairs of chromosomes is called diploid
Ploidy () is the number of complete sets of chromosomes in a cell, and hence the number of possible alleles for autosomal and pseudoautosomal genes. Here ''sets of chromosomes'' refers to the number of maternal and paternal chromosome copies, ...
and a set of only one half of each pair is called haploid. The haploid genotype (haplotype) is a genotype that considers the singular chromosomes rather than the pairs of chromosomes. It can be all the chromosomes from one of the parents or a minor part of a chromosome, for example a sequence of 9000 base pair
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA ...
s or a small set of alleles.
Specific contiguous parts of the chromosome are likely to be inherited together and not be split by chromosomal crossover
Chromosomal crossover, or crossing over, is the exchange of genetic material during sexual reproduction between two homologous chromosomes' sister chromatids, non-sister chromatids that results in recombinant chromosomes. It is one of the fina ...
, a phenomenon called genetic linkage
Genetic linkage is the tendency of Nucleic acid sequence, DNA sequences that are close together on a chromosome to be inherited together during the meiosis phase of sexual reproduction. Two Genetic marker, genetic markers that are physically near ...
. As a result, identifying these statistical associations and a few alleles of a specific haplotype sequence can facilitate identifying ''all other such'' polymorphic sites that are nearby on the chromosome ( imputation). Such information is critical for investigating the genetics of common diseases
A disease is a particular abnormal condition that adversely affects the structure or function of all or part of an organism and is not immediately due to any external injury. Diseases are often known to be medical conditions that are asso ...
; which have been investigated in humans by the International HapMap Project
The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease ...
.
Other parts of the genome are almost always haploid and do not undergo crossover: for example, human mitochondrial DNA
Mitochondrial DNA (mtDNA and mDNA) is the DNA located in the mitochondrion, mitochondria organelles in a eukaryotic cell that converts chemical energy from food into adenosine triphosphate (ATP). Mitochondrial DNA is a small portion of the D ...
is passed down through the maternal line and the Y chromosome
The Y chromosome is one of two sex chromosomes in therian mammals and other organisms. Along with the X chromosome, it is part of the XY sex-determination system, in which the Y is the sex-determining chromosome because the presence of the ...
is passed down the paternal line. In these cases, the entire sequence can be grouped into a simple evolutionary tree, with each branch founded by a unique-event polymorphism mutation (often, but not always, a single-nucleotide polymorphism
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a ...
(SNP)). Each clade
In biology, a clade (), also known as a Monophyly, monophyletic group or natural group, is a group of organisms that is composed of a common ancestor and all of its descendants. Clades are the fundamental unit of cladistics, a modern approach t ...
under a branch, containing haplotypes with a single shared ancestor, is called a haplogroup
A haplotype is a group of alleles in an organism that are inherited together from a single parent, and a haplogroup (haploid from the , ''haploûs'', "onefold, simple" and ) is a group of similar haplotypes that share a common ancestor with a sing ...
.[
]
Haplotype resolution
An organism's genotype
The genotype of an organism is its complete set of genetic material. Genotype can also be used to refer to the alleles or variants an individual carries in a particular gene or genetic location. The number of alleles an individual can have in a ...
may not define its haplotype uniquely. For example, consider a diploid
Ploidy () is the number of complete sets of chromosomes in a cell, and hence the number of possible alleles for autosomal and pseudoautosomal genes. Here ''sets of chromosomes'' refers to the number of maternal and paternal chromosome copies, ...
organism and two bi-allelic loci (such as SNPs
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in ...
) on the same chromosome. Assume the first locus has alleles ''A'' or ''T'' and the second locus ''G'' or ''C''. Both loci, then, have three possible genotypes: (''AA'', ''AT'', and ''TT'') and (''GG'', ''GC'', and ''CC''), respectively. For a given individual, there are nine possible configurations (haplotypes) at these two loci (shown in the Punnett square below). For individuals who are homozygous at one or both loci, the haplotypes are unambiguous - meaning that there is not any differentiation of haplotype T1T2 vs haplotype T2T1; where T1 and T2 are labeled to show that they are the same locus, but labeled as such to show it does not matter which order you consider them in, the end result is two T loci. For individuals heterozygous
Zygosity (the noun, zygote, is from the Greek "yoked," from "yoke") () is the degree to which both copies of a chromosome or gene have the same genetic sequence. In other words, it is the degree of similarity of the alleles in an organism.
Mos ...
at both loci, the gametic phase
A haplotype (haploid genotype) is a group of alleles in an organism that are inherited together from a single parent.
Many organisms contain genetic material (DNA) which is inherited from two parents. Normally these organisms have their DNA orga ...
is ambiguous
Ambiguity is the type of meaning in which a phrase, statement, or resolution is not explicitly defined, making for several interpretations; others describe it as a concept or statement that has no real reference. A common aspect of ambiguit ...
- in these cases, an observer does not know which haplotype the individual has, e.g., TA vs AT.
The only unequivocal method of resolving phase ambiguity is by sequencing
In genetics and biochemistry, sequencing means to determine the primary structure (sometimes incorrectly called the primary sequence) of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence which succ ...
. However, it is possible to estimate the probability of a particular haplotype when phase is ambiguous using a sample of individuals.
Given the genotypes for a number of individuals, the haplotypes can be inferred by haplotype resolution or haplotype phasing techniques. These methods work by applying the observation that certain haplotypes are common in certain genomic regions. Therefore, given a set of possible haplotype resolutions, these methods choose those that use fewer different haplotypes overall. The specifics of these methods vary - some are based on combinatorial approaches (e.g., parsimony), whereas others use likelihood functions based on different models and assumptions such as the Hardy–Weinberg principle
In population genetics, the Hardy–Weinberg principle, also known as the Hardy–Weinberg equilibrium, model, theorem, or law, states that Allele frequency, allele and genotype frequencies in a population will remain constant from generation ...
, the coalescent theory
Coalescent theory is a Scientific modelling, model of how alleles sampled from a population may have originated from a most recent common ancestor, common ancestor. In the simplest case, coalescent theory assumes no genetic recombination, recombina ...
model, or perfect phylogeny. The parameters in these models are then estimated using algorithms such as the expectation-maximization algorithm (EM), Markov chain Monte Carlo
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that ...
(MCMC), or hidden Markov model
A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent (or ''hidden'') Markov process (referred to as X). An HMM requires that there be an observable process Y whose outcomes depend on the outcomes of X ...
s (HMM).
Microfluidic whole genome haplotyping is a technique for the physical separation of individual chromosomes from a metaphase
Metaphase ( and ) is a stage of mitosis in the eukaryotic cell cycle in which chromosomes are at their second-most condensed and coiled stage (they are at their most condensed in anaphase). These chromosomes, carrying genetic information, alig ...
cell followed by direct resolution of the haplotype for each allele.
Gametic phase
In genetics
Genetics is the study of genes, genetic variation, and heredity in organisms.Hartl D, Jones E (2005) It is an important branch in biology because heredity is vital to organisms' evolution. Gregor Mendel, a Moravian Augustinians, Augustinian ...
, a gametic phase represents the original allelic combinations that a diploid
Ploidy () is the number of complete sets of chromosomes in a cell, and hence the number of possible alleles for autosomal and pseudoautosomal genes. Here ''sets of chromosomes'' refers to the number of maternal and paternal chromosome copies, ...
individual inherits from both parents. It is therefore a particular association of alleles
An allele is a variant of the sequence of nucleotides at a particular location, or locus, on a DNA molecule.
Alleles can differ at a single position through single nucleotide polymorphisms (SNP), but they can also have insertions and deletions ...
at different loci on the same chromosome
A chromosome is a package of DNA containing part or all of the genetic material of an organism. In most chromosomes, the very long thin DNA fibers are coated with nucleosome-forming packaging proteins; in eukaryotic cells, the most import ...
. Gametic phase is influenced by genetic linkage
Genetic linkage is the tendency of Nucleic acid sequence, DNA sequences that are close together on a chromosome to be inherited together during the meiosis phase of sexual reproduction. Two Genetic marker, genetic markers that are physically near ...
.
Y-DNA haplotypes from genealogical DNA tests
Unlike other chromosomes, Y chromosomes generally do not come in pairs. Every human male (excepting those with XYY syndrome
XYY syndrome, also known as Jacobs syndrome, is an aneuploid genetic condition in which a male has an extra Y chromosome. There are usually few symptoms. These may include being taller than average and an increased risk of learning disabiliti ...
) has only one copy of that chromosome. This means that there is not any chance variation of which copy is inherited, and also (for most of the chromosome) not any shuffling between copies by recombination; so, unlike autosomal
An autosome is any chromosome that is not a sex chromosome. The members of an autosome pair in a diploid cell have the same morphology, unlike those in allosomal (sex chromosome) pairs, which may have different structures. The DNA in autosome ...
haplotypes, there is effectively not any randomisation of the Y-chromosome haplotype between generations. A human male should largely share the same Y chromosome as his father, give or take a few mutations; thus Y chromosomes tend to pass largely intact from father to son,
with a small but accumulating number of mutations that can serve to differentiate male lineages.
In particular, the Y-DNA represented as the numbered results of a Y-DNA genealogical DNA test should match, except for mutations.
UEP results (SNP results)
Unique-event polymorphisms (UEPs) such as SNPs represent haplogroup
A haplotype is a group of alleles in an organism that are inherited together from a single parent, and a haplogroup (haploid from the , ''haploûs'', "onefold, simple" and ) is a group of similar haplotypes that share a common ancestor with a sing ...
s. STRs represent haplotypes. The results that comprise the full Y-DNA haplotype from the Y chromosome DNA test can be divided into two parts: the results for UEPs, sometimes loosely called the SNP results as most UEPs are single-nucleotide polymorphisms
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in ...
, and the results for microsatellite
A microsatellite is a tract of repetitive DNA in which certain Sequence motif, DNA motifs (ranging in length from one to six or more base pairs) are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organ ...
short tandem repeat sequences (Y-STR
A Y-STR is a short tandem repeat (STR) on the Y-chromosome. Y-STRs are often used in forensics, paternity, and genealogical DNA testing.
Y-STRs are taken specifically from the male Y chromosome. These Y-STRs provide a weaker analysis than autoso ...
s).
The UEP results represent the inheritance of events it is believed can be assumed to have happened only once in all human history. These can be used to identify the individual's Y-DNA haplogroup, his place in the "family tree" of the whole of humanity. Different Y-DNA haplogroups identify genetic populations that are often distinctly associated with particular geographic regions; their appearance in more recent populations located in different regions represents the migrations tens of thousands of years ago of the direct patrilineal
Patrilineality, also known as the male line, the spear side or agnatic kinship, is a common kinship system in which an individual's family membership derives from and is recorded through their father's lineage. It generally involves the inheritanc ...
ancestors of current individuals.
Y-STR haplotypes
Genetic results also include the Y-STR haplotype, the set of results from the Y-STR markers tested.
Unlike the UEPs, the Y-STRs mutate much more easily, which allows them to be used to distinguish recent genealogy. But it also means that, rather than the population of descendants of a genetic event all sharing the ''same'' result, the Y-STR haplotypes are likely to have spread apart, to form a ''cluster'' of more or less similar results. Typically, this cluster will have a definite most probable center, the modal haplotype (presumably similar to the haplotype of the original founding event), and also a haplotype diversity — the degree to which it has become spread out. The further in the past the defining event occurred, and the more that subsequent population growth occurred early, the greater the haplotype diversity will be for a particular number of descendants. However, if the haplotype diversity is smaller for a particular number of descendants, this may indicate a more recent common ancestor, or a recent population expansion.
It is important to note that, unlike for UEPs, two individuals with a similar Y-STR haplotype may not necessarily share a similar ancestry. Y-STR events are not unique. Instead, the clusters of Y-STR haplotype results inherited from different events and different histories tend to overlap.
In most cases, it is a long time since the haplogroups' defining events, so typically the cluster of Y-STR haplotype results associated with descendants of that event has become rather broad. These results will tend to significantly overlap the (similarly broad) clusters of Y-STR haplotypes associated with other haplogroups. This makes it impossible for researchers to predict with absolute certainty to which Y-DNA haplogroup a Y-STR haplotype would point. If the UEPs are not tested, the Y-STRs may be used only to predict probabilities for haplogroup ancestry, but not certainties.
A similar scenario exists in trying to evaluate whether shared surnames indicate shared genetic ancestry. A cluster of similar Y-STR haplotypes may indicate a shared common ancestor, with an identifiable modal haplotype, but only if the cluster is sufficiently distinct from what may have happened by chance from different individuals who historically adopted the same name independently. Many names were adopted from common occupations, for instance, or were associated with habitation of particular sites. More extensive haplotype typing is needed to establish genetic genealogy. Commercial DNA-testing companies now offer their customers testing of more numerous sets of markers to improve definition of their genetic ancestry. The number of sets of markers tested has increased from 12 during the early years to 111 more recently.
Establishing plausible relatedness between different surnames data-mined from a database is significantly more difficult. The researcher must establish that the ''very nearest'' member of the population in question, chosen purposely from the population for that reason, would be unlikely to match by accident. This is more than establishing that a ''randomly selected'' member of the population is unlikely to have such a close match by accident. Because of the difficulty, establishing relatedness between different surnames as in such a scenario is likely to be impossible, except in special cases where there is specific information to drastically limit the size of the population of candidates under consideration.
Diversity
Haplotype diversity is a measure of the uniqueness of a particular haplotype in a given population. The haplotype diversity (H) is computed as:
where is the (relative) haplotype frequency of each haplotype in the sample and is the sample size. Haplotype diversity is given for each sample.
History
The term "haplotype" was first introduced by MHC biologist Ruggero Ceppellini during the Third International Histocompatibility Workshop to substitute "pheno-group".
See also
* Haplotype 35
* Haplotype estimation In genetics, haplotype estimation (also known as "phasing") refers to the process of statistical estimation of haplotypes from genotype data. The most common situation arises when genotypes are collected at a set of polymorphic sites from a group of ...
* International HapMap Project
The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease ...
* Genealogical DNA test
A genealogical DNA test is a DNA-based Genetic testing, genetic test used in genetic genealogy that looks at specific locations of a person's genome in order to find or verify ancestral genealogical relationships, or (with lower reliability) to ...
* Haplogroup
A haplotype is a group of alleles in an organism that are inherited together from a single parent, and a haplogroup (haploid from the , ''haploûs'', "onefold, simple" and ) is a group of similar haplotypes that share a common ancestor with a sing ...
* Y-STR
A Y-STR is a short tandem repeat (STR) on the Y-chromosome. Y-STRs are often used in forensics, paternity, and genealogical DNA testing.
Y-STRs are taken specifically from the male Y chromosome. These Y-STRs provide a weaker analysis than autoso ...
* PLINK (genetic tool-set)
*Haplogroup E-M215 (Y-DNA)
E-M215 or E1b1b, formerly known as E3b, is a major human Y-chromosome DNA haplogroup. E-M215 has two basal branches, Haplogroup E-M35, E-M35 and E-M281. E-M35 is primarily distributed in North Africa and the Horn of Africa, and occurs at moderat ...
References
External links
HapMap
— homepage for the International HapMap Project.
— the difference between haplogroup & haplotype explained.
{{Authority control
Classical genetics
Population genetics
Genetic genealogy