
A gene is said to be polymorphic if more than one allele occupies that gene's locus within a population. In addition to having more than one allele at a specific locus, each allele must also occur in the population at a frequency of at least 1% for the locus to generally be considered polymorphic.
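The 1% criterion can be sketched in Python as a small helper that counts alleles across genotype calls. This is a minimal sketch under stated assumptions. Genotypes are given as tuples of allele labels, one entry per chromosome copy. The function names are hypothetical. The rule is read as "at least two alleles each reach the 1% threshold", which is one reasonable interpretation of the criterion rather than a standard implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_frequencies(genotypes: Iterable[Tuple[str, ...]]) -> Dict[str, float]:
    """Return the frequency of each allele across all genotype calls.

    Each genotype is a tuple of allele labels, e.g. ("A", "a") for a
    heterozygous diploid individual (hypothetical input format).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes: Iterable[Tuple[str, ...]], min_freq: float = 0.01) -> bool:
    """Assumed reading of the rule: at least two alleles reach min_freq."""
    freqs = allele_frequencies(genotypes)
    common = [allele for allele, f in freqs.items() if f >= min_freq]
    return len(common) >= 2


# 50 diploid individuals: one heterozygote contributes 1 "a" allele out of 100.
sample = [("A", "A")] * 49 + [("A", "a")]
print(allele_frequencies(sample))  # {'A': 0.99, 'a': 0.01}
print(is_polymorphic(sample))      # True
```

With one more homozygous individual per rare allele copy, the rare allele would drop below 1% and the locus would no longer count as polymorphic under this reading.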
Gene polymorphisms can occur in any region of the genome. The majority of polymorphisms are silent, meaning they do not alter the function or expression of a gene. Some polymorphisms are visible. For example, in dogs the E locus can have any of five different alleles, known as E, Eᵐ, Eᵍ, Eʰ, and e. Varying combinations of these alleles contribute to the pigmentation and patterns seen in dog coats.
A polymorphic variant of a gene can lead to abnormal expression or to the production of an abnormal form of the protein; this abnormality may cause or be associated with disease. For example, consider a polymorphic variant of the gene encoding the enzyme CYP4A11, in which cytosine replaces thymidine at nucleotide position 8590. This variant encodes a CYP4A11 protein in which serine replaces phenylalanine at amino acid position 434. The variant protein has reduced enzyme activity in metabolizing arachidonic acid to the blood pressure-regulating eicosanoid 20-hydroxyeicosatetraenoic acid. A study has shown that humans bearing this variant in one or both of their CYP4A11 genes have an increased incidence of hypertension, ischemic stroke, and coronary artery disease.
The genes coding for the major histocompatibility complex (MHC) are the most polymorphic genes known. MHC molecules are involved in the immune system and interact with T cells. There are more than 32,000 different alleles of human MHC class I and II genes, and it has been estimated that there are 200 variants at the HLA-B and HLA-DRB1 loci alone.
Some polymorphisms may be maintained by balancing selection.
Differences between gene polymorphism and mutation
A rule of thumb that is sometimes used is to classify genetic variants that occur below 1% allele frequency as mutations rather than polymorphisms. However, since polymorphisms may occur at low allele frequency, this is not a reliable way to tell new mutations from polymorphisms.
A mutation is a change to an inherited genetic sequence.
* In unicellular organisms, there is no distinction between somatic and germline mutations.
* In multicellular organisms that reproduce sexually, the great majority of mutations are not passed on to subsequent generations. A mutation may or may not be passed on to offspring: if it occurs in replicating cells that are not part of the germline, none of the offspring will carry it.
** For example, a mutation may occur in a skin cell when ultraviolet light produces a thymine dimer that is not properly repaired before the skin cell undergoes mitosis and divides.
** This is quite distinct from a mutation that occurs during meiosis, which can subsequently be passed on to future generations. When discussing mutations, it is very helpful to state clearly whether a mutation is somatic or germline.
Silent mutations cause no change in fitness. As a result, the selective pressures that can shift a population away from Hardy-Weinberg equilibrium have no effect on the accumulation of silent polymorphisms over time. Most often, a polymorphism is a variation in a single nucleotide (SNP). It can also be an insertion or deletion of one or more nucleotides, or a change in the number of times a short or longer sequence is repeated. Like SNPs, both of these are common in parts of DNA that do not directly code for a protein, but they can have major effects on gene expression.
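Hardy-Weinberg equilibrium gives the baseline against which such pressures are measured. At a biallelic locus with allele frequencies p and q = 1 - p, the expected genotype frequencies are p², 2pq and q². The Python sketch below computes those expectations and a Pearson chi-square statistic comparing them with observed genotype counts. It assumes a diploid locus with exactly two alleles, and the function names are hypothetical. It stops at the statistic rather than a p-value. With one degree of freedom, a value above about 3.84 corresponds to p < 0.05.

```python
from typing import Dict


def hardy_weinberg_expected(p: float) -> Dict[str, float]:
    """Expected genotype frequencies at a biallelic locus with allele frequencies p and q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Pearson chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations (1 degree of freedom for a biallelic locus)."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes observed")
    # Frequency of allele A estimated from the sample itself.
    p = (2 * n_AA + n_Aa) / (2 * n)
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic in this sample")
    expected = {g: f * n for g, f in hardy_weinberg_expected(p).items()}
    observed = {"AA": n_AA, "Aa": n_Aa, "aa": n_aa}
    return sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)


print(hwe_chi_square(25, 50, 25))  # 0.0   (exactly at equilibrium)
print(hwe_chi_square(50, 0, 50))   # 100.0 (no heterozygotes at all)
```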
Polymorphisms that result in a change in fitness are the grist for the mill of evolution by natural selection. All genetic polymorphisms start out as mutations, but only germline mutations that are not lethal can spread into a population. Polymorphisms are classified in two ways. The first is what happens at the level of the individual mutation in the DNA sequence (or RNA sequence in the case of RNA viruses). The second is the effect the mutation has on the phenotype (i.e. silent, or resulting in some change in function or in fitness). Polymorphisms are also classified by whether the change is in the sequence of the resulting protein or in the regulation of the gene's expression. Regulatory changes typically, but not always, occur at sites upstream of and adjacent to the gene.
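One part of this classification, whether a change in a coding sequence alters the resulting protein, can be illustrated with the standard genetic code. The Python sketch below labels a single-codon change as synonymous, missense, nonsense or stop-loss. The function name is hypothetical. It looks at one codon in isolation, so it cannot capture regulatory effects, splicing or phenotype. A synonymous change is often, but not always, silent.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon DNA change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


print(classify_codon_change("CTT", "CTC"))  # synonymous (both Leu)
print(classify_codon_change("TTC", "TCC"))  # missense (Phe -> Ser)
print(classify_codon_change("TGG", "TGA"))  # nonsense (Trp -> stop)
```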
Identification
Polymorphisms can be identified in the laboratory using a variety of methods. Many methods employ PCR to amplify the sequence of a gene. Once amplified, polymorphisms and mutations in the sequence can be detected by DNA sequencing, either directly or after screening for variation with a method such as single strand conformation polymorphism analysis.
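In the simplest case, spotting a candidate SNP after sequencing an amplified gene comes down to comparing the read with a reference sequence. The Python sketch below does exactly that. It assumes the two sequences are already aligned to the same length and contain no indels. Ambiguous "N" base calls are ignored. Real variant calling uses dedicated aligners and base-quality models, and the function name here is hypothetical.

```python
from typing import List, Tuple


def find_mismatches(reference: str, read: str) -> List[Tuple[int, str, str]]:
    """Report (1-based position, reference base, read base) wherever an aligned,
    equal-length read differs from the reference. Ambiguous "N" calls are skipped."""
    if len(reference) != len(read):
        raise ValueError("sequences must already be aligned to the same length")
    return [
        (pos, ref_base, read_base)
        for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()), start=1)
        if ref_base != read_base and read_base != "N"
    ]


print(find_mismatches("ACGTACGT", "ACGTTCGT"))  # [(5, 'A', 'T')]
```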
Types
A polymorphism can be any sequence difference. Examples include:
* Single nucleotide polymorphisms (SNPs) are single-nucleotide changes at a particular location in the genome. The single nucleotide polymorphism is the most common form of genetic variation.
* Small-scale insertions/deletions (indels) consist of insertions or deletions of bases in DNA.
* Polymorphic repetitive elements. Active transposable elements can also cause polymorphism by inserting themselves in new locations. For example, repetitive elements of the Alu and LINE1 families cause polymorphisms in the human genome.
* Microsatellites are repeats of 1-6 base pairs of DNA sequence. Microsatellites are commonly used as molecular markers, especially for identifying the relationship between alleles.
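Two of the types listed above can be recognized with simple string logic. The Python sketch below first classifies a variant from its reference and alternate alleles. It assumes a VCF-style convention in which an indel shares a leading anchor base with the reference. It then naively scans a sequence for the 1-6 bp motif with the most consecutive copies, as a crude microsatellite detector. Both function names are hypothetical. This is not how production tools detect tandem repeats or transposable element insertions.

```python
from typing import Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate alleles
    (assumed VCF-style, where indels share a leading anchor base)."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1:
        return "SNP" if ref != alt else "no change"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex"


def longest_tandem_repeat(seq: str, max_motif: int = 6) -> Tuple[str, int]:
    """Naively find the motif of 1..max_motif bases with the most consecutive
    copies in seq; ties go to the shorter motif found first."""
    seq = seq.upper()
    best_motif, best_copies = "", 0
    for k in range(1, max_motif + 1):
        for start in range(len(seq) - k + 1):
            motif = seq[start:start + k]
            copies, pos = 1, start + k
            # Extend while the next k bases repeat the same motif.
            while seq[pos:pos + k] == motif:
                copies += 1
                pos += k
            if copies > best_copies:
                best_motif, best_copies = motif, copies
    return best_motif, best_copies


print(classify_variant("A", "G"))    # SNP
print(classify_variant("A", "ATT"))  # insertion
print(classify_variant("ATT", "A"))  # deletion
print(longest_tandem_repeat("GGCACACACATT"))  # ('CA', 4)
```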
Clinical significance
Many different human diseases result from polymorphisms. Polymorphisms also play a significant role as risk factors for the development of disease. Finally, polymorphisms alter the effects of various drugs. These include polymorphisms in genes involved in drug metabolism (especially cytochrome P450 isoenzymes) and in proteins involved in drug transport, whether into the body, into protected areas of the body such as the brain, or out of the body by secretion. They also include polymorphisms in specific cell surface receptor proteins.
This is a rapidly evolving area of drug safety research. Resources such as HapMap, dbSNP, Ensembl, the DNA Data Bank of Japan, DrugBank, the Kyoto Encyclopedia of Genes and Genomes (KEGG), GenBank, and other parts of the International Nucleotide Sequence Database Collaboration have become crucial in personalized medicine, bioinformatics, and pharmacogenomics.
Lung cancer
Polymorphisms have been discovered in multiple XPD exons. XPD refers to "xeroderma pigmentosum group D" and is involved in a DNA repair mechanism used during DNA replication. XPD works by cutting and removing segments of DNA that have been damaged by agents such as cigarette smoke and other inhaled environmental carcinogens. Asp312Asn and Lys751Gln are the two common polymorphisms of XPD that result in a change in a single amino acid. The Asn and Gln alleles have been associated with reduced DNA repair efficiency.
Several studies have examined whether this diminished capacity to repair DNA is related to an increased risk of lung cancer. These studies examined the XPD gene in lung cancer patients of varying age, gender, race, and pack-years. The results were mixed. Some studies concluded that individuals homozygous for the Asn allele or homozygous for the Gln allele had an increased risk of developing lung cancer. Others found no statistically significant association between either polymorphism and susceptibility to lung cancer among smokers. Research continues to determine the relationship between XPD polymorphisms and lung cancer risk.
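Names such as Asp312Asn follow a compact convention: the original amino acid, its position in the protein, then the replacement. Asp312Asn therefore means that aspartic acid at position 312 is replaced by asparagine, and Lys751Gln means that lysine at position 751 is replaced by glutamine. The Python sketch below parses this simple three-letter form, with an optional "p." prefix as used in HGVS-style protein notation, into one-letter codes. It handles only single missense substitutions and is not a full HGVS parser. The function name is hypothetical.

```python
import re
from typing import Tuple

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Reference residue, 1-based position, substituted residue, e.g. "Asp312Asn".
SUBSTITUTION = re.compile(r"(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def parse_substitution(notation: str) -> Tuple[str, int, str]:
    """Parse a simple missense notation into (reference residue, position, new residue)
    using one-letter amino acid codes."""
    match = SUBSTITUTION.fullmatch(notation.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {notation!r}")
    ref, pos, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {notation!r}")
    return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]


for variant in ("Asp312Asn", "Lys751Gln"):
    ref, pos, alt = parse_substitution(variant)
    print(f"{variant} -> {ref}{pos}{alt}")
# Asp312Asn -> D312N
# Lys751Gln -> K751Q
```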
As a cornerstone of personalized medicine for cancers, sequence analysis is becoming increasingly important for understanding the specific mutations involved in an individual's cancer. For example, it is needed to select specific molecular targets such as mutations in various receptors. Sequence analysis is equally important for understanding inherited polymorphisms, which play important roles in diagnosis, prognosis, and treatment. One example is the treatment of leukemia with 6-mercaptopurine, where toxicity largely depends on polymorphisms in multiple genes involved in its metabolism.
Asthma
Asthma is an inflammatory disease of the lungs, and more than 100 loci have been identified as contributing to the development and severity of the condition. Some of these asthma-correlated genes have been identified by traditional linkage analysis and others by genome-wide association studies (GWAS). A number of studies have examined various polymorphisms of asthma-associated genes and how those polymorphisms interact with the carrier's environment. One example is the gene CD14, which has a polymorphism associated with increased amounts of CD14 protein and reduced serum IgE levels.
A study of 624 children examined their serum IgE levels in relation to the CD14 polymorphism. It found that serum IgE levels in children carrying the C allele of the CD14/-260 polymorphism differed depending on the type of allergens they were regularly exposed to. Children in regular contact with house pets had higher serum IgE levels, while children regularly exposed to stable animals had lower serum IgE levels.
Continued research into gene-environment interactions may lead to more specialized treatment plans based on an individual's surroundings.