Genomic structural variation is the variation in structure of an organism's
chromosome
A chromosome is a long DNA molecule with part or all of the genetic material of an organism. In most chromosomes the very long thin DNA fibers are coated with packaging proteins; in eukaryotic cells the most important of these proteins ar ...
. It consists of many kinds of variation in the
genome
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ...
of one
species
In biology, a species is the basic unit of Taxonomy (biology), classification and a taxonomic rank of an organism, as well as a unit of biodiversity. A species is often defined as the largest group of organisms in which any two individuals of ...
, and usually includes
microscopic
The microscopic scale () is the scale of objects and events smaller than those that can easily be seen by the naked eye, requiring a lens or microscope to see them clearly. In physics, the microscopic scale is sometimes regarded as the scale be ...
and
submicroscopic types, such as deletions, duplications,
copy-number variants, insertions,
inversions and
translocations. Originally, a structure variation affects a sequence length about 1kb to 3Mb, which is larger than
SNPs
In genetics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently larg ...
and smaller than
chromosome abnormality
A chromosomal abnormality, chromosomal anomaly, chromosomal aberration, chromosomal mutation, or chromosomal disorder, is a missing, extra, or irregular portion of chromosomal DNA. These can occur in the form of numerical abnormalities, where the ...
(though the definitions have some overlap). However, the operational range of structural variants has widened to include events > 50bp. The definition of structural variation does not imply anything about frequency or phenotypical effects. Many structural variants are associated with
genetic diseases
A genetic disorder is a health problem caused by one or more abnormalities in the genome. It can be caused by a mutation in a single gene (monogenic) or multiple genes (polygenic) or by a chromosomal abnormality. Although polygenic disorders ...
, however many are not.
Recent research about SVs indicates that SVs are more difficult to detect than SNPs. Approximately 13% of the human genome is defined as structurally variant in the normal population, and there are at least 240 genes that exist as
homozygous
Zygosity (the noun, zygote, is from the Greek "yoked," from "yoke") () is the degree to which both copies of a chromosome or gene have the same genetic sequence. In other words, it is the degree of similarity of the alleles in an organism.
Mo ...
deletion
polymorphism
Polymorphism, polymorphic, polymorph, polymorphous, or polymorphy may refer to:
Computing
* Polymorphism (computer science), the ability in programming to present the same programming interface for differing underlying forms
* Ad hoc polymorphis ...
s in human populations, suggesting these genes are dispensable in humans.
Rapidly accumulating evidence indicates that structural variations can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.
Microscopic structural variation
Microscopic means that it can be detected with
optical microscope
The optical microscope, also referred to as a light microscope, is a type of microscope that commonly uses visible light and a system of lenses to generate magnified images of small objects. Optical microscopes are the oldest design of micros ...
s, such as
aneuploidies,
marker chromosome
A marker chromosome (mar) is a small fragment of a chromosome which generally cannot be identified without specialized genomic analysis due to the size of the fragment.Thompson & Thompson Genetics in Medicine, Chapter 5, 57-74 https://www.clinicalk ...
, gross rearrangements and variation in chromosome size. The frequency in human population is thought to be underestimated due to the fact that some of these are not actually easy to identify. These structural abnormalities exist in 1 of every 375 live births by putative information.
Sub-microscopic structural variation
Sub-microscopic structural variants are much harder to detect owing to their small size. The first study in 2004 that used
DNA microarray
A DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to ...
s could detect tens of genetic
loci
Locus (plural loci) is Latin for "place". It may refer to:
Entertainment
* Locus (comics), a Marvel Comics mutant villainess, a member of the Mutant Liberation Front
* ''Locus'' (magazine), science fiction and fantasy magazine
** '' Locus Award ...
that exhibited
copy number variation
Copy number variation (CNV) is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals. Copy number variation is a type of structural variation: specifically, it is a type of ...
,
deletions and
duplications, greater than 100
kilobase
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both ...
s in the human genome. However, by 2015 whole genome sequencing studies could detect around 5,000 of structural variants as small as 100
base pairs encompassing approximately 20
megabase
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both D ...
s in each individual genome.
These structural variants include deletions, tandem duplications,
inversions,
mobile element insertions. The mutation rate is also much higher than microscopic structural variants, estimated by two studies at 16% and 20% respectively, both of which are probably underestimates due to the challenges of accurately detecting structural variants.
It has also been shown that the generation of spontaneous structural variants significantly increases the likelihood of generating further spontaneous
single nucleotide variants or
indel
Indel is a molecular biology term for an insertion or deletion of bases in the genome of an organism. It is classified among small genetic variations, measuring from 1 to 10 000 base pairs in length, including insertion and deletion events that ...
s within 100
kilobase
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both ...
s of the structural variation event.
Copy-number variation
Copy-number variation (CNV) is a large category of structural variation, which includes
insertions,
deletions and
duplications. In recent studies, copy-number variations are tested on people who do not have genetic diseases, using methods that are used for quantitative SNP genotyping. Results show that 28% of the suspected regions in the individuals actually do contain copy number variations. Also, CNVs in human genome affect more nucleotides than
Single Nucleotide Polymorphism (SNP).
It is also noteworthy that many of CNVs are not in coding regions. Because CNVs are usually caused by
unequal recombination, widespread similar sequences such as
LINE
Line most often refers to:
* Line (geometry), object with zero thickness and curvature that stretches to infinity
* Telephone line, a single-user circuit on a telephone communication system
Line, lines, The Line, or LINE may also refer to:
Art ...
s and
SINE
In mathematics, sine and cosine are trigonometric functions of an angle. The sine and cosine of an acute angle are defined in the context of a right triangle: for the specified angle, its sine is the ratio of the length of the side that is opp ...
s may be a common mechanism of CNV creation.
Inversion
There are several inversions known which are related to human disease. For instance, recurrent 400kb inversion in factor VIII gene is a common cause of
haemophilia A
Haemophilia A (or hemophilia A) is a genetic deficiency in clotting factor VIII, which causes increased bleeding and usually affects males. In the majority of cases it is inherited as an X-linked recessive trait, though there are cases which arise ...
, and smaller inversions affecting
idunorate 2-sulphatase (IDS) will cause
Hunter syndrome
Hunter syndrome, or mucopolysaccharidosis type II (MPS II), is a rare genetic disorder in which large sugar molecules called glycosaminoglycans (or GAGs or mucopolysaccharides) build up in body tissues. It is a form of lysosomal storage disease. ...
. More examples include
Angelman syndrome and
Sotos syndrome
Sotos syndrome is a rare genetic disorder characterized by excessive physical growth during the first years of life. Excessive growth often starts in infancy and continues into the early teen years. The disorder may be accompanied by autism, mild ...
. However, recent research shows that one person can have 56 putative inversions, thus the non-disease inversions are more common than previously supposed. Also in this study it's indicated that inversion breakpoints are commonly associated with segmental duplications. One 900 kb inversion in the
chromosome 17
Chromosome 17 is one of the 23 pairs of chromosomes in humans. People normally have two copies of this chromosome. Chromosome 17 spans more than 83 million base pairs (the building material of DNA) and represents between 2.5 and 3% of the total ...
is under
positive selection
In population genetics, directional selection, is a mode of negative natural selection in which an extreme phenotype is favored over other phenotypes, causing the allele frequency to shift over time in the direction of that phenotype. Under dir ...
and are predicted to increase its frequency in European population.
Other structural variants
More complex structural variants can occur include a combination of the above in a single event.
The most common type of complex structural variation are non-tandem duplications, where sequence is duplicated and inserted in inverted or direct orientation into another part of the genome.
Other classes of complex structural variant include deletion-inversion-deletions, duplication-inversion-duplications, and tandem duplications with nested deletions.
There are also
cryptic translocations
Cryptic may refer to:
In science:
* Cryptic species complex, a group of species that are very difficult to distinguish from one another
* Crypsis
In ecology, crypsis is the ability of an animal or a plant to avoid observation or detection by ...
and
segmental uniparental disomy (UPD). There are increasing reports of these variations, but are more difficult to detect than traditional variations because these variants are balanced and array-based or
PCR PCR or pcr may refer to:
Science
* Phosphocreatine, a phosphorylated creatine molecule
* Principal component regression, a statistical technique
Medicine
* Polymerase chain reaction
** COVID-19 testing, often performed using the polymerase chain r ...
-based methods are not able to locate them.
Structural variation and phenotypes
Some
genetic diseases
A genetic disorder is a health problem caused by one or more abnormalities in the genome. It can be caused by a mutation in a single gene (monogenic) or multiple genes (polygenic) or by a chromosomal abnormality. Although polygenic disorders ...
are suspected to be caused by structural variations, but the relation is not very certain. It is not plausible to divide these variants into two classes as "normal" or "disease", because the actual output of the same variant will also vary. Also, a few of the variants are actually positively selected for (mentioned above).
A series of studies have shown that gene disrupting
spontaneous (''de novo'') CNVs disrupt genes approximately four times more frequently in autism than in controls and contribute to approximately 5–10% of cases.
Inherited variants also contribute to around 5–10% of cases of autism.
Structural variations also have its function in population genetics. Different frequency of a same variation can be used as a genetic mark to infer relationship between populations in different areas. A complete comparison between human and chimpanzee structural variation also suggested that some of these may be fixed in one species because of its adaptative function. There are also deletions related to resistance against
malaria
Malaria is a mosquito-borne infectious disease that affects humans and other animals. Malaria causes symptoms that typically include fever, tiredness, vomiting, and headaches. In severe cases, it can cause jaundice, seizures, coma, or deat ...
and
AIDS. Also, some highly variable segments are thought to be caused by balancing selection, but there are also studies against this hypothesis.
Database of structural variation
Some of genome browsers and
bioinformatic
Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combin ...
databases have a list of structural variations in human genome with an emphasis on CNVs, and can show them in the genome browsing page, for example,
UCSC Genome Browser
The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate spe ...
. Under the page viewing a part of the genome, there are "Common Cell CNVs" and "Structural Var" which can be enabled.
On NCBI, there is a special page for structural variation. In that system, both "inner" and "outer" coordinates are shown; they are both not actual breakpoints, but surmised minimal and maximum range of sequence affected by the structural variation. The types are classified as insertion, loss, gain, inversion, LOH, everted, transchr and UPD.
Methods of detection

New methods have been developed to analyze human genetic structural variation at high resolutions. The methods used to test the genome are in either a specific targeted way or in a genome wide manner. For Genome wide tests, array-based comparative genome hybridization approaches bring the best genome wide scans to find new copy number variants.
These techniques use DNA fragments that are labeled from a genome of interest and are hybridized, with another genome labeled differently, to arrays spotted with cloned DNA fragments. This reveals copy number differences between two genomes.
For targeted genome examinations, the best assays for checking specific areas of the genome are primarily PCR based. The best established of the PCR based methods is real time quantitative polymerase chain reaction (qPCR).
A different approach is to specifically check certain areas that surround known segmental duplications since they are usually areas of copy number variation.
An SNP genotyping method that offers independent fluorescence intensities for two alleles can be used to target the nucleotides in between two copies of a segmental duplication.
From this, an increase in intensity from one of the alleles compared to the other can be observed.
With the development of
next-generation sequencing Massive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation s ...
(NGS) technology, four classes of strategies for the detection of structural variants with NGS data have been reported, with each being based on patterns that are diagnostic of different classes of SV.
* Read-depth or read-count methods assumes a random distribution (e.g.
Poisson distribution
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known ...
) of
reads from short read sequencing. The divergence from this distribution is investigated to discover duplications and deletions. Regions with duplication will show higher read depth while those with deletion will result in lower read depth.
* Split-read methods enable detection of insertions (including
mobile element
A transposable element (TE, transposon, or jumping gene) is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transp ...
insertions) and deletions down to single base-pair resolution. The presence of a SV is identified from discontinuous alignment to the reference genome. A gap in the read marks a deletion and in the reference marks an insertion.
* Read pair methods examine the length and orientation of paired-end reads from short read sequencing data. For example, read pairs further apart than expected indicate a deletion. Translocations, inversions and tandem duplications can likewise be discovered using read-pairs.
* ''De novo'' sequence assembly may be applied with reads that are accurate enough. While in practice use of this method is limited by the length of sequence reads, long read based genome assemblies offer structural variation discovery for classes such as insertions that escape detection when using other methods.
See also
*
Structural variation in the human genome
Structural variation in the human genome is operationally defined as genomic alterations, varying between individuals, that involve DNA segments larger than 1 kilo base (kb), and could be either microscopic or submicroscopic. This definition ...
References
{{Reflist
External links
The 1000 Genomes Project
Congenital disorders
Chromosomes