HOME

TheInfoList



OR:

Gene duplication (or chromosomal duplication or gene amplification) is a major mechanism through which new genetic material is generated during
molecular evolution Molecular evolution describes how Heredity, inherited DNA and/or RNA change over evolutionary time, and the consequences of this for proteins and other components of Cell (biology), cells and organisms. Molecular evolution is the basis of phylogen ...
. It can be defined as any duplication of a region of
DNA Deoxyribonucleic acid (; DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of al ...
that contains a
gene In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protei ...
. Gene duplications can arise as products of several types of errors in
DNA replication In molecular biology, DNA replication is the biological process of producing two identical replicas of DNA from one original DNA molecule. DNA replication occurs in all life, living organisms, acting as the most essential part of heredity, biolog ...
and
repair The technical meaning of maintenance involves functional checks, servicing, repairing or replacing of necessary devices, equipment, machinery, building infrastructure and supporting utilities in industrial, business, and residential installat ...
machinery as well as through fortuitous capture by selfish genetic elements. Common sources of gene duplications include ectopic recombination, retrotransposition event,
aneuploidy Aneuploidy is the presence of an abnormal number of chromosomes in a cell (biology), cell, for example a human somatic (biology), somatic cell having 45 or 47 chromosomes instead of the usual 46. It does not include a difference of one or more plo ...
,
polyploidy Polyploidy is a condition in which the cells of an organism have more than two paired sets of ( homologous) chromosomes. Most species whose cells have nuclei (eukaryotes) are diploid, meaning they have two complete sets of chromosomes, one fro ...
, and replication slippage.


Mechanisms of duplication


Ectopic recombination

Duplications arise from an event termed unequal crossing-over that occurs during meiosis between misaligned homologous chromosomes. The chance of it happening is a function of the degree of sharing of repetitive elements between two chromosomes. The products of this recombination are a duplication at the site of the exchange and a reciprocal deletion. Ectopic recombination is typically mediated by sequence similarity at the duplicate breakpoints, which form direct repeats. Repetitive genetic elements such as transposable elements offer one source of repetitive DNA that can facilitate recombination, and they are often found at duplication breakpoints in plants and mammals.


Replication slippage

Replication slippage is an error in DNA replication that can produce duplications of short genetic sequences. During replication
DNA polymerase A DNA polymerase is a member of a family of enzymes that catalyze the synthesis of DNA molecules from nucleoside triphosphates, the molecular precursors of DNA. These enzymes are essential for DNA replication and usually work in groups to create t ...
begins to copy the DNA. At some point during the replication process, the polymerase dissociates from the DNA and replication stalls. When the polymerase reattaches to the DNA strand, it aligns the replicating strand to an incorrect position and incidentally copies the same section more than once. Replication slippage is also often facilitated by repetitive sequences, but requires only a few bases of similarity.


Retrotransposition

Retrotransposon Retrotransposons (also called Class I transposable elements) are mobile elements which move in the host genome by converting their transcribed RNA into DNA through reverse transcription. Thus, they differ from Class II transposable elements, or ...
s, mainly L1, can occasionally act on cellular mRNA. Transcripts are reverse transcribed to DNA and inserted into random place in the genome, creating retrogenes. Resulting sequence usually lack introns and often contain poly(A) sequences that are also integrated into the genome. Many retrogenes display changes in gene regulation in comparison to their parental gene sequences, which sometimes results in novel functions. Retrogenes can move between different chromosomes to shape chromosomal evolution.


Aneuploidy

Aneuploidy Aneuploidy is the presence of an abnormal number of chromosomes in a cell (biology), cell, for example a human somatic (biology), somatic cell having 45 or 47 chromosomes instead of the usual 46. It does not include a difference of one or more plo ...
occurs when nondisjunction at a single chromosome results in an abnormal number of chromosomes. Aneuploidy is often harmful and in mammals regularly leads to spontaneous abortions (miscarriages). Some aneuploid individuals are viable, for example trisomy 21 in humans, which leads to Down syndrome. Aneuploidy often alters gene dosage in ways that are detrimental to the organism; therefore, it is unlikely to spread through populations.


Polyploidy

Polyploidy Polyploidy is a condition in which the cells of an organism have more than two paired sets of ( homologous) chromosomes. Most species whose cells have nuclei (eukaryotes) are diploid, meaning they have two complete sets of chromosomes, one fro ...
, or ''whole genome duplication'', is a product of
nondisjunction Nondisjunction is the failure of homologous chromosomes or sister chromatids to separate properly during cell division (mitosis/meiosis). There are three forms of nondisjunction: failure of a pair of homologous chromosomes to separate in meiosis I ...
during meiosis which results in additional copies of the entire genome. Polyploidy is common in plants, but it has also occurred in animals, with two rounds of whole genome duplication ( 2R event) in the vertebrate lineage leading to humans. It has also occurred in the hemiascomycete yeasts ~100 mya. After a whole genome duplication, there is a relatively short period of genome instability, extensive gene loss, elevated levels of nucleotide substitution and regulatory network rewiring. In addition, gene dosage effects play a significant role. Thus, most duplicates are lost within a short period, however, a considerable fraction of duplicates survive. Interestingly, genes involved in regulation are preferentially retained. Furthermore, retention of regulatory genes, most notably the
Hox gene Hox genes, a subset of homeobox, homeobox genes, are a gene cluster, group of related genes that Evolutionary developmental biology, specify regions of the body plan of an embryo along the craniocaudal axis, head-tail axis of animals. Hox protein ...
s, has led to adaptive innovation. Rapid evolution and functional divergence have been observed at the level of the transcription of duplicated genes, usually by point mutations in short transcription factor binding motifs. Furthermore, rapid evolution of protein phosphorylation motifs, usually embedded within rapidly evolving intrinsically disordered regions is another contributing factor for survival and rapid adaptation/neofunctionalization of duplicate genes. Thus, a link seems to exist between gene regulation (at least at the post-translational level) and genome evolution. Polyploidy is also a well known source of speciation, as offspring, which have different numbers of chromosomes compared to parent species, are often unable to interbreed with non-polyploid organisms. Whole genome duplications are thought to be less detrimental than aneuploidy as the relative dosage of individual genes should be the same.


As an evolutionary event


Rate of gene duplication

Comparisons of genomes demonstrate that gene duplications are common in most species investigated. This is indicated by variable copy numbers (
copy number variation Copy number variation (CNV) is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals. Copy number variation is a type of structural variation: specifically, it is a type of ...
) in the genome of humans or fruit flies. However, it has been difficult to measure the rate at which such duplications occur. Recent studies yielded a first direct estimate of the genome-wide rate of gene duplication in '' C. elegans'', the first multicellular eukaryote for which such as estimate became available. The gene duplication rate in ''C. elegans'' is on the order of 10−7 duplications/gene/generation, that is, in a population of 10 million worms, one will have a gene duplication per generation. This rate is two orders of magnitude greater than the spontaneous rate of point mutation per nucleotide site in this species. Older (indirect) studies reported locus-specific duplication rates in bacteria, ''Drosophila'', and humans ranging from 10−3 to 10−7/gene/generation.


Neofunctionalization

Gene duplications are an essential source of genetic novelty that can lead to evolutionary innovation. Duplication creates genetic redundancy, where the second copy of the gene is often free from selective pressure—that is,
mutation In biology, a mutation is an alteration in the nucleic acid sequence of the genome of an organism, virus, or extrachromosomal DNA. Viral genomes contain either DNA or RNA. Mutations result from errors during DNA or viral replication, ...
s of it have no deleterious effects to its host organism. If one copy of a gene experiences a mutation that affects its original function, the second copy can serve as a 'spare part' and continue to function correctly. Thus, duplicate genes accumulate mutations faster than a functional single-copy gene, over generations of organisms, and it is possible for one of the two copies to develop a new and different function. Some examples of such neofunctionalization is the apparent mutation of a duplicated digestive gene in a family of ice fish into an antifreeze gene and duplication leading to a novel snake venom gene and the synthesis of 1 beta-hydroxytestosterone in pigs. Gene duplication is believed to play a major role in
evolution Evolution is the change in the heritable Phenotypic trait, characteristics of biological populations over successive generations. It occurs when evolutionary processes such as natural selection and genetic drift act on genetic variation, re ...
; this stance has been held by members of the scientific community for over 100 years.
Susumu Ohno was a Japanese-American geneticist and evolutionary biologist, and seminal researcher in the field of molecular evolution. Biography Susumu Ohno was born to Japanese parents in Keijō, Chōsen (present-day Seoul, South Korea), Empire of ...
was one of the most famous developers of this theory in his classic book ''Evolution by gene duplication'' (1970). Ohno argued that gene duplication is the most important evolutionary force since the emergence of the universal common ancestor. Major genome duplication events can be quite common. It is believed that the entire
yeast Yeasts are eukaryotic, single-celled microorganisms classified as members of the fungus kingdom (biology), kingdom. The first yeast originated hundreds of millions of years ago, and at least 1,500 species are currently recognized. They are est ...
genome A genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as ...
underwent duplication about 100 million years ago.
Plant Plants are the eukaryotes that form the Kingdom (biology), kingdom Plantae; they are predominantly Photosynthesis, photosynthetic. This means that they obtain their energy from sunlight, using chloroplasts derived from endosymbiosis with c ...
s are the most prolific genome duplicators. For example,
wheat Wheat is a group of wild and crop domestication, domesticated Poaceae, grasses of the genus ''Triticum'' (). They are Agriculture, cultivated for their cereal grains, which are staple foods around the world. Well-known Taxonomy of wheat, whe ...
is hexaploid (a kind of
polyploid Polyploidy is a condition in which the biological cell, cells of an organism have more than two paired sets of (Homologous chromosome, homologous) chromosomes. Most species whose cells have Cell nucleus, nuclei (eukaryotes) are diploid, meaning ...
), meaning that it has six copies of its genome.


Subfunctionalization

Another possible fate for duplicate genes is that both copies are equally free to accumulate degenerative mutations, so long as any defects are complemented by the other copy. This leads to a neutral " subfunctionalization" (a process of constructive neutral evolution) or DDC (duplication-degeneration-complementation) model, in which the functionality of the original gene is distributed among the two copies. Neither gene can be lost, as both now perform important non-redundant functions, but ultimately neither is able to achieve novel functionality. Subfunctionalization can occur through neutral processes in which mutations accumulate with no detrimental or beneficial effects. However, in some cases subfunctionalization can occur with clear adaptive benefits. If an ancestral gene is
pleiotropic Pleiotropy () is a condition in which a single gene or genetic variant influences multiple phenotypic traits. A gene that has such multiple effects is referred to as a ''pleiotropic gene''. Mutations in pleiotropic genes can impact several trait ...
and performs two functions, often neither one of these two functions can be changed without affecting the other function. In this way, partitioning the ancestral functions into two separate genes can allow for adaptive specialization of subfunctions, thereby providing an adaptive benefit.


Loss

Often the resulting genomic variation leads to gene dosage dependent neurological disorders such as Rett-like syndrome and
Pelizaeus–Merzbacher disease Pelizaeus–Merzbacher disease is an X-linked neurological disorder that damages oligodendrocytes in the central nervous system. It is caused by mutations in proteolipid protein 1 (''PLP1''), a major myelin protein. It is characterized by a decr ...
. Such detrimental mutations are likely to be lost from the population and will not be preserved or develop novel functions. However, many duplications are, in fact, not detrimental or beneficial, and these neutral sequences may be lost or may spread through the population through random fluctuations via
genetic drift Genetic drift, also known as random genetic drift, allelic drift or the Wright effect, is the change in the Allele frequency, frequency of an existing gene variant (allele) in a population due to random chance. Genetic drift may cause gene va ...
.


Identifying duplications in sequenced genomes


Criteria and single genome scans

The two genes that exist after a gene duplication event are called paralogs and usually code for
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residue (biochemistry), residues. Proteins perform a vast array of functions within organisms, including Enzyme catalysis, catalysing metab ...
s with a similar function and/or structure. By contrast,
orthologous Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speci ...
genes present in different species which are each originally derived from the same ancestral sequence. (See Homology of sequences in genetics). It is important (but often difficult) to differentiate between paralogs and orthologs in biological research. Experiments on human gene function can often be carried out on other
species A species () is often defined as the largest group of organisms in which any two individuals of the appropriate sexes or mating types can produce fertile offspring, typically by sexual reproduction. It is the basic unit of Taxonomy (biology), ...
if a homolog to a human gene can be found in the genome of that species, but only if the homolog is orthologous. If they are paralogs and resulted from a gene duplication event, their functions are likely to be too different. One or more copies of duplicated genes that constitute a gene family may be affected by insertion of transposable elements that causes significant variation between them in their sequence and finally may become responsible for
divergent evolution Divergent evolution or divergent selection is the accumulation of differences between closely related populations within a species, sometimes leading to speciation. Divergent evolution is typically exhibited when two populations become separate ...
. This may also render the chances and the rate of
gene conversion Gene conversion is the process by which one DNA sequence replaces a homologous sequence such that the sequences become identical after the conversion. Gene conversion can be either allelic, meaning that one allele of the same gene replaces another ...
between the homologs of gene duplicates due to less or no similarity in their sequences. Paralogs can be identified in single genomes through a sequence comparison of all annotated gene models to one another. Such a comparison can be performed on translated amino acid sequences (e.g. BLASTp, tBLASTx) to identify ancient duplications or on DNA nucleotide sequences (e.g. BLASTn, megablast) to identify more recent duplications. Most studies to identify gene duplications require reciprocal-best-hits or fuzzy reciprocal-best-hits, where each paralog must be the other's single best match in a sequence comparison. Most gene duplications exist as low copy repeats (LCRs), rather highly repetitive sequences like transposable elements. They are mostly found in pericentronomic, subtelomeric and interstitial regions of a chromosome. Many LCRs, due to their size (>1Kb), similarity, and orientation, are highly susceptible to duplications and deletions.


Genomic microarrays detect duplications

Technologies such as genomic
microarrays A microarray is a multiplex lab-on-a-chip. Its purpose is to simultaneously detect the expression of thousands of biological interactions. It is a two-dimensional array on a solid substrate—usually a glass slide or silicon thin-film cell� ...
, also called array comparative
genomic Genomics is an interdisciplinary field of molecular biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, ...
hybridization (array CGH), are used to detect chromosomal abnormalities, such as microduplications, in a high throughput fashion from genomic DNA samples. In particular, DNA
microarray A microarray is a multiplex (assay), multiplex lab-on-a-chip. Its purpose is to simultaneously detect the expression of thousands of biological interactions. It is a two-dimensional array on a Substrate (materials science), solid substrate—usu ...
technology can simultaneously monitor the expression levels of thousands of genes across many treatments or experimental conditions, greatly facilitating the evolutionary studies of
gene regulation Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products (protein or RNA). Sophisticated programs of gene expression are wide ...
after gene duplication or
speciation Speciation is the evolutionary process by which populations evolve to become distinct species. The biologist Orator F. Cook coined the term in 1906 for cladogenesis, the splitting of lineages, as opposed to anagenesis, phyletic evolution within ...
.


Next generation sequencing

Gene duplications can also be identified through the use of next-generation sequencing platforms. The simplest means to identify duplications in genomic resequencing data is through the use of paired-end sequencing reads. Tandem duplications are indicated by sequencing read pairs which map in abnormal orientations. Through a combination of increased sequence coverage and abnormal mapping orientation, it is possible to identify duplications in genomic sequencing data.


Nomenclature

The International System for Human Cytogenomic Nomenclature (ISCN) is an international standard for human chromosome
nomenclature Nomenclature (, ) is a system of names or terms, or the rules for forming these terms in a particular field of arts or sciences. (The theoretical field studying nomenclature is sometimes referred to as ''onymology'' or ''taxonymy'' ). The principl ...
, which includes band names, symbols and abbreviated terms used in the description of human chromosome and chromosome abnormalities. Abbreviations include ''dup'' for duplications of parts of a chromosome. For example, dup(17p12) causes Charcot–Marie–Tooth disease type 1A.


As amplification

Gene duplication does not necessarily constitute a lasting change in a species' genome. In fact, such changes often don't last past the initial host organism. From the perspective of
molecular genetics Molecular genetics is a branch of biology that addresses how differences in the structures or expression of DNA molecules manifests as variation among organisms. Molecular genetics often applies an "investigative approach" to determine the st ...
, gene amplification is one of many ways in which a
gene In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protei ...
can be overexpressed. Genetic amplification can occur artificially, as with the use of the
polymerase chain reaction The polymerase chain reaction (PCR) is a method widely used to make millions to billions of copies of a specific DNA sample rapidly, allowing scientists to amplify a very small sample of DNA (or a part of it) sufficiently to enable detailed st ...
technique to amplify short strands of
DNA Deoxyribonucleic acid (; DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of al ...
''
in vitro ''In vitro'' (meaning ''in glass'', or ''in the glass'') Research, studies are performed with Cell (biology), cells or biological molecules outside their normal biological context. Colloquially called "test-tube experiments", these studies in ...
'' using
enzymes An enzyme () is a protein that acts as a biological catalyst by accelerating chemical reactions. The molecules upon which enzymes may act are called substrates, and the enzyme converts the substrates into different molecules known as pro ...
, or it can occur naturally, as described above. If it's a natural duplication, it can still take place in a
somatic cell In cellular biology, a somatic cell (), or vegetal cell, is any biological cell forming the body of a multicellular organism other than a gamete, germ cell, gametocyte or undifferentiated stem cell. Somatic cells compose the body of an organism ...
, rather than a
germline In biology and genetics, the germline is the population of a multicellular organism's cells that develop into germ cells. In other words, they are the cells that form gametes ( eggs and sperm), which can come together to form a zygote. They dif ...
cell (which would be necessary for a lasting evolutionary change).


Role in cancer

Duplications of
oncogenes An oncogene is a gene that has the potential to cause cancer. In tumor cells, these genes are often mutated, or expressed at high levels.
are a common cause of many types of
cancer Cancer is a group of diseases involving Cell growth#Disorders, abnormal cell growth with the potential to Invasion (cancer), invade or Metastasis, spread to other parts of the body. These contrast with benign tumors, which do not spread. Po ...
. In such cases the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring. Recent comprehensive patient-level classification and quantification of driver events in TCGA cohorts revealed that there are on average 12 driver events per tumor, of which 1.5 are amplifications of oncogenes. Whole-genome duplications are also frequent in cancers, detected in 30% to 36% of tumors from the most common cancer types. Their exact role in carcinogenesis is unclear, but they in some cases lead to loss of chromatin segregation leading to chromatin conformation changes that in turn lead to oncogenic epigenetic and transcriptional modifications.


See also


References


External links


''A bibliography on gene and genome duplication''


{{DEFAULTSORT:Gene Duplication Evolutionary biology concepts Genetics concepts Modification of genetic information Molecular evolution Mutation