HOME

TheInfoList



OR:

An overlapping gene (or OLG) is a
gene In biology, the word gene (from , ; "...Wilhelm Johannsen coined the word gene to describe the Mendelian units of heredity..." meaning ''generation'' or ''birth'' or ''gender'') can have several different meanings. The Mendelian gene is a b ...
whose expressible nucleotide sequence partially overlaps with the expressible nucleotide sequence of another gene. In this way, a nucleotide sequence may make a contribution to the function of one or more
gene product A gene product is the biochemical material, either RNA or protein, resulting from expression of a gene. A measurement of the amount of gene product is sometimes used to infer how active a gene is. Abnormal amounts of gene product can be correlate ...
s. Overlapping genes are present and a fundamental feature of both cellular and viral
genome In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ...
s. The current definition of an overlapping gene varies significantly between eukaryotes, prokaryotes, and viruses. In
prokaryote A prokaryote () is a single-celled organism that lacks a nucleus and other membrane-bound organelles. The word ''prokaryote'' comes from the Greek πρό (, 'before') and κάρυον (, 'nut' or 'kernel').Campbell, N. "Biology:Concepts & Con ...
s and
virus A virus is a submicroscopic infectious agent that replicates only inside the living cells of an organism. Viruses infect all life forms, from animals and plants to microorganisms, including bacteria and archaea. Since Dmitri Ivanovsk ...
es overlap must be between coding sequences but not
mRNA In molecular biology, messenger ribonucleic acid (mRNA) is a single-stranded molecule of RNA that corresponds to the genetic sequence of a gene, and is read by a ribosome in the process of synthesizing a protein. mRNA is created during the ...
transcripts, and is defined when these coding sequences share a nucleotide on either the same or opposite strands. In
eukaryote Eukaryotes () are organisms whose cells have a nucleus. All animals, plants, fungi, and many unicellular organisms, are Eukaryotes. They belong to the group of organisms Eukaryota or Eukarya, which is one of the three domains of life. Bacter ...
s, gene overlap is almost always defined as mRNA transcript overlap. Specifically, a gene overlap in eukaryotes is defined when at least one nucleotide is shared between the boundaries of the primary mRNA transcripts of two or more genes, such that a DNA base
mutation In biology, a mutation is an alteration in the nucleic acid sequence of the genome of an organism, virus, or extrachromosomal DNA. Viral genomes contain either DNA or RNA. Mutations result from errors during DNA or viral replication, m ...
at any point of the overlapping region would affect the transcripts of all genes involved. This definition includes 5′ and 3′
untranslated region In molecular genetics, an untranslated region (or UTR) refers to either of two sections, one on each side of a coding sequence on a strand of mRNA. If it is found on the 5' side, it is called the 5' UTR (or leader sequence), or if it is foun ...
s (UTRs) along with
intron An intron is any Nucleic acid sequence, nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word ''intron'' is derived from the term ''intragenic region'', i.e. a region inside a gene."The notion of ...
s. Overprinting refers to a type of overlap in which all or part of the sequence of one gene is read in an alternate
reading frame In molecular biology, a reading frame is a way of dividing the sequence of nucleotides in a nucleic acid ( DNA or RNA) molecule into a set of consecutive, non-overlapping triplets. Where these triplets equate to amino acids or stop signals during ...
from another gene at the same
locus Locus (plural loci) is Latin for "place". It may refer to: Entertainment * Locus (comics), a Marvel Comics mutant villainess, a member of the Mutant Liberation Front * ''Locus'' (magazine), science fiction and fantasy magazine ** ''Locus Award' ...
. The alternative open reading frames (ORF) are thought to be created by critical nucleotide substitutions within an expressible pre-existing gene, which can be induced to express a novel
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, res ...
while still preserving the function of the original gene. Overprinting has been hypothesized as a mechanism for ''de novo'' emergence of new genes from existing sequences, either older genes or previously
non-coding Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules (e.g. transfer RNA, microRNA, piRNA, ribosomal RNA, and regul ...
regions of the genome. It is believed that most overlapping genes, or genes whose expressible nucleotide sequences partially overlap with each other, evolved in part due to this mechanism, suggesting that each overlap is composed of one ancestral gene and one novel gene. Subsequently, overprinting is also believed to be a source of novel proteins, as de novo proteins coded by these novel genes usually lack remote
homologs A couple of homologous chromosomes, or homologs, are a set of one maternal and one paternal chromosome that pair up with each other inside a cell during fertilization. Homologs have the same genes in the same loci where they provide points alon ...
in databases. Overprinted genes are particularly common features of the
genomic Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dim ...
organization of viruses, likely to greatly increase the number of potential expressible genes from a small set of viral genetic information. It is likely that overprinting is responsible for the generation of numerous novel proteins by viruses over the course of their
evolutionary history The history of life on Earth traces the processes by which living and fossil organisms evolved, from the earliest emergence of life to present day. Earth formed about 4.5 billion years ago (abbreviated as ''Ga'', for ''gigaannum'') and ev ...
.


Classification

Genes may overlap in a variety of ways and can be classified by their positions relative to each other. * ''Unidirectional'' or ''tandem'' overlap: the 3' end of one gene overlaps with the 5' end of another gene on the same strand. This arrangement can be symbolized with the notation → → where arrows indicate the reading frame from start to end. * ''Convergent'' or ''end-on'' overlap: the 3' ends of the two genes overlap on opposite strands. This can be written as → ←. * ''Divergent'' or ''tail-on'' overlap: the 5' ends of the two genes overlap on opposite strands. This can be written as ← →. Overlapping genes can also be classified by ''phases'', which describe their relative
reading frame In molecular biology, a reading frame is a way of dividing the sequence of nucleotides in a nucleic acid ( DNA or RNA) molecule into a set of consecutive, non-overlapping triplets. Where these triplets equate to amino acids or stop signals during ...
s: *''In-phase overlap'' occurs when the shared sequences use the same reading frame. This is also known as "phase 0". Unidirectional genes with phase 0 overlap are not considered distinct genes, but rather as alternative start sites of the same gene. *''Out-of-phase overlaps'' occurs when the shared sequences use different reading frames. This can occur in "phase 1" or "phase 2", depending on whether the reading frames are offset by 1 or 2 nucleotides. Because a
codon The genetic code is the set of rules used by living cells to translate information encoded within genetic material ( DNA or RNA sequences of nucleotide triplets, or codons) into proteins. Translation is accomplished by the ribosome, which links ...
is three nucleotides long, an offset of three nucleotides is an in-phase, phase 0 frame. Studies on overlapping genes suggest that their evolution can be summarized in two possible models. In one model, the two proteins encoded by their respective overlapping genes evolve under similar
selection pressures Any cause that reduces or increases reproductive success in a portion of a population potentially exerts evolutionary pressure, selective pressure or selection pressure, driving natural selection. It is a quantitative description of the amount of ...
. The proteins and the overlap region are highly conserved when strong selection against
amino acid Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although hundreds of amino acids exist in nature, by far the most important are the alpha-amino acids, which comprise proteins. Only 22 alpha ...
change is favored. Overlapping genes are reasoned to evolve under strict constraints as a single nucleotide substitution is able to alter the structure and function of the two proteins simultaneously. A study on the
hepatitis B virus ''Hepatitis B virus'' (HBV) is a partially double-stranded DNA virus, a species of the genus '' Orthohepadnavirus'' and a member of the '' Hepadnaviridae'' family of viruses. This virus causes the disease hepatitis B. Disease Despite there b ...
(HBV), whose DNA genome contains numerous overlapping genes, showed the mean number of synonymous nucleotide substitutions per site in overlapping coding regions was significantly lower than that of non-overlapping regions. The same study showed that it was possible for some of these overlapping regions and their proteins to diverge significantly from the original when there's weak selection against amino acid change. The spacer domain of the
polymerase A polymerase is an enzyme ( EC 2.7.7.6/7/19/48/49) that synthesizes long chains of polymers or nucleic acids. DNA polymerase and RNA polymerase are used to assemble DNA and RNA molecules, respectively, by copying a DNA template strand using ba ...
and the pre-S1 region of a surface protein of HBV, for example, had a percentage of conserved amino acids of 30% and 40%, respectively. However, these overlap regions are known to be less important for replication compared to the overlap regions that were highly conserved among different HBV strains, which are absolutely essential for the process. The second model suggests that the two proteins and their respective overlap genes evolve under opposite selection pressures: one frame experiences positive selection while the other is under
purifying selection In natural selection, negative selection or purifying selection is the selective removal of alleles that are deleterious. This can result in stabilising selection through the purging of deleterious genetic polymorphisms that arise through random ...
. In
tombusvirus ''Tombusvirus'' is a genus of viruses, in the family '' Tombusviridae''. Plants serve as natural hosts. There are 17 species in this genus. Symptoms associated with this genus include mosaic. The name of the genus comes from ''Tomato bushy stunt ...
es, the proteins p19 and p22 are encoded by overlapping genes that form a 549 nt coding region, and p19 is shown to be under positive selection while p22 is under purifying selection. Additional examples are mentioned in studies involving overlapping genes of the
Sendai virus ''Murine respirovirus'', formerly ''Sendai virus'' (SeV) and previously also known as murine parainfluenza virus type 1 or hemagglutinating virus of Japan (HVJ), is an enveloped,150-200 nm in diameter, a negative sense, single-stranded RN ...
,
potato leafroll virus Potato leafroll virus (PLRV) is a member of the genus ''Polerovirus'' and family ''Solemoviridae''. The phloem limited positive sense RNA virus infects potatoes and other members of the family Solanaceae. PLRV was first described by Quanjer ...
, and human
parvovirus B19 Primate erythroparvovirus 1, generally referred to as B19 virus (B19V), parvovirus B19 or sometimes erythrovirus B19, is the first (and until 2005 the only) known human virus in the family '' Parvoviridae'', genus ''Erythroparvovirus''; it measu ...
. This phenomenon of overlapping genes experiencing different selection pressures is suggested to be a consequence of a high rate of nucleotide substitution with different effects on the two frames; the substitutions may be majorly non-synonymous for one frame while mostly being
synonymous A synonym is a word, morpheme, or phrase that means exactly or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words ''begin'', ''start'', ''commence'', and ''initiate'' are a ...
for the other frame.


Evolution

Overlapping genes are particularly common in rapidly evolving genomes, such as those of
virus A virus is a submicroscopic infectious agent that replicates only inside the living cells of an organism. Viruses infect all life forms, from animals and plants to microorganisms, including bacteria and archaea. Since Dmitri Ivanovsk ...
es,
bacteria Bacteria (; singular: bacterium) are ubiquitous, mostly free-living organisms often consisting of one biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria were am ...
, and
mitochondria A mitochondrion (; ) is an organelle found in the cells of most Eukaryotes, such as animals, plants and fungi. Mitochondria have a double membrane structure and use aerobic respiration to generate adenosine triphosphate (ATP), which is used ...
. They may originate in three ways: # By extension of an existing
open reading frame In molecular biology, open reading frames (ORFs) are defined as spans of DNA sequence between the start and stop codons. Usually, this is considered within a studied region of a prokaryotic DNA sequence, where only one of the six possible readi ...
(ORF) downstream into a contiguous gene due to the loss of a
stop codon In molecular biology (specifically protein biosynthesis), a stop codon (or termination codon) is a codon (nucleotide triplet within messenger RNA) that signals the termination of the translation process of the current protein. Most codons in mess ...
; # By extension of an existing ORF upstream into a contiguous gene due to loss of an
initiation codon The start codon is the first codon of a messenger RNA (mRNA) transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and Archaea and a N-formylmethionine (fMet) in bacteria, mitochondria and plastids. The mo ...
; # By generation of a novel ORF within an existing one due to a
point mutation A point mutation is a genetic mutation where a single nucleotide base is changed, inserted or deleted from a DNA or RNA sequence of an organism's genome. Point mutations have a variety of effects on the downstream protein product—consequence ...
. The use of the same nucleotide sequence to encode multiple genes may provide
evolution Evolution is change in the heritable characteristics of biological populations over successive generations. These characteristics are the expressions of genes, which are passed on from parent to offspring during reproduction. Variation ...
ary advantage due to reduction in
genome In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ...
size and due to the opportunity for
transcriptional Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins are said to produce messenger RNA (mRNA). Other segments of DNA are copied into RNA molecules calle ...
and
translational Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transla ...
co-regulation Co-regulation (or coregulation) is a term used in psychology. It is defined most broadly as a "continuous unfolding of individual action that is susceptible to being continuously modified by the continuously changing actions of the partner". An imp ...
of the overlapping genes. Gene overlaps introduce novel evolutionary constraints on the sequences of the overlap regions.


Origins of new genes

In 1977, Pierre-Paul Grassé proposed that one of the genes in the pair could have originated ''de novo'' by mutations to introduce novel ORFs in alternate reading frames; he described the mechanism as ''overprinting''. It was later substantiated by
Susumu Ohno Susumu is a masculine Japanese given name. Notable people with the name include: * Susumu Akagi (born 1972) Japanese voice actor * Susumu Aoyagi (青柳 進, born 1968), Japanese baseball player *Susumu Chiba (born 1970), Japanese voice actor *, J ...
, who identified a candidate gene that may have arisen by this mechanism. Some de novo genes originating in this way may not remain overlapping, but subfunctionalize following
gene duplication Gene duplication (or chromosomal duplication or gene amplification) is a major mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene. ...
, contributing to the prevalence of
orphan gene Orphan genes, ORFans, or taxonomically restricted genes (TRGs) are genes that lack a detectable homologue outside of a given species or lineage. Most genes have known homologues. Two genes are homologous when they share an evolutionary history, a ...
s. Which member of an overlapping gene pair is younger can be identified
bioinformatic Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combine ...
ally either by a more restricted
phylogenetic In biology, phylogenetics (; from Greek φυλή/ φῦλον [] "tribe, clan, race", and wikt:γενετικός, γενετικός [] "origin, source, birth") is the study of the evolutionary history and relationships among or within groups ...
distribution, or by less optimized codon usage. Younger members of the pair tend to have higher intrinsic structural disorder than older members, but the older members are also more disordered than other proteins, presumably as a way of alleviating the increased evolutionary constraints posed by overlap. Overlaps are more likely to originate in proteins that already have high disorder.


Taxonomic distribution

Overlapping genes occur in all domains of life, though with varying frequencies. They are especially common in viral genomes.


Viruses

The existence of overlapping genes was first identified in the virus ΦX174, whose
genome In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ...
was the first DNA genome ever sequenced by
Frederick Sanger Frederick Sanger (; 13 August 1918 – 19 November 2013) was an English biochemist who received the Nobel Prize in Chemistry twice. He won the 1958 Chemistry Prize for determining the amino acid sequence of insulin and numerous other p ...
in 1977. Previous analysis of ΦX174, a small single-stranded DNA
bacteriophage A bacteriophage (), also known informally as a ''phage'' (), is a duplodnaviria virus that infects and replicates within bacteria and archaea. The term was derived from "bacteria" and the Greek φαγεῖν ('), meaning "to devour". Bac ...
that infected the bacteria
Escherichia coli ''Escherichia coli'' (),Wells, J. C. (2000) Longman Pronunciation Dictionary. Harlow ngland Pearson Education Ltd. also known as ''E. coli'' (), is a Gram-negative, facultative anaerobic, rod-shaped, coliform bacterium of the genus '' Esc ...
, suggested that the
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, res ...
s produced during infection required coding sequences longer than the measured length of its genome. Analysis of the fully sequenced 5386 nucleotide genome showed that the virus possessed extensive overlap between coding regions, revealing that some genes (like genes D and E) were translated from the same DNA sequences but in different reading frames. An alternative start site within the genome replication gene A of ΦX174 was shown to express a truncated protein with an identical coding sequence to the
C-terminus The C-terminus (also known as the carboxyl-terminus, carboxy-terminus, C-terminal tail, C-terminal end, or COOH-terminus) is the end of an amino acid chain (protein or polypeptide), terminated by a free carboxyl group (-COOH). When the protein i ...
of the original A protein but possessing a different function It was concluded that other undiscovered sites of polypeptide synthesis could be hidden through the genome due to overlapping genes. An identified de novo gene of another overlapping gene locus was shown to express a novel protein that induces lysis of E. coli by inhibiting biosynthesis of its cell wall 6 suggesting that de novo protein creation through the process of overprinting can be a significant factor in the evolution of
pathogenicity In biology, a pathogen ( el, πάθος, "suffering", "passion" and , "producer of") in the oldest and broadest sense, is any organism or agent that can produce disease. A pathogen may also be referred to as an infectious agent, or simply a ge ...
of viruses. Another example is the ''
ORF3d ORF3d is a gene found in SARS-CoV-2 (the virus that causes COVID-19) and at least one closely related coronavirus found in pangolins, though it is not found in other closely related viruses within the ''Sarbecovirus'' subgenus. It is 57 codons ...
'' gene in the SARS-CoV 2 virus. Overlapping genes are particularly common in viral genomes. Some studies attribute this observation to
selective pressure Any cause that reduces or increases reproductive success in a portion of a population potentially exerts evolutionary pressure, selective pressure or selection pressure, driving natural selection. It is a quantitative description of the amount of ...
toward small genome sizes mediated by the physical constraints of packaging the genome in a
viral capsid A capsid is the protein shell of a virus, enclosing its genetic material. It consists of several oligomeric (repeating) structural subunits made of protein called protomers. The observable 3-dimensional morphological subunits, which may or ma ...
, particularly one of icosahedral geometry. However, other studies dispute this conclusion and argue that the distribution of overlaps in viral genomes is more likely to reflect overprinting as the evolutionary origin of overlapping viral genes. Overprinting is a common source of ''de novo'' genes in viruses. The proportion of viruses with overlapping coding sequences within their genomes varies. Double-stranded
RNA virus An RNA virus is a virusother than a retrovirusthat has ribonucleic acid ( RNA) as its genetic material. The nucleic acid is usually single-stranded RNA (ssRNA) but it may be double-stranded (dsRNA). Notable human diseases caused by RNA virus ...
es have fewer than a quarter that contains them while almost three-quarters of
retroviridae A retrovirus is a type of virus that inserts a DNA copy of its RNA genome into the DNA of a host cell that it invades, thus changing the genome of that cell. Once inside the host cell's cytoplasm, the virus uses its own reverse transcriptase ...
and viruses with single-stranded DNA genomes contain overlapping coding sequences. Segmented viruses in particular, or viruses with their genome split into separate pieces and packaged either all in the same
capsid A capsid is the protein shell of a virus, enclosing its genetic material. It consists of several oligomeric (repeating) structural subunits made of protein called protomers. The observable 3-dimensional morphological subunits, which may or ma ...
or in separate capsids, are more likely to contain an overlapping sequence than non-segmented viruses. RNA viruses have fewer overlapping genes than DNA viruses which possess lower
mutation rate In genetics, the mutation rate is the frequency of new mutations in a single gene or organism over time. Mutation rates are not constant and are not limited to a single type of mutation; there are many different types of mutations. Mutation rates ...
s and less restrictive genome sizes. The lower mutation rate of DNA viruses facilitates greater genomic novelty and evolutionary exploration within a structurally constrained genome and may be the primary driver of the evolution of overlapping genes. Studies of overprinted viral genes suggest that their protein products tend to be accessory proteins which are not essential to viral proliferation, but contribute to
pathogenicity In biology, a pathogen ( el, πάθος, "suffering", "passion" and , "producer of") in the oldest and broadest sense, is any organism or agent that can produce disease. A pathogen may also be referred to as an infectious agent, or simply a ge ...
. Overprinted proteins often have unusual
amino acid Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although hundreds of amino acids exist in nature, by far the most important are the alpha-amino acids, which comprise proteins. Only 22 alpha ...
distributions and high levels of intrinsic
disorder Disorder may refer to randomness, non-order, or no intelligible pattern. Disorder may also refer to: Healthcare * Disorder (medicine), a functional abnormality or disturbance * Mental disorder or psychological disorder, a psychological pattern ...
. In some cases overprinted proteins do have well-defined, but novel, three-dimensional structures; one example is the
RNA silencing suppressor p19 RNA silencing suppressor p19 (also known as Tombusvirus P19 core protein and 19 kDa symptom severity modulator) is a protein expressed from the ORF4 gene in the genome of tombusviruses. These viruses are positive-sense single-stranded RNA viruses ...
found in
Tombusvirus ''Tombusvirus'' is a genus of viruses, in the family '' Tombusviridae''. Plants serve as natural hosts. There are 17 species in this genus. Symptoms associated with this genus include mosaic. The name of the genus comes from ''Tomato bushy stunt ...
es, which has both a novel
protein fold A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similari ...
and a novel binding mode in recognizing
siRNA Small interfering RNA (siRNA), sometimes known as short interfering RNA or silencing RNA, is a class of double-stranded RNA at first non-coding RNA molecules, typically 20-24 (normally 21) base pairs in length, similar to miRNA, and operating ...
s.


Prokaryotes

Estimates of gene overlap in
bacteria Bacteria (; singular: bacterium) are ubiquitous, mostly free-living organisms often consisting of one biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria were am ...
l genomes typically find that around one third of bacterial genes are overlapped, though usually only by a few base pairs. Most studies of overlap in bacterial genomes find evidence that overlap serves a function in
gene regulation Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products ( protein or RNA). Sophisticated programs of gene expression are w ...
, permitting the overlapped genes to be transcriptionally and translationally co-regulated. In prokaryotic genomes, unidirectional overlaps are most common, possibly due to the tendency of adjacent prokaryotic genes to share orientation. Among unidirectional overlaps, long overlaps are more commonly read with a one-nucleotide offset in reading frame (i.e., phase 1) and short overlaps are more commonly read in phase 2. Long overlaps of greater than 60
base pair A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both D ...
s are more common for convergent genes; however, putative long overlaps have very high rates of misannotation. Robustly validated examples of long overlaps in bacterial genomes are rare; in the well-studied
model organism A model organism (often shortened to model) is a non-human species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the model organism will provide insight into the workin ...
''
Escherichia coli ''Escherichia coli'' (),Wells, J. C. (2000) Longman Pronunciation Dictionary. Harlow ngland Pearson Education Ltd. also known as ''E. coli'' (), is a Gram-negative, facultative anaerobic, rod-shaped, coliform bacterium of the genus '' Esc ...
'', only four gene pairs are well validated as having long, overprinted overlaps.


Eukaryotes

Compared to prokaryotic genomes, eukaryotic genomes are often poorly annotated and thus identifying genuine overlaps is relatively challenging. However, examples of validated gene overlaps have been documented in a variety of eukaryotic organisms, including mammals such as mice and humans. Eukaryotes differ from prokaryotes in distribution of overlap types: while unidirectional (i.e., same-strand) overlaps are most common in prokaryotes, opposite or antiparallel-strand overlaps are more common in eukaryotes. Among the opposite-strand overlaps, convergent orientation is most common. Most studies of eukaryotic gene overlap have found that overlapping genes are extensively subject to genomic reorganization even in closely related species, and thus the presence of an overlap is not always well-conserved. Overlap with older or less taxonomically restricted genes is also a common feature of genes likely to have originated ''de novo'' in a given eukaryotic lineage.


Function

The precise functions of overlapping genes seems to vary across the domains of life but several experiments have shown that they are important for virus lifecycles through proper protein expression and stoichiometry as well as playing a role in proper protein folding. A version of
bacteriophage A bacteriophage (), also known informally as a ''phage'' (), is a duplodnaviria virus that infects and replicates within bacteria and archaea. The term was derived from "bacteria" and the Greek φαγεῖν ('), meaning "to devour". Bac ...
ΦX174 has also been created where all gene overlaps were removed proving they were not necessary for replication. The retention and evolution of overlapping genes within viruses may also be due to
capsid A capsid is the protein shell of a virus, enclosing its genetic material. It consists of several oligomeric (repeating) structural subunits made of protein called protomers. The observable 3-dimensional morphological subunits, which may or ma ...
size limitations. Dramatic viability loss was observed in viruses with genomes engineered to be longer than the wild-type genome. Increasing the single-stranded DNA genome length of ΦX174 by >1% results in almost complete loss of
infectivity In epidemiology, infectivity is the ability of a pathogen to establish an infection. More specifically, infectivity is a pathogen's capacity for horizontal transmission — that is, how frequently it spreads among hosts that are not in a parent ...
, believed to be the result of the strict physical constraints imposed by the finite capsid volume. Studies on
adeno-associated virus Adeno-associated viruses (AAV) are small viruses that infect humans and some other primate species. They belong to the genus ''Dependoparvovirus'', which in turn belongs to the family '' Parvoviridae''. They are small (approximately 26 nm i ...
es as
gene delivery Gene delivery is the process of introducing foreign genetic material, such as DNA or RNA, into host cells. Gene delivery must reach the genome of the host cell to induce gene expression. Successful gene delivery requires the foreign gene delive ...
vectors showed that viral packaging is constrained by genetic cargo size limits, requiring the use of multiple vectors to deliver large human genes such as CFTR81. Therefore, it is suggested that overlapping genes evolved as a means to overcome these physical constraints, increasing genetic diversity by utilizing only the existing sequence rather than increasing genome length.


Methods in identifying overlapping genes and ORFs

Standardized methods such as genome annotation may be inappropriate for the detection of overlapping genes as they are reliant on already curated genes while overlapping genes are generally overlooked contain atypical sequence composition. Genome annotation standards are also often biased against feature overlaps, such as genes entirely contained within another gene. Furthermore, some bioinformatics pipelines such as the RAST pipeline markedly penalizes overlaps between predicted ORFs. However, rapid advancement of genome-scale protein and RNA measurement tools along with increasingly advanced prediction algorithms have revealed an avalanche of overlapping genes and ORFs within numerous genomes. Proteogenomic methods have been essential in discovering numerous overlapping genes and include a combination of techniques such as
bottom-up proteomics Bottom-up proteomics is a common method to identify proteins and characterize their amino acid sequences and post-translational modifications by proteolytic digestion of proteins prior to analysis by mass spectrometry. The major alternative work ...
,
ribosome profiling Ribosome profiling, or Ribo-Seq (also named ribosome footprinting), is an adaptation of a technique developed by Joan Steitz and Marilyn Kozak almost 50 years ago that Nicholas Ingolia and Jonathan Weissman adapted to work with next generation ...
,
DNA sequencing DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. T ...
, and
perturbation Perturbation or perturb may refer to: * Perturbation theory, mathematical methods that give approximate solutions to problems that cannot be solved exactly * Perturbation (geology), changes in the nature of alluvial deposits over time * Perturbat ...
.
RNA sequencing RNA-Seq (named as an abbreviation of RNA sequencing) is a sequencing technique which uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, analyzing the continuously changing c ...
is also used to identify genomic regions containing overlapping transcripts. It has been utilized to identify 180,000 alternate ORFs within previously annotated coding regions found in humans. Newly discovered ORFs such as these are verified using a variety of
reverse genetics Reverse genetics is a method in molecular genetics that is used to help understand the function(s) of a gene by analysing the phenotypic effects caused by genetically engineering specific nucleic acid sequences within the gene. The process pr ...
techniques, such as
CRISPR-Cas9 Cas9 (CRISPR associated protein 9, formerly called Cas5, Csn1, or Csx12) is a 160 kilodalton protein which plays a vital role in the immunological defense of certain bacteria against DNA viruses and plasmids, and is heavily utilized in genetic ...
and catalytically dead Cas9 (dCas9) disruption. Attempts at proof-by-synthesis are also performed to show beyond doubt the absence of any undiscovered overlapping genes.


References

{{Use dmy dates, date=April 2017 Genes