
Pseudogenes are nonfunctional segments of
DNA
that resemble functional
gene
s. Pseudogenes can be formed from both protein-coding genes and non-coding genes. In the case of protein-coding genes, most pseudogenes arise as superfluous copies of functional genes, either directly by
gene duplication
or indirectly by
reverse transcription
of an
mRNA
transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences or are incapable of producing a functional product. Pseudogenes are a type of
junk DNA
.
Most non-bacterial genomes contain many pseudogenes, often as many as functional genes. This is not surprising, since various biological processes are expected to accidentally create pseudogenes, and there are no specialized mechanisms to remove them from genomes. Eventually pseudogenes may be deleted from their genomes by chance
DNA replication
or
DNA repair
errors, or they may accumulate so many
mutation
al changes that they are no longer recognizable as former genes. Analysis of these degeneration events helps clarify the effects of non-selective processes in genomes.
Pseudogene sequences may be transcribed into
RNA
at low levels, due to
promoter elements inherited from the ancestral gene or arising by new mutations. Although most of these transcripts will have no more functional significance than chance transcripts from other parts of the genome, some have given rise to beneficial regulatory RNAs and new proteins.
Properties
Pseudogenes are usually characterized by a combination of similarity or
homology to a known gene, together with a loss of some functionality. That is, although every pseudogene has a
DNA
sequence that is similar to some functional gene, it is usually unable to produce a functional final protein product.
Pseudogenes are sometimes difficult to identify and characterize in genomes, because the two requirements of similarity and loss of functionality are usually implied through sequence alignments rather than biologically proven.
1. Homology is implied by sequence similarity between the DNA sequences of the pseudogene and a known gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences have evolved independently (see convergent evolution).
2. Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to yield a fully functional protein:
Transcription
,
pre-mRNA processing,
translation
, and
protein folding
are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature
stop codon
s and
frameshifts, which almost universally prevent the translation of a functional protein product.
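The two criteria above can be illustrated with a minimal Python sketch. It assumes the candidate and the known gene have already been aligned by some external tool, with `-` marking gaps. The function names, the choice to ignore gapped columns when computing identity, and the rule that any gap run not divisible by three counts as a frameshift are illustrative simplifications, not the method of any particular pseudogene pipeline.

```python
import re

# Standard stop codons in DNA notation (standard genetic code assumed).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of non-gap alignment columns in which both bases match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped (an assumption; tools differ)
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def first_premature_stop(cds: str):
    """Return the 0-based index of the first in-frame stop codon that occurs
    before the final codon, or None if there is none."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the last codon is the expected stop
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def has_frameshift(aligned_gene: str, aligned_candidate: str) -> bool:
    """Heuristic: any run of gaps whose length is not a multiple of three
    is treated as a frame-shifting insertion or deletion."""
    for seq in (aligned_gene, aligned_candidate):
        for run in re.findall(r"-+", seq):
            if len(run) % 3 != 0:
                return True
    return False


if __name__ == "__main__":
    gene = "ATGGCCAAATGA"
    candidate = "ATGGCC-AATGA"  # one-base deletion relative to the gene
    print(percent_identity(gene, candidate))
    print(has_frameshift(gene, candidate))
    print(first_premature_stop("ATGTAAGCCTGA"))
```

A candidate with high identity to a known gene plus a premature stop or a frameshift is the typical computational signature of a pseudogene. As the surrounding text notes, this is an inference from sequence rather than a biological proof of nonfunctionality.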
Pseudogenes for
RNA
genes are usually more difficult to discover as they do not need to be translated and thus do not have "reading frames". A number of rRNA pseudogenes have been identified on the basis of changes in rDNA array ends.
Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by
PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes
annotated
as genes in
genome
sequences.
Processed pseudogenes often pose a problem for
gene prediction
programs, often being misidentified as real genes or exons. It has been proposed that the identification of processed pseudogenes can help improve the accuracy of gene prediction methods.
In 2014, 140 human pseudogenes were shown to be translated. However, the function, if any, of the protein products is unknown.
Types and origin

There are four main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:
Processed

In higher
eukaryote
s, particularly
mammal
s,
retrotransposition
is a fairly common event that has had a huge impact on the composition of the genome. For example, somewhere between 30 and 44% of the
human genome
consists of repetitive elements such as
SINEs
and
LINEs (see
retrotransposons
).
In the process of retrotransposition, a portion of the
mRNA
or
hnRNA transcript of a gene is spontaneously
reverse transcribed
back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an ''in vitro'' system that they can create retrotransposed copies of random genes, too.
Once these pseudogenes are inserted back into the genome, they usually contain a
poly-A tail
, and usually have had their introns
spliced out; these are both hallmark features of
cDNA
s. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event.
However, these insertions occasionally contribute exons to existing genes, usually via
alternatively spliced
transcripts.
A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes.
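The hallmarks described above (a poly-A tract, missing introns) lend themselves to a simple rule-of-thumb check. The following Python sketch is only an illustration: the `Candidate` record, the thresholds (`min_len`, `window`) and the three output labels are hypothetical choices made here, not values taken from the literature or from any annotation tool.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str             # candidate locus, 5'->3' on the coding strand
    intron_count: int         # introns retained in the candidate
    parent_intron_count: int  # introns in the functional parent gene


def has_poly_a_tract(seq: str, min_len: int = 10, window: int = 50) -> bool:
    """True if a run of at least `min_len` adenines lies within the last
    `window` bases, as expected for a retrotransposed mRNA copy."""
    tail = seq.upper()[-window:]
    return re.search("A{%d,}" % min_len, tail) is not None


def classify(c: Candidate) -> str:
    """Crude label based on two cDNA-like hallmarks."""
    lacks_introns = c.parent_intron_count > 0 and c.intron_count == 0
    if lacks_introns and has_poly_a_tract(c.sequence):
        return "likely processed"
    if c.intron_count > 0:
        return "likely duplicated"
    return "ambiguous"


if __name__ == "__main__":
    retro_copy = Candidate("ATGGCC" + "A" * 15, intron_count=0, parent_intron_count=4)
    print(classify(retro_copy))
```

Old processed pseudogenes may have lost their poly-A tract to mutation, and intronless parent genes give no splicing signal at all, so a real classifier would need additional evidence such as flanking direct repeats or 5' truncation.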
Processed pseudogenes are continually being created in primates. Human populations, for example, have distinct sets of processed pseudogenes across their individuals.
It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes.
Non-processed (duplicated)
Gene duplication
is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event caused by
homologous recombination
at, for example, repetitive
SINE
sequences on misaligned chromosomes and subsequently acquire
mutation
s that cause the copy to lose the original gene's function. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact
exon
-
intron
structure and regulatory sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's
fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates.
If pseudogenization is due to gene duplication, it usually occurs in the first few million years after the gene duplication, provided the gene has not been subjected to any
selection pressure
.
Gene duplication generates functional
redundancy and it is not normally advantageous to carry two identical genes. Mutations that disrupt either the structure or the function of either of the two genes are not deleterious and will not be removed through the selection process. As a result, the gene that has been mutated gradually becomes a pseudogene and will be either unexpressed or functionless. This kind of evolutionary fate is shown by population
genetic modeling
and also by
genome analysis
.
Depending on the evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes that they will no longer be identifiable. Relatively young pseudogenes can be recognized due to their sequence similarity.
Unitary pseudogenes

Various mutations (such as
indel
s and
nonsense mutation
s) can prevent a gene from being normally
transcribed or
translated
, and thus the gene may become less- or non-functional or "deactivated". These are the same mechanisms by which non-processed genes become pseudogenes, but the difference in this case is that the gene was not duplicated before pseudogenization. Normally, such a pseudogene would be unlikely to become fixed in a population, but various population effects, such as
genetic drift
, a
population bottleneck
, or, in some cases,
natural selection
, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded the enzyme
L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied besides primates (except guinea pigs), GULO aids in the biosynthesis of
ascorbic acid
(vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates.
Another more recent example of a disabled gene links the deactivation of the
caspase 12 gene (through a
nonsense mutation
) to positive selection in humans.
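How drift alone can fix a disabled allele can be illustrated with a small Wright-Fisher simulation. This is a sketch under strong assumptions that are not part of the text above: the disabled allele is treated as strictly neutral, the population is diploid with constant size and non-overlapping generations, and all function names are hypothetical.

```python
import random


def wright_fisher(pop_size: int, start_copies: int, rng: random.Random,
                  max_generations: int = 100_000) -> int:
    """Simulate a neutral allele in a diploid population of `pop_size`
    individuals (2 * pop_size gene copies). Returns the final copy count,
    which is 0 (lost) or 2 * pop_size (fixed) once the walk is absorbed."""
    total = 2 * pop_size
    copies = start_copies
    for _ in range(max_generations):
        if copies == 0 or copies == total:
            break
        p = copies / total
        # Each copy in the next generation is drawn independently
        # from the current allele frequency (binomial sampling).
        copies = sum(rng.random() < p for _ in range(total))
    return copies


def fixation_fraction(pop_size: int, start_copies: int, trials: int,
                      seed: int = 0) -> float:
    """Fraction of independent runs in which the allele reaches fixation."""
    rng = random.Random(seed)
    total = 2 * pop_size
    fixed = sum(wright_fisher(pop_size, start_copies, rng) == total
                for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    # Compare a small (bottlenecked) population with a larger one.
    for n in (10, 100):
        print(n, fixation_fraction(n, start_copies=1, trials=500))
```

For a neutral allele, theory gives a fixation probability equal to its starting frequency, 1/(2N) for a single new copy. This is why a bottleneck, which lowers N, makes fixation of a unitary pseudogene more plausible.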
Polymorphic pseudogenes
Some pseudogenes are still intact in some individuals but inactivated (mutated) in others. Abascal et al. have called these pseudogenes "polymorphic". They are often
homozygous
for loss-of-function (LoF) variants, that is, in many people both copies are inactive. Polymorphic pseudogenes often represent non-essential (or dispensable) genes, as opposed to essential genes, and their frequent mutations are actually a criterion to establish them as non-essential. Lopes-Marques et al. define polymorphic pseudogenes as genes that carry a LoF allele with a frequency higher than 1% (in global or certain sub-populations) and without overt pathogenic consequences when homozygous.
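The frequency-based definition attributed above to Lopes-Marques et al. can be written as a short predicate. The sketch below is an illustration only: the function names, the dictionary layout for per-population frequencies, and the Hardy-Weinberg helper are assumptions introduced here, not part of the cited definition.

```python
from typing import Mapping


def is_polymorphic_pseudogene(lof_allele_freq: Mapping[str, float],
                              pathogenic_when_homozygous: bool,
                              threshold: float = 0.01) -> bool:
    """Apply the stated criterion: a loss-of-function allele above 1%
    frequency in the global population or in at least one sub-population,
    with no overt pathogenic consequence when homozygous."""
    if pathogenic_when_homozygous:
        return False
    return any(freq > threshold for freq in lof_allele_freq.values())


def expected_homozygous_fraction(lof_freq: float) -> float:
    """Expected fraction of individuals with two inactive copies,
    assuming Hardy-Weinberg proportions (q squared)."""
    return lof_freq * lof_freq


if __name__ == "__main__":
    freqs = {"global": 0.004, "population_A": 0.03}  # made-up example values
    print(is_polymorphic_pseudogene(freqs, pathogenic_when_homozygous=False))
    print(expected_homozygous_fraction(0.03))
```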
Examples of pseudogene function
While the vast majority of pseudogenes have lost their function, some cases have emerged in which a pseudogene either re-gained its original or a similar function or evolved a new function. In the
human genome
, a number of examples have been identified that were originally classified as pseudogenes but later discovered to have a functional, although not necessarily protein-coding, role.
Examples include the following:
Protein-coding
The rapid proliferation of
DNA sequencing
technologies has led to the identification of many apparent pseudogenes using
gene prediction
techniques. Pseudogenes are often identified by the appearance of a premature
stop codon
in a predicted mRNA sequence, which would, in theory, prevent synthesis (
translation
) of the normal
protein
product of the original gene. There have been some reports of
translational readthrough of such premature stop codons in mammals. As alluded to in the figure above, a small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to
natural selection
. That appears to have happened during the evolution of ''
Drosophila
''
species
.
In 2016 it was reported that four predicted pseudogenes in multiple ''Drosophila'' species actually encode proteins with biologically important functions,
"suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon". For example, the functional protein (a glutamate
olfactory receptor
) from gene Ir75a is found only in
neurons
. This finding of tissue-specific biologically-functional genes that could have been classified as pseudogenes by ''
in silico
'' analysis complicates the analysis of sequence data.
Another ''Drosophila'' pseudo-pseudogene is ''jingwei'', which encodes a functional
alcohol dehydrogenase
enzyme ''in vivo''.
As of 2012, it appeared that there are approximately 12,000–14,000 pseudogenes in the human genome.
A 2016
proteogenomics
analysis using
mass spectrometry
of peptides identified at least 19,262 human proteins produced from 16,271 genes or clusters of genes, with 8 new protein-coding genes identified that were previously considered pseudogenes.
An earlier analysis found that human
PGAM4 (phosphoglycerate mutase), previously thought to be a pseudogene, is not only functional, but also causes infertility if mutated.
A number of pseudo-pseudogenes were also found in prokaryotes, where some stop codon substitutions in essential genes appear to be retained, even positively selected for.
Non-protein-coding
siRNAs. Some endogenous
siRNA
s appear to be derived from pseudogenes, and thus some pseudogenes play a role in regulating protein-coding transcripts, as reviewed. One of the many examples is psiPPM1K. Processing of RNAs transcribed from psiPPM1K yields siRNAs that can act to suppress the most common type of liver cancer,
hepatocellular carcinoma
. This and much other research has led to considerable excitement about the possibility of targeting pseudogenes with, or using them as, therapeutic agents.
piRNAs. Some
piRNAs are derived from pseudogenes located in piRNA clusters. Those piRNAs regulate genes via the piRNA pathway in mammalian testes and are crucial for limiting
transposable element
damage to the genome.

microRNAs. There are many reports of pseudogene transcripts acting as
microRNA
decoys. Perhaps the earliest definitive example of such a pseudogene involved in cancer is the pseudogene of
BRAF. The BRAF gene is a
proto-oncogene
that, when mutated, is associated with many cancers. Normally, the amount of BRAF protein is kept under control in cells through the action of miRNA. In normal situations, RNAs from BRAF and the pseudogene BRAFP1 compete for miRNA, but the balance of the two RNAs is such that cells grow normally. However, when BRAFP1 RNA expression is increased (either experimentally or by natural mutations), less miRNA is available to control the expression of BRAF, and the increased amount of BRAF protein causes cancer. This sort of competition for regulatory elements by RNAs that are endogenous to the genome has given rise to the term
ceRNA.
PTEN. The
PTEN gene is a known
tumor suppressor gene
. The PTEN pseudogene, PTENP1, is a processed pseudogene that is very similar in its genetic sequence to the wild-type gene. However, PTENP1 has a missense mutation that eliminates the
codon
for the
initiating methionine and thus prevents translation of the normal PTEN protein. In spite of that, PTENP1 appears to play a role in
oncogenesis
. The 3'
UTR of PTENP1 mRNA functions as a decoy of PTEN mRNA by targeting
micro RNA
s due to its similarity to the PTEN gene, and overexpression of the 3' UTR resulted in an increase of PTEN protein level. That is, overexpression of the PTENP1 3' UTR leads to increased regulation and suppression of cancerous tumors. The biology of this system is basically the inverse of the BRAF system described above.
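The decoy logic shared by the BRAF and PTEN systems can be made concrete with a toy titration model. It rests on assumptions introduced here purely for illustration: a fixed miRNA pool, equal miRNA affinity for the parent and pseudogene transcripts, and miRNA shared in proportion to transcript abundance. Real miRNA kinetics are more complex, and the function name and units are hypothetical.

```python
def mirna_on_parent(mirna_pool: float, parent_rna: float,
                    pseudogene_rna: float) -> float:
    """Amount of miRNA acting on the parent transcript when the pool is
    shared between parent and pseudogene RNA in proportion to abundance."""
    total_targets = parent_rna + pseudogene_rna
    if total_targets == 0:
        return 0.0
    return mirna_pool * parent_rna / total_targets


if __name__ == "__main__":
    # Arbitrary units: raising pseudogene RNA draws miRNA away from the parent.
    for pseudogene_level in (50.0, 150.0, 450.0):
        print(pseudogene_level, mirna_on_parent(100.0, 50.0, pseudogene_level))
```

In this toy picture, more pseudogene RNA always means less miRNA repression of the parent gene. Whether that promotes or suppresses tumors depends on whether the parent is a proto-oncogene such as BRAF or a tumor suppressor such as PTEN.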
Potogenes. Pseudogenes can, over evolutionary time scales, participate in
gene conversion
and other mutational events that may give rise to new or newly functional genes. This has led to the concept that ''pseudo''genes could be viewed as ''pot''ogenes: ''pot''ential genes for evolutionary diversification.
Bacterial pseudogenes
Pseudogenes are found in
bacteria
. Most are found in bacteria that are not free-living; that is, they are either
symbiont
s or
obligate intracellular parasite
s. Thus, they do not require many genes that are needed by free-living bacteria, such as genes associated with metabolism and DNA repair. However, no fixed order determines which functional
gene
s are lost first. For example, the oldest pseudogenes in ''
Mycobacterium leprae
'' are in
RNA polymerase
s and the
biosynthesis
of
secondary metabolite
s while the oldest ones in ''
Shigella flexneri
'' and ''
Salmonella typhi'' are in
DNA replication
, recombination, and
repair
.
Since most bacteria that carry pseudogenes are either symbionts or obligate intracellular parasites, their genomes eventually shrink. An extreme example is the genome of ''
Mycobacterium leprae
'', an obligate parasite and the causative agent of
leprosy
. It has been reported to have 1,133 pseudogenes, which give rise to approximately 50% of its
transcriptome
.
The effect of pseudogenes and genome reduction can be seen further by comparing ''Mycobacterium leprae'' with ''
Mycobacterium marinum
'', a
pathogen
from the same family. ''Mycobacterium marinum'' has a larger genome than ''Mycobacterium leprae'' because it can survive outside the host; its genome must therefore contain the genes needed to do so.
Although genome reduction eliminates genes that are no longer needed by removing pseudogenes, selective pressures from the host can influence which genes are kept. In the case of a symbiont from the ''
Verrucomicrobiota
'' phylum, there are seven additional copies of the gene encoding the mandelalide pathway.
The host, a species of ''Lissoclinum'', uses mandelalides as part of its defense mechanism.
The relationship between
epistasis
and the domino theory of gene loss was observed in ''Buchnera aphidicola''. The domino theory suggests that if one gene of a cellular process becomes inactivated, then selection on the other genes involved in that process relaxes, leading to further gene loss.
When comparing ''
Buchnera aphidicola
'' and ''
Escherichia coli
,'' it was found that positive epistasis furthers gene loss while negative epistasis hinders it.
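The domino theory described above can be pictured with a small sketch. It assumes that genes are grouped into named cellular processes and that inactivating one member relaxes selection on every other member of the same process; the pathway and gene names are hypothetical placeholders, and the model ignores epistasis entirely.

```python
from typing import Dict, Set


def relaxed_genes(pathways: Dict[str, Set[str]], inactivated: str) -> Set[str]:
    """Return genes whose selection is relaxed once `inactivated` is lost.

    Toy assumption: every other gene in any process that contains the
    inactivated gene is released from selection.
    """
    relaxed: Set[str] = set()
    for members in pathways.values():
        if inactivated in members:
            relaxed |= members - {inactivated}
    return relaxed


# Hypothetical processes and gene names, for illustration only.
example_pathways = {
    "process_A": {"a1", "a2", "a3"},
    "process_B": {"b1", "b2"},
}

if __name__ == "__main__":
    print(sorted(relaxed_genes(example_pathways, "a2")))
```

Genes in an unrelated process are unaffected in this sketch, which is the sense in which loss spreads like dominoes within a process and not across the whole genome.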
See also
*
List of disabled human pseudogenes
*
Molecular evolution
*
Molecular paleontology
*
Pseudogene (database)
*
Retroposon
*
Retrotransposon
External links
Pseudogene interaction database, miRNA-pseudogene and protein-pseudogene interaction maps database
Yale University pseudogene database
(homologous processed pseudogenes)
RCPedia - Processed Pseudogene database