
An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term ''exon'' refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.
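The splicing step described above can be made concrete with a toy Python sketch. It rests on simplifying assumptions: the pre-mRNA is a plain string, the exon positions are hypothetical 0-based, half-open intervals listed in transcript order, and the sequences are invented. `splice` keeps the exon segments and joins them end to end, and `introns` returns the segments that are discarded. The spliceosome machinery that actually performs splicing is not modelled.

```python
# Toy model of RNA splicing (illustrative only). Assumptions: the pre-mRNA is a
# plain string; exons are hypothetical 0-based, half-open (start, end)
# intervals in transcript order.

def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Return the mature transcript: the exon segments joined end to end."""
    for start, end in exons:
        if not 0 <= start < end <= len(pre_mrna):
            raise ValueError(f"exon ({start}, {end}) lies outside the transcript")
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start < prev_end:
            raise ValueError("exons must be in order and must not overlap")
    return "".join(pre_mrna[start:end] for start, end in exons)


def introns(pre_mrna: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return the segments between consecutive exons (the removed introns)."""
    return [pre_mrna[prev_end:next_start]
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


# Hypothetical three-exon transcript; the invented introns mimic the common
# GU...AG intron ends.
PRE_MRNA = "AUGGCC" + "GUAAGUUAG" + "UUCGAA" + "GUCCAG" + "UAA"
EXONS = [(0, 6), (15, 21), (27, 30)]

print(splice(PRE_MRNA, EXONS))   # AUGGCCUUCGAAUAA
print(introns(PRE_MRNA, EXONS))  # ['GUAAGUUAG', 'GUCCAG']
```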
History
The term ''exon'' derives from ''expressed region'' and was coined by the American biochemist Walter Gilbert in 1978: "The notion of the cistron… must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger – which I suggest we call introns (for intragenic regions) – alternating with regions which will be expressed – exons."
This definition was originally made for protein-coding transcripts that are spliced before being translated. The term later came to be applied to rRNA, tRNA and other ncRNA transcripts from which introns are removed, and later still to RNA segments originating from different parts of the genome that are then ligated by trans-splicing.
Contribution to genomes and size distribution
Although unicellular eukaryotes such as yeast have either no introns or very few, the genomes of metazoans, and especially of vertebrates, contain a large fraction of non-coding DNA. In the human genome, for instance, exons span only 1.1% of the genome, whereas introns account for 24% and intergenic DNA for 75%. This small exonic fraction can be a practical advantage in omics-aided health care (such as precision medicine), because it makes commercial whole exome sequencing a smaller and less expensive undertaking than commercial whole genome sequencing. More broadly, the large variation in genome size and C-value across life forms has posed a puzzle known as the C-value enigma.
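The percentages above can be turned into rough sizes to show why an exome is a much smaller sequencing target than a whole genome. This sketch assumes a haploid human genome of about 3.1 × 10⁹ bp, a figure not given in this article, and the helper `partition` is purely illustrative.

```python
# Rough partition of the human genome using the percentages quoted above.
# Assumption: a haploid human genome of about 3.1e9 bp (not stated in the text).
GENOME_BP = 3.1e9
FRACTIONS = {"exons": 0.011, "introns": 0.24, "intergenic": 0.75}


def partition(genome_bp: float, fractions: dict[str, float]) -> dict[str, float]:
    """Convert genome fractions into approximate sizes in base pairs."""
    return {name: genome_bp * share for name, share in fractions.items()}


sizes = partition(GENOME_BP, FRACTIONS)
for name, bp in sizes.items():
    print(f"{name:>10}: {bp / 1e6:,.0f} Mbp")

# The quoted percentages are rounded, so they sum to 100.1% rather than 100%.
print(f"sum of fractions: {sum(FRACTIONS.values()):.3f}")
# How many times larger the whole-genome target is than the exonic target.
print(f"genome / exons: {GENOME_BP / sizes['exons']:.0f}x")
```

Under that assumption the script prints about 34 Mbp of exons, 744 Mbp of introns and 2,325 Mbp of intergenic DNA, a roughly 91-fold smaller target for exome sequencing. This compares only target size; actual sequencing cost also depends on factors such as coverage depth.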
In 2002, eukaryotic protein-coding genes in GenBank contained on average 5.48 exons each, and the average exon encoded 30 to 36 amino acids.
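Because each codon is three nucleotides long, 30 to 36 amino acids per exon corresponds to roughly 90 to 108 coding nucleotides. The Python sketch below performs that conversion and also illustrates, with hypothetical exon lengths, that exon boundaries need not fall between codons: the reading-frame phase of each junction is the cumulative coding length modulo 3. The function names are illustrative and not taken from any tool.

```python
# Arithmetic behind "30 to 36 amino acids per exon": each codon is 3 nucleotides.
NT_PER_CODON = 3


def aa_to_nt(n_aa: int) -> int:
    """Number of coding nucleotides needed to encode n_aa amino acids."""
    return n_aa * NT_PER_CODON


def junction_phases(cds_exon_lengths: list[int]) -> list[int]:
    """Reading-frame phase (0, 1 or 2) at each exon-exon junction of a coding sequence.

    0: the junction falls between two codons.
    1 or 2: the junction splits a codon after its first or second nucleotide.
    """
    phases, total = [], 0
    for length in cds_exon_lengths[:-1]:  # the last exon has no downstream junction
        total += length
        phases.append(total % NT_PER_CODON)
    return phases


print(aa_to_nt(30), aa_to_nt(36))           # 90 108
# Hypothetical coding-exon lengths; they sum to 345 nt, i.e. 115 codons.
print(junction_phases([90, 100, 103, 52]))  # [0, 1, 2]
```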
The longest exon in the human genome is 11,555 bp long, yet several exons only 2 bp long have been found. A single-nucleotide exon has been reported in the ''Arabidopsis'' genome. In humans, most non-coding RNAs, like protein-coding mRNAs, contain multiple exons.
Structure and function

In protein-coding genes, the exons include both the protein-coding sequence and the 5′- and 3′-untranslated regions (UTRs). Often the first exon includes both the 5′-UTR and the first part of the coding sequence, but some genes have exons that contain only 5′-UTR or (more rarely) only 3′-UTR sequence; that is, the UTRs may themselves contain introns. Some non-coding RNA transcripts also have exons and introns.
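How exons relate to the coding sequence and the UTRs can be illustrated by splitting each exon at the CDS boundaries. This is a minimal sketch with invented coordinates (0-based, half-open, plus strand); `annotate_exons` is a hypothetical helper, not part of any annotation library.

```python
# Split each exon of a hypothetical protein-coding gene into its 5'-UTR,
# coding (CDS) and 3'-UTR portions. Assumptions: coordinates are 0-based,
# half-open and on the plus strand; the CDS runs from cds_start to cds_end.

def annotate_exons(
    exons: list[tuple[int, int]], cds_start: int, cds_end: int
) -> list[tuple[int, int, int]]:
    """Return (utr5_len, cds_len, utr3_len) for each exon."""
    parts = []
    for start, end in exons:
        utr5 = max(0, min(end, cds_start) - start)               # part before the CDS
        cds = max(0, min(end, cds_end) - max(start, cds_start))  # overlap with the CDS
        utr3 = max(0, end - max(start, cds_end))                 # part after the CDS
        parts.append((utr5, cds, utr3))
    return parts


GENE_EXONS = [(0, 50), (200, 260), (400, 480), (600, 700)]
for exon, (utr5, cds, utr3) in zip(GENE_EXONS, annotate_exons(GENE_EXONS, 210, 449)):
    print(exon, "5'-UTR:", utr5, "CDS:", cds, "3'-UTR:", utr3)
```

With these values the first exon is entirely 5′-UTR and the last entirely 3′-UTR, so in this toy gene both UTRs contain an intron, while the second exon combines 5′-UTR and the start of the coding sequence. The coding parts add up to 99 nt, or 33 codons.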
Mature mRNAs originating from the same gene need not include the same exons, since different introns in the pre-mRNA can be removed by the process of alternative splicing.
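A simplified Python model of alternative splicing, restricted to exon skipping, shows how one gene can yield several mature mRNAs. Constitutive exons are always retained, while optional (cassette) exons may be included or skipped. The exon labels and the choice of which exons are constitutive are hypothetical.

```python
from itertools import combinations

# Simplified model of alternative splicing that considers exon skipping only:
# constitutive exons are always kept; optional ("cassette") exons may be skipped.
# Exon names are hypothetical labels.

def exon_skipping_isoforms(exons: list[str], constitutive: set[str]) -> list[list[str]]:
    """List every mature transcript obtainable by skipping optional exons."""
    optional = [e for e in exons if e not in constitutive]
    isoforms = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            # Exon order is preserved; only the skipped exons are left out.
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"], {"E1", "E4"}):
    print("-".join(isoform))
# E1-E2-E3-E4
# E1-E3-E4
# E1-E2-E4
# E1-E4
```

With n optional exons this model yields 2**n isoforms. Real genes also use other mechanisms, such as alternative 5′ or 3′ splice sites and intron retention, and not every combination is necessarily produced.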
Exonization is the creation of a new exon as a result of mutations in introns.
Experimental approaches using exons
Exon trapping, or 'gene trapping', is a molecular biology technique that exploits intron-exon splicing to find new genes. The first exon of a 'trapped' gene splices into the exon contained in the insertional DNA. This new exon contains the open reading frame (ORF) of a reporter gene, which can then be expressed under the control of the enhancers that regulate the target gene. Expression of the reporter gene therefore indicates that a new gene has been trapped.
Splicing can be experimentally modified so that targeted exons are excluded from mature mRNA transcripts, by using Morpholino antisense oligos to block the access of splice-directing small nuclear ribonucleoprotein particles (snRNPs) to the pre-mRNA. This has become a standard technique in developmental biology. Morpholino oligos can also be targeted to pre-mRNA sites such as splice enhancers and splice suppressors, preventing the molecules that regulate splicing from binding there and thereby altering patterns of splicing.
Common misuse of the term
Common incorrect uses of the term ''exon'' are statements such as 'exons code for protein', 'exons code for amino acids' or 'exons are translated'. As described in this article, exons may instead become part of a non-coding RNA or of the untranslated regions of an mRNA.
As of February 2022, these incorrect definitions could be found in otherwise reputable secondary sources such as NHGRI and Nature.
See also
* DBASS3/5
* Exitron
* Exon-intron database
* Exon shuffling
* Interrupted gene
* Outron
* Twintron
* Untranslated region (UTR)
External links
Exon-intron graphic maker