Coding region

The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.


Definition

Although this term is sometimes used interchangeably with exon, the two are not the same: an exon comprises the coding region as well as the 3' and 5' untranslated regions of the RNA, so an exon is only partially made up of coding region. The 3' and 5' untranslated regions of the RNA, which do not code for protein, are termed non-coding regions and are not discussed on this page. Coding regions are also often confused with exomes, but the terms are clearly distinct: while the exome refers to all exons within a genome, the coding region refers to a single section of the DNA or RNA that specifically codes for a certain kind of protein.


History

In 1978, Walter Gilbert published "Why genes in pieces?", which first began to explore the idea that the gene is a mosaic: that each full nucleic acid strand is not coded continuously but is interrupted by "silent" non-coding regions. This was the first indication that there needed to be a distinction between the parts of the genome that code for protein, now called coding regions, and those that do not.


Composition

The evidence suggests a general interdependence between base composition patterns and coding region availability. The coding region is thought to have a higher GC-content than non-coding regions, and further research has found that the longer the coding strand, the higher its GC-content. Short coding strands remain comparatively GC-poor, similar to the low GC-content of the translational stop codons TAG, TAA, and TGA. GC-rich areas are also where the ratio of point mutation types is altered slightly: there are more transitions, which are changes from purine to purine or pyrimidine to pyrimidine, than transversions, which are changes from purine to pyrimidine or pyrimidine to purine. Transitions are less likely to change the encoded amino acid and so more often remain silent mutations (especially if they occur in the third nucleotide of a codon), which is usually beneficial to the organism during translation and protein formation. This indicates that essential coding regions (gene-rich) are higher in GC-content and more stable and resistant to mutation compared to accessory and non-essential regions (gene-poor). However, it is still unclear whether this came about through neutral and random mutation or through a pattern of selection. There is also debate on whether the methods used to ascertain the relationship between GC-content and coding region, such as gene windows, are accurate and unbiased.
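The two quantities discussed above, GC-content and the transition/transversion distinction, can be illustrated with a minimal Python sketch. The function names `gc_content` and `substitution_type` are hypothetical, and the sketch assumes uppercase or lowercase DNA strings containing only A, C, G, and T.

```python
PURINES = {"A", "G"}  # adenine and guanine; C and T are pyrimidines


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def substitution_type(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# The three stop codons named above are all GC-poor (one G or C out of three bases).
for stop in ("TAG", "TAA", "TGA"):
    print(stop, round(gc_content(stop), 2))

print(substitution_type("A", "G"))  # purine to purine
print(substitution_type("A", "C"))  # purine to pyrimidine
```

A real analysis would compute GC-content over sliding windows of a genome and compare coding against non-coding windows; the choice of window is exactly the methodological point that the text notes is debated.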


Structure and function

In DNA, the coding region is flanked by the promoter sequence on the 5' end of the template strand and the termination sequence on the 3' end. During transcription, RNA polymerase (RNAP) binds to the promoter sequence and moves along the template strand to the coding region. RNAP then adds RNA nucleotides complementary to the coding region in order to form the mRNA, substituting uracil in place of thymine [Overview of transcription. (n.d.). Retrieved from https://www.khanacademy.org/science/biology/gene-expression-central-dogma/transcription-of-dna-into-rna/a/overview-of-transcription]. This continues until the RNAP reaches the termination sequence. After transcription and maturation, the mature mRNA encompasses multiple parts important for its eventual translation into protein. The coding region in an mRNA is flanked by the 5' untranslated region (5'-UTR) and 3' untranslated region (3'-UTR), the 5' cap, and the poly-A tail. During translation, the ribosome facilitates the attachment of the tRNAs to the coding region, 3 nucleotides at a time (codons). The tRNAs transfer their associated amino acids to the growing polypeptide chain, eventually forming the protein defined in the initial DNA coding region.
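The path from template strand to mRNA to polypeptide described above can be sketched in a few lines of Python. This is an illustrative toy, not a model of the cellular machinery: the function names are hypothetical, the template strand is assumed to be supplied in the 3'-to-5' direction so that the mRNA comes out 5'-to-3', and the standard genetic code is assumed (built here from its conventional TCAG ordering).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(template_3to5):
    """Build the mRNA (5'->3') complementary to a template strand given 3'->5'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}  # uracil replaces thymine
    return "".join(pairing[base] for base in template_3to5)


def translate(mrna):
    """Read an mRNA coding region codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("TACCGGATT")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```

The sketch starts translating at the first base it is given, so it assumes the input is already trimmed to the coding region; locating that region within a full mRNA (between the 5'-UTR and 3'-UTR) is a separate problem.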


Regulation

The coding region can be modified in order to regulate gene expression. Alkylation is one form of regulation of the coding region. The gene that would have been transcribed can be silenced by targeting a specific sequence. The bases in this sequence would be blocked using alkyl groups, which create the silencing effect. While the regulation of gene expression manages the abundance of RNA or protein made in a cell, the regulation of these mechanisms can be controlled by a regulatory sequence found before the open reading frame begins in a strand of DNA. The regulatory sequence then determines the location and time at which a protein-coding region is expressed. RNA splicing ultimately determines what part of the sequence becomes translated and expressed, and this process involves cutting out introns and putting together exons. Where the RNA spliceosome cuts, however, is guided by the recognition of splice sites, in particular the 5' splice site, which is one of the substrates for the first step in splicing. The coding regions are within the exons, which become covalently joined together to form the mature messenger RNA.
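As a rough illustration of the splicing step, the following Python sketch removes introns from a pre-mRNA string given exon coordinates. The function name `splice`, the toy sequence, and the coordinate convention (0-based, end-exclusive) are assumptions made for this example; a real spliceosome locates splice sites by sequence recognition rather than being handed coordinates.

```python
def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) in order, dropping the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Toy transcript: exon 1, a 12-base intron, then exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UAA"
pre_mrna = exon1 + intron + exon2

mature = splice(pre_mrna, [(0, 6), (18, 21)])
print(mature)  # AUGGCCUAA
```

Only the joined exons remain in the mature transcript, which is why the coding region of a spliced mRNA is contiguous even when the corresponding DNA is interrupted.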


Mutations

Mutations in the coding region can have very diverse effects on the phenotype of the organism. While some mutations in this region of DNA/RNA can result in advantageous changes, others can be harmful and sometimes even lethal to an organism's survival. However, changes in the coding region do not always result in detectable changes in phenotype.


Mutation types

There are various forms of mutations that can occur in coding regions. One form is silent mutations, in which a change in nucleotides does not result in any change in amino acid after transcription and translation [Yang, J. (2016, March 23). What are Genetic Mutation? Retrieved from https://www.singerinstruments.com/resource/what-are-genetic-mutation/]. There are also nonsense mutations, where base alterations in the coding region code for a premature stop codon, producing a shorter final protein. Point mutations, or single base pair changes in the coding region, that code for different amino acids during translation are called missense mutations. Other types of mutations include frameshift mutations, which are caused by insertions or deletions of a number of nucleotides that is not divisible by three.
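The mutation types above can be told apart mechanically by comparing a reference coding sequence with a mutated one. The sketch below is a simplified illustration: `classify_mutation` is a hypothetical helper, both inputs are assumed to be in-frame DNA coding sequences that actually differ, the standard genetic code is assumed, and only the first changed codon is considered.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def classify_mutation(ref_cds, alt_cds):
    """Label the change between two coding sequences (simplified; first difference wins)."""
    if (len(alt_cds) - len(ref_cds)) % 3 != 0:
        return "frameshift"          # indel length not divisible by three
    if len(alt_cds) != len(ref_cds):
        return "in-frame indel"      # whole codons gained or lost
    for i in range(0, len(ref_cds) - 2, 3):
        ref_aa = CODON_TABLE[ref_cds[i:i + 3]]
        alt_aa = CODON_TABLE[alt_cds[i:i + 3]]
        if ref_aa != alt_aa:
            return "nonsense" if alt_aa == "*" else "missense"
    return "silent"                  # no amino acid changed


ref = "ATGGCCTGGTAA"                           # Met-Ala-Trp-stop
print(classify_mutation(ref, "ATGGCTTGGTAA"))  # silent (GCC -> GCT, both Ala)
print(classify_mutation(ref, "ATGGACTGGTAA"))  # missense (Ala -> Asp)
print(classify_mutation(ref, "ATGGCCTGATAA"))  # nonsense (Trp -> stop)
print(classify_mutation(ref, "ATGGCTGGTAA"))   # frameshift (one base deleted)
```

Note that this toy labels the loss of a stop codon as "missense"; dedicated variant annotation tools distinguish more categories than the four named in the text.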


Formation

Some mutations are hereditary (germline mutations), that is, passed on from a parent to its offspring [What is a gene mutation and how do mutations occur? - Genetics Home Reference - NIH. (n.d.). Retrieved from https://ghr.nlm.nih.gov/primer/mutationsanddisorders/genemutation]. Such mutated coding regions are present in all cells within the organism. Other mutations are acquired (somatic mutations) during an organism's lifetime, and may not be constant cell-to-cell. These changes can be caused by mutagens, carcinogens, or other environmental agents (e.g., UV radiation). Acquired mutations can also result from copy errors during DNA replication and are not passed down to offspring. Changes in the coding region can also be de novo (new); such changes are thought to occur shortly after fertilization, resulting in a mutation present in the offspring's DNA while being absent in both the sperm and egg cells.


Prevention

Multiple mechanisms exist to prevent lethality due to deleterious mutations in the coding region. Such measures include proofreading by some DNA polymerases during replication, mismatch repair following replication, and the "wobble hypothesis", which describes the degeneracy of the third base within an mRNA codon.
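The degeneracy of the third codon position can be made concrete by counting, for a given codon, how many third-base substitutions leave the encoded amino acid unchanged. The sketch below assumes the standard genetic code; `synonymous_third_base_changes` is a hypothetical name used only for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_third_base_changes(codon):
    """Count third-position substitutions (0 to 3) that keep the same amino acid."""
    amino_acid = CODON_TABLE[codon]
    return sum(
        1
        for base in BASES
        if base != codon[2] and CODON_TABLE[codon[:2] + base] == amino_acid
    )


print(synonymous_third_base_changes("GCC"))  # 3: every GCN codon encodes alanine
print(synonymous_third_base_changes("TTT"))  # 1: only TTC also encodes phenylalanine
print(synonymous_third_base_changes("ATG"))  # 0: methionine has a single codon
```

Codons such as GCC tolerate any change at the third position, which is one reason many point mutations in coding regions turn out to be silent.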


Constrained coding regions (CCRs)

While it is well known that the genome of one individual can differ extensively from the genome of another, recent research has found that some coding regions are highly constrained, or resistant to mutation, between individuals of the same species. This is similar to the concept of interspecies constraint in conserved sequences. Researchers termed these highly constrained sequences constrained coding regions (CCRs), and have also found that such regions may be under strong purifying selection. On average, there is approximately 1 protein-altering mutation every 7 coding bases, but some CCRs can have over 100 bases in sequence with no observed protein-altering mutations, and some lack even synonymous mutations [Havrilla, J. M., Pedersen, B. S., Layer, R. M., & Quinlan, A. R. (2018). A map of constrained coding regions in the human genome. ''Nature Genetics'', 88–95. doi: 10.1101/220814]. These patterns of constraint between genomes may provide clues to the sources of rare developmental diseases or potentially even embryonic lethality. Clinically validated variants and de novo mutations in CCRs have previously been linked to disorders such as infantile epileptic encephalopathy, developmental delay, and severe heart disease.


Coding sequence detection

While identification of open reading frames within a DNA sequence is straightforward, identifying coding sequences is not, because the cell translates only a subset of all open reading frames to proteins. Currently CDS prediction uses sampling and sequencing of mRNA from cells, although there is still the problem of determining which parts of a given mRNA are actually translated to protein. CDS prediction is a subset of gene prediction, the latter also including prediction of DNA sequences that code not only for protein but also for other functional elements such as RNA genes and regulatory sequences. Gene overlapping occurs in both prokaryotes and eukaryotes, and relatively often in both DNA and RNA viruses, as an evolutionary advantage that reduces genome size while retaining the ability to produce various proteins from the available coding regions. For both DNA and RNA, pairwise alignments can detect overlapping coding regions, including short open reading frames in viruses, but they require a known coding strand against which to compare the potential overlapping coding strand. An alternative method using single genome sequences does not require multiple genome sequences for comparison, but needs an overlap of at least 50 nucleotides in order to be sensitive.
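The "straightforward" part of the problem, enumerating open reading frames, can be sketched as a scan of all six reading frames for a start codon followed in frame by a stop codon. This is a minimal illustration under stated assumptions: `find_orfs` and `reverse_complement` are hypothetical names, ATG is taken as the only start codon, and the result is a list of candidate ORFs, not of coding sequences, since (as noted above) only a subset of ORFs is actually translated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=2):
    """List (strand, start, end, orf) for ATG...stop spans in all six reading frames.

    Coordinates are 0-based, end-exclusive, and relative to the strand scanned
    (for "-" they index into the reverse complement, not the input).
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None                   # close it and keep scanning
    return orfs


print(find_orfs("CCATGGCCTAAGG"))  # [('+', 2, 11, 'ATGGCCTAA')]
```

Deciding which of the reported ORFs are real coding sequences requires additional evidence, such as the mRNA sequencing mentioned above.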


See also

* Coding strand: The DNA strand that codes for a protein
* Exon: The entire portion of the strand that is transcribed
* Mature mRNA: The portion of the mRNA transcription product that is translated
* Gene structure: The other elements that make up a gene
* Nested gene: Entire coding sequence lies within the bounds of a larger external gene
* Non-coding DNA: Parts of genomes that do not encode protein-coding genes
* Non-coding RNA: Molecules that do not encode proteins, so have no CDS

