Nucleic acid secondary structure

Nucleic acid secondary structure is the set of base-pairing interactions within a single nucleic acid polymer or between two polymers. It can be represented as a list of the bases that are paired in a nucleic acid molecule. The secondary structures of biological DNAs and RNAs tend to differ: biological DNA mostly exists as fully base-paired double helices, while biological RNA is typically single-stranded and often forms complex and intricate base-pairing interactions, owing to its increased ability to form hydrogen bonds stemming from the extra hydroxyl group in the ribose sugar. In a non-biological context, secondary structure is a vital consideration in the design of nucleic acid structures for DNA nanotechnology and DNA computing, since the pattern of base pairing ultimately determines the overall structure of the molecules.


Fundamental concepts


Base pairing

In molecular biology, two nucleotides on opposite complementary DNA or RNA strands that are connected via hydrogen bonds are called a base pair (often abbreviated bp). In canonical Watson-Crick base pairing in DNA, adenine (A) forms a base pair with thymine (T) and guanine (G) forms one with cytosine (C). In RNA, thymine is replaced by uracil (U). Alternate hydrogen bonding patterns, such as the wobble base pair and the Hoogsteen base pair, also occur, particularly in RNA, giving rise to complex and functional tertiary structures. Importantly, pairing is the mechanism by which codons on messenger RNA molecules are recognized by anticodons on transfer RNA during protein translation. Some DNA- or RNA-binding enzymes can recognize specific base pairing patterns that identify particular regulatory regions of genes.

Hydrogen bonding is the chemical mechanism that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content, but contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly; stabilization is mainly due to stacking interactions.

The larger nucleobases, adenine and guanine, are members of a class of doubly ringed chemical structures called purines; the smaller nucleobases, cytosine and thymine (and uracil), are members of a class of singly ringed chemical structures called pyrimidines. Purines are only complementary with pyrimidines: pyrimidine-pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established; purine-purine pairings are energetically unfavorable because the molecules are too close, leading to overlap repulsion. The only other possible pairings are GT and AC; these pairings are mismatches because the patterns of hydrogen bond donors and acceptors do not correspond. The GU wobble base pair, with two hydrogen bonds, does occur fairly often in RNA.
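
The pairing rules above can be written down as a small lookup table. The following is a minimal sketch, not a standard library API: the names `can_pair` and `gc_content` and the `allow_wobble` flag are illustrative assumptions, and Hoogsteen and other non-canonical pairs are deliberately left out.

```python
# Illustrative pairing tables: canonical Watson-Crick pairs plus the G-U wobble pair.
WATSON_CRICK_DNA = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WATSON_CRICK_RNA = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE_RNA = {("G", "U"), ("U", "G")}


def can_pair(a: str, b: str, rna: bool = True, allow_wobble: bool = True) -> bool:
    """Return True if bases a and b form a canonical (or, optionally, G-U wobble) pair."""
    pair = (a.upper(), b.upper())
    if rna:
        return pair in WATSON_CRICK_RNA or (allow_wobble and pair in WOBBLE_RNA)
    return pair in WATSON_CRICK_DNA


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)
```

For example, `can_pair("G", "U")` is true for RNA, while `can_pair("G", "T", rna=False)` is false because GT is a mismatch in DNA. `gc_content` only counts bases; it says nothing quantitative about stability, which the text attributes mainly to stacking interactions.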


Nucleic acid hybridization

Hybridization is the process of complementary base pairs binding to form a double helix. Melting is the process by which the interactions between the strands of the double helix are broken, separating the two nucleic acid strands. These bonds are weak and are easily separated by gentle heating, enzymes, or physical force. Melting occurs preferentially at certain points in the nucleic acid. T- and A-rich sequences are more easily melted than C- and G-rich regions. Particular base steps are also susceptible to DNA melting, notably TA and TG base steps. These mechanical features are reflected by the use of sequences such as TATAA at the start of many genes to assist RNA polymerase in melting the DNA for transcription.

Strand separation by gentle heating, as used in PCR, is simple provided the molecules have fewer than about 10,000 base pairs (10 kilobase pairs, or 10 kbp). The intertwining of the DNA strands makes long segments difficult to separate. The cell avoids this problem by allowing its DNA-melting enzymes (helicases) to work concurrently with topoisomerases, which can chemically cleave the phosphate backbone of one of the strands so that it can swivel around the other. Helicases unwind the strands to facilitate the advance of sequence-reading enzymes such as DNA polymerase.
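
As a rough illustration of the observation that T- and A-rich sequences melt more easily, the sketch below scans a DNA sequence for its most AT-rich window. It is a crude heuristic under stated assumptions: the function names and the fixed window size are hypothetical, and base-step effects and real thermodynamic parameters are ignored.

```python
from typing import Tuple


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def most_at_rich_window(seq: str, window: int) -> Tuple[int, float]:
    """Return (start index, AT fraction) of the first window with the highest AT fraction."""
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    best_start, best_frac = 0, -1.0
    for start in range(len(seq) - window + 1):
        frac = at_fraction(seq[start:start + window])
        if frac > best_frac:  # strict comparison keeps the earliest best window
            best_start, best_frac = start, frac
    return best_start, best_frac
```

On the toy sequence `"GCGCGCTATAAGCGCGC"` with `window=5`, the function returns `(6, 1.0)`, the position of the embedded TATAA stretch.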


Secondary structure motifs

Nucleic acid secondary structure is generally divided into helices (contiguous base pairs), and various kinds of loops (unpaired nucleotides surrounded by helices). Frequently these elements, or combinations of them, are further classified into additional categories including, for example, tetraloops, pseudoknots, and stem-loops.
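
Since a secondary structure can be represented as a list of paired bases, helices can be recovered from that list by grouping contiguous pairs. The sketch below assumes zero-based positions and pairs written as `(i, j)` with `i < j`; the function name `helices` is illustrative.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def helices(pairs: List[Pair]) -> List[List[Pair]]:
    """Group base pairs (i, j), i < j, into helices of contiguous stacked pairs.

    Two pairs are treated as stacked when (i + 1, j - 1) directly follows (i, j).
    """
    pair_set = set(pairs)
    result = []
    for i, j in sorted(pairs):
        # Start a new helix only at a pair that has no pair stacked outside it.
        if (i - 1, j + 1) in pair_set:
            continue
        helix = [(i, j)]
        # Extend inward while the next stacked pair is present.
        while (helix[-1][0] + 1, helix[-1][1] - 1) in pair_set:
            helix.append((helix[-1][0] + 1, helix[-1][1] - 1))
        result.append(helix)
    return result
```

For the pair list `[(0, 9), (1, 8), (2, 7), (12, 20), (13, 19)]` this yields two helices, one of three pairs and one of two. Everything not covered by a helix is unpaired and belongs to some loop.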


Double helix

The double helix is an important tertiary structure in nucleic acid molecules which is intimately connected with the molecule's secondary structure. A double helix is formed by regions of many consecutive base pairs. The nucleic acid double helix is a spiral polymer, usually right-handed, containing two nucleotide strands which base pair together. A single turn of the helix spans about ten base pairs and contains a major groove and a minor groove, the major groove being wider than the minor groove. Because of this difference in width, many proteins that bind to DNA do so through the wider major groove. Many double-helical forms are possible; for DNA the three biologically relevant forms are A-DNA, B-DNA, and Z-DNA, while RNA double helices have structures similar to the A form of DNA.


Stem-loop structures

The secondary structure of nucleic acid molecules can often be uniquely decomposed into stems and loops. The stem-loop structure (also often referred to as a "hairpin"), in which a base-paired helix ends in a short unpaired loop, is extremely common and is a building block for larger structural motifs such as cloverleaf structures, which are four-helix junctions such as those found in transfer RNA. Internal loops (a short series of unpaired bases in a longer paired helix) and bulges (regions in which one strand of a helix has "extra" inserted bases with no counterparts in the opposite strand) are also frequent. Many secondary structure elements are of functional importance to biological RNAs; some famous examples are the Rho-independent terminator stem-loops and the tRNA cloverleaf. Active research is ongoing to determine the secondary structure of RNA molecules, with approaches including both experimental and computational methods (see also the List of RNA structure prediction software).
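
The decomposition into stems and loops can be sketched by labelling the loop that each base pair closes. This is a minimal illustration that assumes a fully nested structure (no pseudoknots), zero-based positions, and pairs written as `(i, j)` with `i < j`. The label names `"stack"` and `"junction"` are choices made here for continuing helices and multi-helix junctions; the other labels follow the terms used in the text.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def classify_loops(pairs: List[Pair]) -> Dict[Pair, str]:
    """Label the loop closed by each base pair (i, j) of a nested structure."""
    partner = {}
    for i, j in pairs:
        partner[i], partner[j] = j, i

    labels = {}
    for i, j in pairs:
        # Walk the interior of (i, j), jumping over any enclosed helices.
        inner = []
        k = i + 1
        while k < j:
            if k in partner and partner[k] > k:
                inner.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not inner:
            labels[(i, j)] = "hairpin"
        elif len(inner) > 1:
            labels[(i, j)] = "junction"
        else:
            p, q = inner[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            if left == 0 and right == 0:
                labels[(i, j)] = "stack"
            elif left == 0 or right == 0:
                labels[(i, j)] = "bulge"
            else:
                labels[(i, j)] = "internal"
    return labels
```

For the pairs `[(0, 20), (1, 19), (2, 17), (5, 14), (6, 13)]`, the pair `(1, 19)` closes a bulge, `(2, 17)` closes an internal loop, and `(6, 13)` closes the hairpin loop; the remaining two pairs simply continue a helix.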


Pseudoknots

A pseudoknot is a nucleic acid secondary structure containing at least two stem-loop structures in which half of one stem is intercalated between the two halves of another stem. Pseudoknots fold into knot-shaped three-dimensional conformations but are not true topological knots. The base pairing in pseudoknots is not well nested; that is, base pairs occur that "overlap" one another in sequence position. This makes the presence of general pseudoknots in nucleic acid sequences impossible to predict by the standard method of dynamic programming, which uses a recursive scoring system to identify paired stems and consequently cannot detect non-nested base pairs. However, limited subclasses of pseudoknots can be predicted using modified dynamic programs. Newer structure prediction techniques such as stochastic context-free grammars are also unable to consider pseudoknots.

Pseudoknots can form a variety of structures with catalytic activity, and several important biological processes rely on RNA molecules that form pseudoknots. For example, the RNA component of human telomerase contains a pseudoknot that is critical for its activity. The hepatitis delta virus ribozyme is a well-known example of a catalytic RNA with a pseudoknot in its active site. Though DNA can also form pseudoknots, they are generally not present under standard physiological conditions.
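
The "not well nested" condition can be stated precisely: two base pairs (i, j) and (k, l) overlap when i < k < j < l. The sketch below checks a pair list for such crossings. It is a simple quadratic scan for illustration, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[int, int]


def crossing_pairs(pairs: List[Pair]) -> List[Tuple[Pair, Pair]]:
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l."""
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    return [
        (p, q)
        for p, q in combinations(normalized, 2)
        if p[0] < q[0] < p[1] < q[1]
    ]


def has_pseudoknot(pairs: List[Pair]) -> bool:
    """True if the pair list contains at least one non-nested (crossing) pair."""
    return bool(crossing_pairs(pairs))
```

A nested hairpin such as `[(0, 10), (1, 9), (2, 8)]` has no crossings, whereas `[(0, 12), (1, 11), (6, 18), (7, 17)]`, in which one half of the second stem sits inside the loop of the first, yields four crossing combinations.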


Secondary structure prediction

Most methods for nucleic acid secondary structure prediction rely on a nearest neighbor thermodynamic model. A common method to determine the most probable structures given a sequence of nucleotides makes use of a dynamic programming algorithm that seeks to find structures with low free energy. Dynamic programming algorithms often forbid pseudoknots, or other cases in which base pairs are not fully nested, as considering these structures becomes computationally very expensive for even small nucleic acid molecules. Other methods, such as stochastic context-free grammars, can also be used to predict nucleic acid secondary structure.

For many RNA molecules, the secondary structure is highly important to the correct function of the RNA, often more so than the actual sequence. This fact aids in the analysis of non-coding RNA, sometimes termed "RNA genes". One application of bioinformatics uses predicted RNA secondary structures in searching a genome for noncoding but functional forms of RNA. For example, microRNAs have canonical long stem-loop structures interrupted by small internal loops.

RNA secondary structure plays a role in RNA splicing in certain species. In humans and other tetrapods, it has been shown that without the U2AF2 protein, the splicing process is inhibited. However, in zebrafish and other teleosts the RNA splicing process can still occur on certain genes in the absence of U2AF2. This may be because 10% of genes in zebrafish have alternating TG and AC base pairs at the 3' splice site (3'ss) and 5' splice site (5'ss) respectively on each intron, which alters the secondary structure of the RNA. This suggests that the secondary structure of RNA can influence splicing, potentially without the use of proteins like U2AF2 that have been thought to be required for splicing to occur.
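
Returning to the dynamic programming approach described at the start of this section, the recursion over nested intervals can be sketched as follows. This is not the nearest neighbor thermodynamic model: as a simplification, it swaps in a Nussinov-style objective that maximizes the number of nested base pairs instead of minimizing free energy. The pairing table, the `min_loop` parameter (minimum number of unpaired bases in a hairpin loop), and the function name are assumptions made for illustration.

```python
from typing import List, Tuple

# Allowed RNA pairs: Watson-Crick plus G-U wobble (illustrative choice).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (max number of nested pairs, one optimal pair list) for an RNA sequence."""
    seq = seq.upper()
    n = len(seq)
    # best[i][j] = max number of pairs within seq[i..j]; short intervals stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure using an explicit stack of intervals.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (best[0][n - 1] if n else 0), sorted(pairs)
```

For the toy sequence `"GGGAAAUCC"`, the function returns `(3, [(0, 8), (1, 7), (2, 6)])`, a single hairpin with a three-base loop. Because every case splits the interval into independent nested sub-intervals, crossing pairs can never appear in the result, which is why recursions of this kind forbid pseudoknots.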


Secondary structure determination

RNA secondary structure can be determined from atomic coordinates (tertiary structure) obtained by X-ray crystallography; such coordinates are often deposited in the Protein Data Bank. Current methods include 3DNA/DSSR and MC-annotate.


See also

* DNA nanotechnology
* Molecular models of DNA
* DiProDB. The database is designed to collect and analyse thermodynamic, structural and other dinucleotide properties.
* RNA CoSSMos


External links


MDDNA: Structural Bioinformatics of DNA
— Commercial software for DNA modeling
DNAlive: a web interface to compute DNA physical properties. Also allows cross-linking of the results with the UCSC Genome browser and DNA dynamics.