Nucleic acid design
   HOME

TheInfoList



OR:

Nucleic acid design is the process of generating a set of nucleic acid base sequences that will associate into a desired conformation. Nucleic acid design is central to the fields of
DNA nanotechnology DNA nanotechnology is the design and manufacture of artificial nucleic acid structures for technological uses. In this field, nucleic acids are used as non-biological engineering materials for nanotechnology rather than as the carriers of geneti ...
and
DNA computing DNA computing is an emerging branch of unconventional computing which uses DNA, biochemistry, and molecular biology hardware, instead of the traditional electronic computing. Research and development in this area concerns theory, experiments, a ...
. It is necessary because there are many possible
sequences In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is called t ...
of nucleic acid strands that will fold into a given secondary structure, but many of these sequences will have undesired additional interactions which must be avoided. In addition, there are many
tertiary structure Protein tertiary structure is the three dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may i ...
considerations which affect the choice of a secondary structure for a given design. Nucleic acid design has similar goals to
protein design Protein design is the rational design of new protein molecules to design novel activity, behavior, or purpose, and to advance basic understanding of protein function. Proteins can be designed from scratch (''de novo'' design) or by making calcul ...
: in both, the sequence of monomers is rationally designed to favor the desired folded or associated structure and to disfavor alternate structures. However, nucleic acid design has the advantage of being a much computationally simpler problem, since the simplicity of Watson-Crick base pairing rules leads to simple
heuristic A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate ...
methods which yield experimentally robust designs. Computational models for
protein folding Protein folding is the physical process by which a protein chain is translated to its native three-dimensional structure, typically a "folded" conformation by which the protein becomes biologically functional. Via an expeditious and reproduc ...
require
tertiary structure Protein tertiary structure is the three dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may i ...
information whereas nucleic acid design can operate largely on the level of secondary structure. However, nucleic acid structures are less versatile than proteins in their functionality. Nucleic acid design can be considered the inverse of
nucleic acid structure prediction Nucleic acid structure prediction is a computational method to determine ''secondary'' and ''tertiary'' nucleic acid structure from its sequence. Secondary structure can be predicted from one or several nucleic acid sequences. Tertiary structure ...
. In structure prediction, the structure is determined from a known sequence, while in nucleic acid design, a sequence is generated which will form a desired structure.


Fundamental concepts

The structure of nucleic acids consists of a sequence of
nucleotide Nucleotides are organic molecules consisting of a nucleoside and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecule ...
s. There are four types of nucleotides distinguished by which of the four
nucleobase Nucleobases, also known as ''nitrogenous bases'' or often simply ''bases'', are nitrogen-containing biological compounds that form nucleosides, which, in turn, are components of nucleotides, with all of these monomers constituting the basic b ...
s they contain: in DNA these are
adenine Adenine () ( symbol A or Ade) is a nucleobase (a purine derivative). It is one of the four nucleobases in the nucleic acid of DNA that are represented by the letters G–C–A–T. The three others are guanine, cytosine and thymine. Its deri ...
(A),
cytosine Cytosine () ( symbol C or Cyt) is one of the four nucleobases found in DNA and RNA, along with adenine, guanine, and thymine (uracil in RNA). It is a pyrimidine derivative, with a heterocyclic aromatic ring and two substituents attached (an ...
(C),
guanine Guanine () ( symbol G or Gua) is one of the four main nucleobases found in the nucleic acids DNA and RNA, the others being adenine, cytosine, and thymine (uracil in RNA). In DNA, guanine is paired with cytosine. The guanine nucleoside is c ...
(G), and
thymine Thymine () ( symbol T or Thy) is one of the four nucleobases in the nucleic acid of DNA that are represented by the letters G–C–A–T. The others are adenine, guanine, and cytosine. Thymine is also known as 5-methyluracil, a pyrimidi ...
(T). Nucleic acids have the property that two molecules will bind to each other to form a
double helix A double is a look-alike or doppelgänger; one person or being that resembles another. Double, The Double or Dubble may also refer to: Film and television * Double (filmmaking), someone who substitutes for the credited actor of a character * ...
only if the two sequences are complementary, that is, they can form matching sequences of base pairs. Thus, in nucleic acids the sequence determines the pattern of binding and thus the overall structure. Nucleic acid design is the process by which, given a desired target structure or functionality, sequences are generated for nucleic acid strands which will self-assemble into that target structure. Nucleic acid design encompasses all levels of
nucleic acid structure Nucleic acid structure refers to the structure of nucleic acids such as DNA and RNA. Chemically speaking, DNA and RNA are very similar. Nucleic acid structure is often divided into four different levels: primary, secondary, tertiary, and quater ...
: * Primary structure—the raw sequence of
nucleobase Nucleobases, also known as ''nitrogenous bases'' or often simply ''bases'', are nitrogen-containing biological compounds that form nucleosides, which, in turn, are components of nucleotides, with all of these monomers constituting the basic b ...
s of each of the component nucleic acid strands; * Secondary structure—the set of interactions between bases, i.e., which parts of which strands are bound to each other; and *
Tertiary structure Protein tertiary structure is the three dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may i ...
—the locations of the atoms in three-dimensional space, taking into consideration geometrical and steric constraints. One of the greatest concerns in nucleic acid design is ensuring that the target structure has the lowest free energy (i.e. is the most
thermodynamic Thermodynamics is a branch of physics that deals with heat, work, and temperature, and their relation to energy, entropy, and the physical properties of matter and radiation. The behavior of these quantities is governed by the four laws of the ...
ally favorable) whereas misformed structures have higher values of free energy and are thus unfavored. These goals can be achieved through the use of a number of approaches, including
heuristic A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate ...
, thermodynamic, and geometrical ones. Almost all nucleic acid design tasks are aided by computers, and a number of software packages are available for many of these tasks. Two considerations in nucleic acid design are that desired hybridizations should have melting temperatures in a narrow range, and any spurious interactions should have very low melting temperatures (i.e. they should be very weak). There is also a contrast between affinity-optimizing "positive design", seeks to minimize the energy of the desired structure in an absolute sense, and specificity-optimizing "negative design", which considers the energy of the target structure relative to those of undesired structures. Algorithms which implement both kinds of design tend to perform better than those that consider only one type.


Approaches


Heuristic methods

Heuristic A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate ...
methods use simple criteria which can be quickly evaluated to judge the suitability of different sequences for a given secondary structure. They have the advantage of being much less computationally expensive than the
energy minimization In the field of computational chemistry, energy minimization (also called energy optimization, geometry minimization, or geometry optimization) is the process of finding an arrangement in space of a collection of atoms where, according to some com ...
algorithms needed for thermodynamic or geometrical modeling, and being easier to implement, but at the cost of being less rigorous than these models. Sequence symmetry minimization is the oldest approach to nucleic acid design and was first used to design immobile versions of branched DNA structures. Sequence symmetry minimization divides the nucleic acid sequence into overlapping subsequences of a fixed length, called the criterion length. Each of the 4N possible subsequences of length N is allowed to appear only once in the sequence. This ensures that no undesired hybridizations can occur which have a length greater than or equal to the criterion length. A related heuristic approach is to consider the "mismatch distance", meaning the number of positions in a certain frame where the bases are not complementary. A greater mismatch distance lessens the chance that a strong spurious interaction can happen. This is related to the concept of
Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chan ...
in information theory. Another related but more involved approach is to use methods from
coding theory Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are studied ...
to construct nucleic acid sequences with desired properties.


Thermodynamic models

Information about the secondary structure of a nucleic acid complex along with its sequence can be used to predict the
thermodynamic Thermodynamics is a branch of physics that deals with heat, work, and temperature, and their relation to energy, entropy, and the physical properties of matter and radiation. The behavior of these quantities is governed by the four laws of the ...
properties of the complex. When thermodynamic models are used in nucleic acid design, there are usually two considerations: desired hybridizations should have melting temperatures in a narrow range, and any spurious interactions should have very low melting temperatures (i.e. they should be very weak). The
Gibbs free energy In thermodynamics, the Gibbs free energy (or Gibbs energy; symbol G) is a thermodynamic potential that can be used to calculate the maximum amount of work that may be performed by a thermodynamically closed system at constant temperature and ...
of a perfectly matched nucleic acid duplex can be predicted using a nearest neighbor model. This model considers only the interactions between a nucleotide and its nearest neighbors on the nucleic acid strand, by summing the free energy of each of the overlapping two-nucleotide subwords of the duplex. This is then corrected for self-complementary monomers and for GC-content. Once the free energy is known, the melting temperature of the duplex can be determined. GC-content alone can also be used to estimate the free energy and melting temperature of a nucleic acid duplex. This is less accurate but also much less computationally costly. Software for thermodynamic modeling of nucleic acids includes Nupack
mfold/UNAFold
an
Vienna
A related approach, inverse secondary structure prediction, uses stochastic local search which improves a nucleic acid sequence by running a structure prediction algorithm and the modifying the sequence to eliminate unwanted features.


Geometrical models

Geometrical models of nucleic acids are used to predict
tertiary structure Protein tertiary structure is the three dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may i ...
. This is important because designed nucleic acid complexes usually contain multiple junction points, which introduces geometric constraints to the system. These constraints stem from the basic structure of nucleic acids, mainly that the
double helix A double is a look-alike or doppelgänger; one person or being that resembles another. Double, The Double or Dubble may also refer to: Film and television * Double (filmmaking), someone who substitutes for the credited actor of a character * ...
formed by nucleic acid duplexes has a fixed helicity of about 10.4 base pairs per turn, and is relatively stiff. Because of these constraints, the nucleic acid complexes are sensitive to the relative orientation of the major and minor grooves at junction points. Geometrical modeling can detect
strain Strain may refer to: Science and technology * Strain (biology), variants of plants, viruses or bacteria; or an inbred animal used for experimental purposes * Strain (chemistry), a chemical stress of a molecule * Strain (injury), an injury to a mu ...
stemming from misalignments in the structure, which can then be corrected by the designer. Geometric models of nucleic acids for
DNA nanotechnology DNA nanotechnology is the design and manufacture of artificial nucleic acid structures for technological uses. In this field, nucleic acids are used as non-biological engineering materials for nanotechnology rather than as the carriers of geneti ...
generally use reduced representations of the nucleic acid, because simulating every atom would be very computationally expensive for such large systems. Models with three pseudo-atoms per base pair, representing the two backbone sugars and the helix axis, have been reported to have a sufficient level of detail to predict experimental results. However, models with five pseudo-atoms per base pair, explicitly including the backbone phosphates, are also used. Software for geometrical modeling of nucleic acids include
GIDEON
Nanoengineer-1
an
UNIQUIMER 3D
Geometrical concerns are especially of interest in the design of DNA origami, because the sequence is predetermined by the choice of scaffold strand. Software specifically for DNA origami design has been made, includin
caDNAno
an


Applications

Nucleic acid design is used in
DNA nanotechnology DNA nanotechnology is the design and manufacture of artificial nucleic acid structures for technological uses. In this field, nucleic acids are used as non-biological engineering materials for nanotechnology rather than as the carriers of geneti ...
to design strands which will self-assemble into a desired target structure. These include examples such as
DNA machine A DNA machine is a molecular machine constructed from DNA. Research into DNA machines was pioneered in the late 1980s by Nadrian Seeman and co-workers from New York University. DNA is used because of the numerous biological tools already found in ...
s, periodic two- and three-dimensional lattices, polyhedra, and DNA origami. It can also be used to create sets of nucleic acid strands which are "orthogonal", or non-interacting with each other, so as to minimize or eliminate spurious interactions. This is useful in
DNA computing DNA computing is an emerging branch of unconventional computing which uses DNA, biochemistry, and molecular biology hardware, instead of the traditional electronic computing. Research and development in this area concerns theory, experiments, a ...
, as well as for molecular barcoding applications in chemical biology and
biotechnology Biotechnology is the integration of natural sciences and engineering sciences in order to achieve the application of organisms, cells, parts thereof and molecular analogues for products and services. The term ''biotechnology'' was first used ...
.


See also

*
Nucleic acid analogues Nucleic acid analogues are compounds which are analogous (structurally similar) to naturally occurring RNA and DNA, used in medicine and in molecular biology research. Nucleic acids are chains of nucleotides, which are composed of three parts ...
*
Synthetic biology Synthetic biology (SynBio) is a multidisciplinary area of research that seeks to create new biological parts, devices, and systems, or to redesign systems that are already found in nature. It is a branch of science that encompasses a broad ran ...


References


Further reading

* —A review of approaches to nucleic acid primary structure design. * —A comparison and evaluation of a number of heuristic and thermodynamic methods for nucleic acid design. * —One of the earliest papers on nucleic acid design, describing the use of sequence symmetry minimization to construct immoble branched junctions. * —A review comparing the capabilities of available nucleic acid design software. {{Biomolecular structure DNA nanotechnology