
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. Base pairs form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, "Watson–Crick" (or "Watson–Crick–Franklin") base pairs (guanine–cytosine and adenine–thymine) allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence. The complementary nature of this base-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base-pairing patterns that identify particular regulatory regions of genes.
Intramolecular base pairs can occur within single-stranded nucleic acids. This is particularly important in RNA molecules (e.g., transfer RNA), where Watson–Crick base pairs (guanine–cytosine and adenine–uracil) permit the formation of short double-stranded helices, and a wide variety of non–Watson–Crick interactions (e.g., G–U or A–A) allow RNAs to fold into a vast range of specific three-dimensional structures. In addition, base-pairing between transfer RNA (tRNA) and messenger RNA (mRNA) forms the basis for the molecular recognition events that result in the nucleotide sequence of mRNA becoming translated into the amino acid sequence of proteins via the genetic code.
The size of an individual gene or an organism's entire genome is often measured in base pairs because DNA is usually double-stranded. Hence, the number of total base pairs is equal to the number of nucleotides in one of the strands (with the exception of non-coding single-stranded regions of telomeres). The haploid human genome (23 chromosomes) is estimated to be about 3.2 billion base pairs long and to contain 20,000–25,000 distinct protein-coding genes.
A kilobase (kb) is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA. The total number of DNA base pairs on Earth is estimated at 5.0 × 10^37, with a weight of 50 billion tonnes. In comparison, the total mass of the biosphere has been estimated to be as much as 4 TtC (trillion tons of carbon).
Hydrogen bonding and stability
Top, a G.C base pair with three hydrogen bonds. Bottom, an A.T base pair with two hydrogen bonds. Non-covalent hydrogen bonds between the bases are shown as dashed lines. The wiggly lines stand for the connection to the pentose sugar and point in the direction of the minor groove.
Hydrogen bonding is the chemical interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content. Crucially, however, stacking interactions are primarily responsible for stabilising the double-helical structure; Watson–Crick base pairing's contribution to global structural stability is minimal, but its role in the specificity underlying complementarity is, by contrast, of maximal importance, as this underlies the template-dependent processes of the central dogma (e.g. DNA replication).
The bigger nucleobases, adenine and guanine, are members of a class of double-ringed chemical structures called purines; the smaller nucleobases, cytosine and thymine (and uracil), are members of a class of single-ringed chemical structures called pyrimidines. Purines are complementary only with pyrimidines: pyrimidine–pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established; purine–purine pairings are energetically unfavorable because the molecules are too close, leading to overlap repulsion. Purine–pyrimidine base-pairing of AT or GC or UA (in RNA) results in proper duplex structure. The only other purine–pyrimidine pairings would be AC and GT and UG (in RNA); these pairings are mismatches because the patterns of hydrogen donors and acceptors do not correspond. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA (see wobble base pair).
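The pairing rules in this section can be summarised as a small lookup. The following is a minimal illustrative sketch, not a chemical model: the function name `classify_pair` and the category labels are hypothetical, and the hydrogen-bond counts are simply those stated above (three for G–C, two for A–T, A–U and the G–U wobble pair).

```python
# Hydrogen-bond counts taken from the text above.
WATSON_CRICK = {frozenset("GC"): 3, frozenset("AT"): 2, frozenset("AU"): 2}
WOBBLE = {frozenset("GU"): 2}  # tolerated in RNA, per the text
PURINES = set("AG")
PYRIMIDINES = set("CTU")


def classify_pair(a: str, b: str) -> str:
    """Classify a pairing of two bases by the rules described in the text."""
    a, b = a.upper(), b.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    pair = frozenset((a, b))
    if pair in WATSON_CRICK:
        return f"Watson-Crick ({WATSON_CRICK[pair]} H-bonds)"
    if pair in WOBBLE:
        return f"wobble ({WOBBLE[pair]} H-bonds)"
    if a in PURINES and b in PURINES:
        return "purine-purine mismatch"
    if a in PYRIMIDINES and b in PYRIMIDINES:
        return "pyrimidine-pyrimidine mismatch"
    # Remaining cases (e.g. A-C, G-T): donor/acceptor patterns do not match.
    return "purine-pyrimidine mismatch"


for x, y in [("G", "C"), ("A", "U"), ("G", "U"), ("A", "G"), ("C", "T"), ("A", "C")]:
    print(x, y, "->", classify_pair(x, y))
```

The lookup deliberately ignores context such as Hoogsteen geometry or modified bases; it only restates the purine/pyrimidine logic given above.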
Paired DNA and RNA molecules are comparatively stable at room temperature, but the two nucleotide strands will separate above a melting point that is determined by the length of the molecules, the extent of mispairing (if any), and the GC content. Higher GC content results in higher melting temperatures; it is, therefore, unsurprising that the genomes of extremophile organisms such as ''Thermus thermophilus'' are particularly GC-rich. Conversely, regions of a genome that need to separate frequently (for example, the promoter regions of often-transcribed genes) are comparatively GC-poor (for example, see TATA box). GC content and melting temperature must also be taken into account when designing primers for PCR.
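As a rough illustration of how GC content feeds into melting temperature, the sketch below computes the GC fraction of a sequence and a melting-temperature estimate using the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). That rule is not taken from this article; it is a common approximation for short oligonucleotides only, and the function names and example sequences are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in degrees Celsius.

    Wallace rule: Tm = 2 * (A + T) + 4 * (G + C).
    Only a coarse estimate, intended for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical example sequences: an AT-rich and a GC-rich hexamer.
for s in ("TATAAA", "GCGCGC"):
    print(s, f"GC={gc_content(s):.0%}", f"Tm~{wallace_tm(s)} C")
```

For real primer design, nearest-neighbour thermodynamic models that account for salt and primer concentration are generally preferred over this estimate.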
Examples
The following DNA sequences illustrate double-stranded pairing patterns. By convention, the top strand is written from the 5′-end to the 3′-end; thus, the bottom strand (complementary strand) is written 3′ to 5′.
:A base-paired DNA sequence:
::
::
:The corresponding RNA sequence, in which uracil is substituted for thymine in the RNA strand:
::
::
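The strand conventions in this section can be sketched in a few lines of code. The sequence `GATTACA` below is a hypothetical placeholder chosen for illustration, not the sequence from the original figure, and the function names are likewise assumptions.

```python
# Translation table for Watson-Crick complements in DNA.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, in the same left-to-right order.

    If the input is read 5'->3', the output reads 3'->5'.
    """
    return strand.upper().translate(DNA_COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Complementary strand rewritten in its own 5'->3' direction."""
    return complement(strand)[::-1]


def to_rna(strand: str) -> str:
    """Same sequence with uracil substituted for thymine."""
    return strand.upper().replace("T", "U")


top = "GATTACA"  # hypothetical top strand, written 5'->3'
print("5'-" + top + "-3'")
print("3'-" + complement(top) + "-5'")
print("RNA: 5'-" + to_rna(top) + "-3'")
```

Printing the complement without reversing it keeps each base directly under its partner, which matches the top-strand 5′ to 3′, bottom-strand 3′ to 5′ layout described above.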
Base analogs and intercalators
Chemical analogs of nucleotides can take the place of proper nucleotides and establish non-canonical base-pairing, leading to errors (mostly point mutations) in DNA replication and DNA transcription. This is due to their isosteric chemistry. One common mutagenic base analog is 5-bromouracil, which resembles thymine but can base-pair to guanine in its enol form.
Other chemicals, known as DNA intercalators, fit into the gap between adjacent bases on a single strand and induce frameshift mutations by "masquerading" as a base, causing the DNA replication machinery to skip or insert additional nucleotides at the intercalated site. Most intercalators are large polyaromatic compounds and are known or suspected carcinogens. Examples include ethidium bromide and acridine.
Mismatch repair
Mismatched base pairs can be generated by errors of DNA replication and as intermediates during homologous recombination. The process of mismatch repair ordinarily must recognize and correctly repair a small number of base mispairs within a long sequence of normal DNA base pairs. To repair mismatches formed during DNA replication, several distinctive repair processes have evolved to distinguish between the template strand and the newly formed strand so that only the newly inserted incorrect nucleotide is removed (in order to avoid generating a mutation). The proteins employed in mismatch repair during DNA replication, and the clinical significance of defects in this process, are described in the article DNA mismatch repair. The process of mispair correction during recombination is described in the article gene conversion.
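The recognition step described above, finding a few mispairs within a long run of normal base pairs, can be illustrated with a simple scan over an aligned duplex. This is only a sketch under stated assumptions: the function name `find_mismatches` and the example strands are hypothetical, and the code does not model how cells tell the template strand from the newly formed one.

```python
# Canonical DNA pairs, written as (top base, bottom base).
CANONICAL = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def find_mismatches(top: str, bottom: str) -> list[tuple[int, str, str]]:
    """Return (position, top base, bottom base) for each non-canonical pair.

    top is read 5'->3' and bottom 3'->5', so position i in one strand
    faces position i in the other.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(top.upper(), bottom.upper()))
        if (a, b) not in CANONICAL
    ]


# Hypothetical duplex with a single T-G mispair at position 2.
print(find_mismatches("GATTACA", "CTGATGT"))
```

Which base of a reported mispair should be replaced depends on knowing which strand is the template, information this sketch does not have.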
Length measurements

The following abbreviations are commonly used to describe the length of a DNA/RNA molecule:
* bp = base pair; one bp corresponds to approximately 3.4 Å (340 pm) of length along the strand, and to roughly 618 or 643 daltons for DNA and RNA respectively.
* kb (= kbp) = kilo–base-pair = 1,000 bp
* Mb (= Mbp) = mega–base-pair = 1,000,000 bp
* Gb (= Gbp) = giga–base-pair = 1,000,000,000 bp
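The per-base-pair figures in the list above lend themselves to quick back-of-the-envelope conversions. The sketch below is illustrative only: the function names are hypothetical, the dalton-to-gram factor is the standard value of the unified atomic mass unit, and the 3.2 billion bp input is the haploid human genome estimate quoted earlier in this article.

```python
ANGSTROM_PER_BP = 3.4                      # rise per base pair, from the list above
DALTONS_PER_BP = {"DNA": 618, "RNA": 643}  # approximate mass per base pair
DALTON_IN_GRAMS = 1.66053906660e-24        # unified atomic mass unit in grams


def duplex_length_m(n_bp: float) -> float:
    """Approximate contour length in metres of a duplex of n_bp base pairs."""
    return n_bp * ANGSTROM_PER_BP * 1e-10  # 1 angstrom = 1e-10 m


def duplex_mass_g(n_bp: float, kind: str = "DNA") -> float:
    """Approximate mass in grams of a duplex of n_bp base pairs."""
    return n_bp * DALTONS_PER_BP[kind] * DALTON_IN_GRAMS


haploid_human_bp = 3.2e9  # estimate quoted in the article
print(f"length: {duplex_length_m(haploid_human_bp):.2f} m")
print(f"mass:   {duplex_mass_g(haploid_human_bp) * 1e12:.1f} pg")
```

With these inputs the length comes out at roughly one metre and the mass at a few picograms; both are order-of-magnitude estimates, since the per-base-pair values are averages.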
For single-stranded DNA/RNA, units of nucleotides are used, abbreviated nt (or knt, Mnt, Gnt), as they are not paired.
To distinguish between units of computer storage and bases, kbp, Mbp, Gbp, etc. may be used for base pairs.
The centimorgan is also often used to imply distance along a chromosome, but the number of base pairs it corresponds to varies widely. In the human genome, one centimorgan corresponds to about 1 million base pairs on average.
Unnatural base pair (UBP)
An unnatural base pair (UBP) is a designed subunit (or nucleobase) of DNA which is created in a laboratory and does not occur in nature. DNA sequences have been described which use newly created nucleobases to form a third base pair, in addition to the two base pairs found in nature, A-T (adenine–thymine) and G-C (guanine–cytosine). A few research groups have been searching for a third base pair for DNA, including teams led by Steven A. Benner, Philippe Marliere, Floyd E. Romesberg and Ichiro Hirao.
Some new base pairs based on alternative hydrogen bonding, hydrophobic interactions and metal coordination have been reported.
In 1989 Steven Benner (then working at the Swiss Federal Institute of Technology in Zurich) and his team introduced modified forms of cytosine and guanine into DNA molecules ''in vitro''. The nucleotides, which encoded RNA and proteins, were successfully replicated ''in vitro''. Since then, Benner's team has been trying to engineer cells that can make foreign bases from scratch, obviating the need for a feedstock.
In 2002, Ichiro Hirao's group in Japan developed an unnatural base pair between 2-amino-8-(2-thienyl)purine (s) and pyridine-2-one (y) that functions in transcription and translation, for the site-specific incorporation of non-standard amino acids into proteins. In 2006, they created 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) as a third base pair for replication and transcription. Afterward, Ds and 4-[3-(6-aminohexanamido)-1-propynyl]-2-nitropyrrole (Px) were discovered as a high-fidelity pair in PCR amplification.
In 2013, they applied the Ds-Px pair to DNA aptamer generation by ''in vitro'' selection (SELEX) and demonstrated that genetic alphabet expansion significantly augments DNA aptamer affinities to target proteins.
In 2012, a group of American scientists led by Floyd Romesberg, a chemical biologist at the Scripps Research Institute in San Diego, California, reported that his team had designed an unnatural base pair (UBP).
The two new artificial nucleotides of this unnatural base pair were named d5SICS and dNaM. More technically, these artificial nucleotides, bearing hydrophobic nucleobases, feature two fused aromatic rings that form a (d5SICS–dNaM) complex or base pair in DNA.
His team designed a variety of ''in vitro'' or "test tube" templates containing the unnatural base pair and confirmed that it was efficiently replicated with high fidelity in virtually all sequence contexts using the modern standard ''in vitro'' techniques, namely PCR amplification of DNA and PCR-based applications.
Their results show that for PCR and PCR-based applications, the d5SICS–dNaM unnatural base pair is functionally equivalent to a natural base pair, and when combined with the other two natural base pairs used by all organisms, A–T and G–C, they provide a fully functional and expanded six-letter "genetic alphabet".
In 2014 the same team from the Scripps Research Institute reported that they had synthesized a stretch of circular DNA known as a plasmid, containing natural T-A and C-G base pairs along with the best-performing UBP Romesberg's laboratory had designed, and inserted it into cells of the common bacterium ''E. coli'', which successfully replicated the unnatural base pairs through multiple generations.
The transfection did not hamper the growth of the ''E. coli'' cells, and the plasmid showed no sign of losing its unnatural base pairs to the cells' natural DNA repair mechanisms. This is the first known example of a living organism passing along an expanded genetic code to subsequent generations.
Romesberg said he and his colleagues created 300 variants to refine the design of nucleotides that would be stable enough and would be replicated as easily as the natural ones when the cells divide. This was in part achieved by the addition of a supportive algal gene that expresses a nucleotide triphosphate transporter which efficiently imports the triphosphates of both d5SICSTP and dNaMTP into ''E. coli'' bacteria.
Then, the natural bacterial replication pathways use them to accurately replicate a plasmid containing d5SICS–dNaM. Other researchers were surprised that the bacteria replicated these human-made DNA subunits.
The successful incorporation of a third base pair is a significant breakthrough toward the goal of greatly expanding the number of amino acids which can be encoded by DNA, from the existing 20 amino acids to a theoretically possible 172, thereby expanding the potential for living organisms to produce novel proteins.
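The article does not spell out how the figure of 172 is obtained. One reading that reproduces it, offered here as an assumption rather than as the original authors' derivation, is to count triplet codons: four letters give 4³ = 64 codons, six letters give 6³ = 216, and if each of the 152 additional codons were assigned its own new amino acid on top of the existing 20, the total would be 172. The function name below is hypothetical.

```python
def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons for a given genetic alphabet size."""
    return alphabet_size ** codon_length


natural_codons = codon_count(4)    # A, C, G, T
expanded_codons = codon_count(6)   # natural letters plus one unnatural pair
new_codons = expanded_codons - natural_codons

# Assumption: each new codon encodes one additional amino acid,
# while the existing 64 codons keep encoding the natural 20.
NATURAL_AMINO_ACIDS = 20
max_amino_acids = NATURAL_AMINO_ACIDS + new_codons

print(natural_codons, expanded_codons, new_codons, max_amino_acids)
```

This is an upper bound on paper only; it ignores the need for stop codons among the new triplets and for matching tRNAs and synthetases.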
The artificial strings of DNA do not encode anything yet, but scientists speculate they could be designed to manufacture new proteins which could have industrial or pharmaceutical uses.
Experts said the synthetic DNA incorporating the unnatural base pair raises the possibility of life forms based on a different DNA code.
Non-canonical base pairing
In addition to the canonical pairing, some conditions can also favour base-pairing with alternative base orientation, and number and geometry of hydrogen bonds. These pairings are accompanied by alterations to the local backbone shape.
The most common of these is the wobble base pairing that occurs between tRNAs and mRNAs at the third base position of many codons during translation and during the charging of tRNAs by some tRNA synthetases. Wobble pairs have also been observed in the secondary structures of some RNA sequences.
Additionally, Hoogsteen base pairing (typically written as A•U/T and G•C) can exist in some DNA sequences (e.g. CA and TA dinucleotides) in dynamic equilibrium with standard Watson–Crick pairing. Hoogsteen pairs have also been observed in some protein–DNA complexes.
In addition to these alternative base pairings, a wide range of base-base hydrogen bonding is observed in RNA secondary and tertiary structure.
These bonds are often necessary for the precise, complex shape of an RNA, as well as its binding to interaction partners.
See also
* List of Y-DNA single-nucleotide polymorphisms
* Non-canonical base pairing
* Chargaff's rules
External links
DAN: webserver version of the EMBOSS tool for calculating melting temperatures