In
genetics
Genetics is the study of genes, genetic variation, and heredity in organisms.Hartl D, Jones E (2005) It is an important branch in biology because heredity is vital to organisms' evolution. Gregor Mendel, a Moravian Augustinians, Augustinian ...
and
biochemistry
Biochemistry, or biological chemistry, is the study of chemical processes within and relating to living organisms. A sub-discipline of both chemistry and biology, biochemistry may be divided into three fields: structural biology, enzymology, a ...
, sequencing means to determine the
primary structure (sometimes incorrectly called the primary sequence) of an unbranched
biopolymer
Biopolymers are natural polymers produced by the cells of living organisms. Like other polymers, biopolymers consist of monomeric units that are covalently bonded in chains to form larger molecules. There are three main classes of biopolymers, ...
. Sequencing results in a symbolic linear depiction known as a sequence which succinctly summarizes much of the atomic-level structure of the sequenced molecule.
DNA sequencing
DNA sequencing is the process of determining the
nucleotide
Nucleotides are Organic compound, organic molecules composed of a nitrogenous base, a pentose sugar and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both o ...
order of a given
DNA
Deoxyribonucleic acid (; DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of al ...
fragment. So far, most DNA sequencing has been performed using the
chain termination method developed by
Frederick Sanger. This technique uses sequence-specific termination of a DNA synthesis reaction using modified nucleotide substrates. However, new sequencing technologies such as
pyrosequencing are gaining an increasing share of the sequencing market. More genome data are now being produced by pyrosequencing than Sanger DNA sequencing. Pyrosequencing has enabled rapid genome sequencing. Bacterial genomes can be sequenced in a single run with several times coverage with this technique. This technique was also used to sequence the genome of
James Watson recently.
The sequence of DNA encodes the necessary information for living things to survive and reproduce. Determining the sequence is therefore useful in fundamental research into why and how organisms live, as well as in applied subjects. Because of the key importance DNA has to living things, knowledge of DNA sequences is useful in practically any area of biological research. For example, in medicine it can be used to identify, diagnose, and potentially develop treatments for genetic diseases. Similarly, research into
pathogens may lead to treatments for contagious diseases.
Biotechnology
Biotechnology is a multidisciplinary field that involves the integration of natural sciences and Engineering Science, engineering sciences in order to achieve the application of organisms and parts thereof for products and services. Specialists ...
is a burgeoning discipline, with the potential for many useful products and services.
The Carlson curve is a term coined by ''The Economist'' to describe the biotechnological equivalent of
Moore's law
Moore's law is the observation that the Transistor count, number of transistors in an integrated circuit (IC) doubles about every two years. Moore's law is an observation and Forecasting, projection of a historical trend. Rather than a law of ...
, and is named after author Rob Carlson. Carlson accurately predicted the doubling time of DNA sequencing technologies (measured by cost and performance) would be at least as fast as Moore's law. Carlson curves illustrate the rapid (in some cases hyperexponential) decreases in cost, and increases in performance, of a variety of technologies, including DNA sequencing,
DNA synthesis, and a range of physical and computational tools used in protein expression and in determining protein structures.
Sanger sequencing
In chain terminator sequencing (Sanger sequencing), extension is initiated at a specific site on the template DNA by using a short oligonucleotide 'primer' complementary to the template at that region. The oligonucleotide primer is extended using a
DNA polymerase
A DNA polymerase is a member of a family of enzymes that catalyze the synthesis of DNA molecules from nucleoside triphosphates, the molecular precursors of DNA. These enzymes are essential for DNA replication and usually work in groups to create t ...
, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four deoxynucleotide bases (DNA building blocks), along with a low concentration of a chain terminating nucleotide (most commonly a di-deoxynucleotide). The deoxynucleotides lack in the OH group both at the 2' and at the 3' position of the ribose molecule, therefore once they are inserted within a DNA molecule they prevent it from being further elongated. In this sequencer four different vessels are employed, each containing only of the four dideoxyribonucleotides; the incorporation of the chain terminating nucleotides by the DNA polymerase in a random position results in a series of related DNA fragments, of different sizes, that terminate with a given dideoxiribonucleotide. The fragments are then size-separated by electrophoresis in a slab polyacrylamide gel, or more commonly now, in a narrow glass tube (capillary) filled with a viscous polymer.
An alternative to the labelling of the primer is to label the terminators instead, commonly called 'dye terminator sequencing'. The major advantage of this approach is the complete sequencing set can be performed in a single reaction, rather than the four needed with the labeled-primer approach. This is accomplished by labelling each of the dideoxynucleotide chain-terminators with a separate fluorescent dye, which fluoresces at a different
wavelength
In physics and mathematics, wavelength or spatial period of a wave or periodic function is the distance over which the wave's shape repeats.
In other words, it is the distance between consecutive corresponding points of the same ''phase (waves ...
. This method is easier and quicker than the dye primer approach, but may produce more uneven data peaks (different heights), due to a template dependent difference in the incorporation of the large dye chain-terminators. This problem has been significantly reduced with the introduction of new enzymes and dyes that minimize incorporation variability.
This method is now used for the vast majority of sequencing reactions as it is both simpler and cheaper. The major reason for this is that the primers do not have to be separately labelled (which can be a significant expense for a single-use custom primer), although this is less of a concern with frequently used 'universal' primers. This is changing rapidly due to the increasing cost-effectiveness of second- and third-generation systems from Illumina, 454, ABI, Helicos, and Dover.
Pyrosequencing
The pyrosequencing method is based on the detection of the pyrophosphate release on nucleotide incorporation. Before performing pyrosequencing, the DNA strand to sequence has to be amplified by PCR. Then the order in which the nucleotides have to be added in the sequencer is chosen (i.e. G-A-T-C). When a specific nucleotide is added, if the DNA polymerase incorporates it in the growing chain, the pyrophosphate is released and converted into ATP by ATP sulfurylase. ATP powers the oxidation of luciferase through the luciferase; this reaction generates a light signal recorded as a pyrogram peak. In this way, the nucleotide incorporation is correlated to a signal. The light signal is proportional to the amount of nucleotides incorporated during the synthesis of the DNA strand (i.e. two nucleotides incorporated correspond to two pyrogram peaks). When the added nucleotides aren't incorporated in the DNA molecule, no signal is recorded; the enzyme apyrase removes any unincorporated nucleotide remaining in the reaction.
This method requires neither fluorescently-labelled nucleotides nor gel electrophoresis.
Pyrosequencing, which was developed by Pål Nyrén and Mostafa Ronaghi DNA, has been commercialized by Biotage (for low-throughput sequencing) and 454 Life Sciences (for high-throughput sequencing). The latter platform sequences roughly 100
megabases
ow up to 400 megabasesin a seven-hour run with a single machine. In the array-based method (commercialized by 454 Life Sciences), single-stranded DNA is annealed to beads and amplified via
EmPCR. These DNA-bound beads are then placed into wells on a fiber-optic chip along with
enzymes
An enzyme () is a protein that acts as a biological catalyst by accelerating chemical reactions. The molecules upon which enzymes may act are called substrates, and the enzyme converts the substrates into different molecules known as pro ...
which produce light in the presence of
ATP. When free nucleotides are washed over this chip, light is produced as ATP is generated when nucleotides join with their complementary
base pair
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA ...
s. Addition of one (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded by the CCD camera in the instrument. The signal strength is proportional to the number of nucleotides, for example, homopolymer stretches, incorporated in a single nucleotide flow.
True single molecule sequencing
Large-scale sequencing
Whereas the methods above describe various sequencing methods, separate related terms are used when a large portion of a genome is sequenced. Several platforms were developed to perform
exome sequencing (a subset of all DNA across all chromosomes that encode genes) or
whole genome sequencing
Whole genome sequencing (WGS), also known as full genome sequencing or just genome sequencing, is the process of determining the entirety of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's ...
(sequencing of the all nuclear DNA of a human).
RNA sequencing
RNA
Ribonucleic acid (RNA) is a polymeric molecule that is essential for most biological functions, either by performing the function itself (non-coding RNA) or by forming a template for the production of proteins (messenger RNA). RNA and deoxyrib ...
is less stable in the cell, and also more prone to nuclease attack experimentally. As RNA is generated by
transcription from DNA, the information is already present in the cell's DNA. However, it is sometimes desirable to
sequence RNA molecules. While sequencing DNA gives a genetic profile of an organism, sequencing RNA reflects only the sequences that are actively
expressed in the cells. To sequence RNA, the usual method is first to
reverse transcribe the RNA extracted from the sample to generate cDNA fragments. This can then be sequenced as described above.
The bulk of RNA expressed in cells are
ribosomal RNA
Ribosomal ribonucleic acid (rRNA) is a type of non-coding RNA which is the primary component of ribosomes, essential to all cells. rRNA is a ribozyme which carries out protein synthesis in ribosomes. Ribosomal RNA is transcribed from ribosomal ...
s or
small RNAs, detrimental for cellular translation, but often not the focus of a study. This fraction can be removed ''in vitro'', however, to enrich for the messenger RNA, also included, that usually
is of interest. Derived from the
exon
An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term ''exon'' refers to both the DNA sequence within a gene and to the corresponding sequence ...
s these mRNAs are to be later
translated to
protein
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residue (biochemistry), residues. Proteins perform a vast array of functions within organisms, including Enzyme catalysis, catalysing metab ...
s that support particular cellular functions. The
expression profile therefore indicates cellular activity, particularly desired in the studies of diseases, cellular behaviour, responses to reagents or stimuli.
Eukaryotic
The eukaryotes ( ) constitute the Domain (biology), domain of Eukaryota or Eukarya, organisms whose Cell (biology), cells have a membrane-bound cell nucleus, nucleus. All animals, plants, Fungus, fungi, seaweeds, and many unicellular organisms ...
RNA molecules are not necessarily
co-linear with their DNA template, as
intron
An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word ''intron'' is derived from the term ''intragenic region'', i.e., a region inside a gene."The notion of the cistron .e., gen ...
s are excised. This gives a certain complexity to map the read sequences back to the genome and thereby identify their origin.
For more information on the capabilities of next-generation sequencing applied to whole
transcriptomes see:
RNA-Seq and
MicroRNA Sequencing.
Protein sequencing
Methods for performing
protein
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residue (biochemistry), residues. Proteins perform a vast array of functions within organisms, including Enzyme catalysis, catalysing metab ...
sequencing
include:
*
Edman degradation
*
Peptide mass fingerprinting
*
Mass spectrometry
Mass spectrometry (MS) is an analytical technique that is used to measure the mass-to-charge ratio of ions. The results are presented as a ''mass spectrum'', a plot of intensity as a function of the mass-to-charge ratio. Mass spectrometry is used ...
*
Protease digests
If the gene encoding the protein is known, it is currently much easier to sequence the DNA and infer the protein sequence. Determining part of a protein's
amino-acid sequence (often one end) by one of the above methods may be sufficient to identify a
clone carrying this gene.
Polysaccharide sequencing
Though
polysaccharide
Polysaccharides (), or polycarbohydrates, are the most abundant carbohydrates found in food. They are long-chain polymeric carbohydrates composed of monosaccharide units bound together by glycosidic linkages. This carbohydrate can react with wat ...
s are also biopolymers, it is not so common to talk of 'sequencing' a polysaccharide, for several reasons. Although many polysaccharides are linear, many have branches. Many different units (individual
monosaccharide
Monosaccharides (from Greek '' monos'': single, '' sacchar'': sugar), also called simple sugars, are the simplest forms of sugar and the most basic units (monomers) from which all carbohydrates are built.
Chemically, monosaccharides are polyhy ...
s) can be used, and
bonded in different ways. However, the main theoretical reason is that whereas the other polymers listed here are primarily generated in a 'template-dependent' manner by one processive enzyme, each individual join in a polysaccharide may be formed by a different
enzyme
An enzyme () is a protein that acts as a biological catalyst by accelerating chemical reactions. The molecules upon which enzymes may act are called substrate (chemistry), substrates, and the enzyme converts the substrates into different mol ...
. In many cases the assembly is not uniquely specified; depending on which enzyme acts, one of several different units may be incorporated. This can lead to a family of similar molecules being formed. This is particularly true for plant polysaccharides. Methods for the
structure determination of
oligosaccharide
An oligosaccharide (; ) is a carbohydrate, saccharide polymer containing a small number (typically three to ten) of monosaccharides (simple sugars). Oligosaccharides can have many functions including Cell–cell recognition, cell recognition and ce ...
s and
polysaccharide
Polysaccharides (), or polycarbohydrates, are the most abundant carbohydrates found in food. They are long-chain polymeric carbohydrates composed of monosaccharide units bound together by glycosidic linkages. This carbohydrate can react with wat ...
s include
NMR
Nuclear magnetic resonance (NMR) is a physical phenomenon in which atomic nucleus, nuclei in a strong constant magnetic field are disturbed by a weak oscillating magnetic field (in the near and far field, near field) and respond by producing ...
spectroscopy and
methylation analysis.
See also
*
Exome sequencing
*
Full genome sequencing
*
Genetic code
Genetic code is a set of rules used by living cell (biology), cells to Translation (biology), translate information encoded within genetic material (DNA or RNA sequences of nucleotide triplets or codons) into proteins. Translation is accomplished ...
*
Pathogenomics
*
RNA-Seq
*
MicroRNA sequencing
*
Sequence motif
References
Links
* https://www.nature.com/subjects/sequencing
{{Bioinformatics
Biochemistry methods
Molecular biology