NGS Sequencing

Massive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation sequencing. Some of these technologies emerged between 1993 and 1998 and have been commercially available since 2005. These technologies use miniaturized and parallelized platforms to sequence 1 million to 43 billion short reads (50 to 400 bases each) per instrument run. NGS platforms differ in engineering configuration and sequencing chemistry, but they share the technical paradigm of massive parallel sequencing via spatially separated, clonally amplified DNA templates or single DNA molecules in a flow cell. This design is very different from that of Sanger sequencing (also known as capillary sequencing or first-generation sequencing), which is based on electrophoretic separation of chain-termination products produced in individual sequencing reactions. Running many reactions in parallel rather than individually allows sequencing to be completed on a much larger scale.
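As a rough illustration of the scale involved, the sketch below multiplies the read counts and read lengths quoted above to get the total bases per run. The function names are hypothetical, and pairing the extreme read count with the extreme read length is an assumption made only to bracket the range; no single platform necessarily combines both.

```python
# Figures quoted in the text: 1 million to 43 billion reads per run,
# each 50 to 400 bases long.
MIN_READS, MAX_READS = 1_000_000, 43_000_000_000
MIN_READ_LEN, MAX_READ_LEN = 50, 400


def run_yield_bases(num_reads: int, read_length: int) -> int:
    """Total bases produced by one run: number of reads times read length."""
    return num_reads * read_length


def to_gigabases(bases: int) -> float:
    """Convert a base count to gigabases (1 Gb = 1e9 bases)."""
    return bases / 1e9


# Bracket the range by combining the smallest and the largest figures.
low = run_yield_bases(MIN_READS, MIN_READ_LEN)
high = run_yield_bases(MAX_READS, MAX_READ_LEN)
print(f"smallest run: {to_gigabases(low):.2f} Gb")
print(f"largest run:  {to_gigabases(high):,.0f} Gb")
```

Under these assumptions the output per run spans more than five orders of magnitude, which is why throughput figures are usually quoted per platform rather than for NGS as a whole.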


History

In the 1990s, Applied Biosystems dominated DNA sequencing technology with its automated capillary electrophoresis Sanger sequencing machines. However, the early 2000s saw many new companies enter the market, driven by the goal of reducing the cost of sequencing a genome below $1000 following the enthusiasm generated by the Human Genome Project. Many of these new methods were first developed with National Institutes of Health (NIH) funding under the 'Technology Development for the $1,000 Genome' program, launched during Francis Collins' tenure as director of the National Human Genome Research Institute.

The first next-generation sequencers were based on pyrosequencing, originally developed by Pyrosequencing AB and later commercialized by 454 Life Sciences. In 2005, 454 Life Sciences launched the GS20, the first NGS DNA sequencer. This system provided reads approximately 400–500 bp long with 99% accuracy, enabling sequencing of about 25 million bases in a four-hour run at significantly lower cost than Sanger sequencing. The sequencing machines developed by 454 represented a paradigm shift by enabling the mass parallelisation of sequencing reactions, which significantly boosted the amount of DNA sequenced per run and made 454 Life Sciences the first major success in commercial NGS technology.

Solexa, meanwhile, had begun developing a competing method known as sequencing by synthesis (SBS) in 2003. In 2004, Solexa acquired colony sequencing (bridge amplification) technology from Manteia, which produces densely clustered DNA fragments ("polonies") immobilized on flow cells. These dense clusters generated stronger fluorescent signals, improving accuracy and reducing optical costs. In 2005, Solexa integrated an engineered DNA polymerase and reversible terminator nucleotides, allowing repeated cycles of sequencing and imaging. The first commercial sequencer based on this technology, the Genome Analyzer, was launched in 2006, providing shorter reads (about 35 bp) but higher throughput (up to 1 Gbp per run) and paired-end sequencing capability (i.e. each DNA fragment was sequenced from both ends). In 2007, 454 Life Sciences was acquired by Roche and Solexa by Illumina, the same year Applied Biosystems introduced SOLiD, a ligation-based sequencing platform. However, SOLiD encountered issues sequencing palindromic regions and was eventually discontinued. In 2011, Ion Torrent introduced another alternative, measuring proton (pH) changes during nucleotide incorporation using semiconductor-based sensors. Ion Torrent systems rapidly produced 100 bp reads but frequently struggled with accurately sequencing homopolymers, ultimately leading to their abandonment.

Due to limitations in competing methods, Illumina's SBS technology eventually dominated the sequencing market. By 2012, expectations that 454 would gain a substantial share of the sequencing market had not been realized, and Roche's 2007 acquisition was increasingly viewed as underperforming; that same year, Roche made an unsuccessful attempt to acquire Illumina. In October 2013, Roche announced that it would shut down 454 and stop supporting the platform by mid-2016. By 2014, Illumina controlled approximately 70% of DNA sequencer sales and generated over 90% of sequencing data. That year, Illumina introduced the HiSeq X Ten platform, significantly increasing throughput and claiming the long-targeted goal of sequencing human genomes at roughly $1000 each. Illumina surpassed this milestone in 2017 with the release of NovaSeq, a system capable of generating over 3000 Gbp per run.


NGS platforms

DNA sequencing with commercially available NGS platforms is generally conducted with the following steps. First, DNA sequencing libraries are generated by clonal amplification by PCR in vitro. Second, the DNA is sequenced by synthesis, such that the DNA sequence is determined by the addition of nucleotides to the complementary strand rather than through chain-termination chemistry. Third, the spatially segregated, amplified DNA templates are sequenced simultaneously in a massively parallel fashion without the requirement for a physical separation step. These steps are followed in most NGS platforms, but each platform uses a different strategy. NGS parallelization of the sequencing reactions generates hundreds of megabases to gigabases of nucleotide sequence reads in a single instrument run. This has enabled a drastic increase in available sequence data and fundamentally changed genome sequencing approaches in the biomedical sciences. Newly emerging NGS technologies and instruments have further contributed to a significant decrease in the cost of sequencing, which is nearing the mark of $1000 per genome. As of 2014, massively parallel sequencing platforms are commercially available and their features are summarized in the table. As NGS technologies are advancing rapidly, technical specifications and pricing are in flux.
Run times and gigabase (Gb) output per run for single-end sequencing are noted. Run times and outputs approximately double when performing paired-end sequencing. ‡Average read lengths for the Roche 454 and Helicos Biosciences platforms.


Template preparation methods for NGS

Two methods are used in preparing templates for NGS reactions: amplified templates originating from single DNA molecules, and single DNA molecule templates. For imaging systems which cannot detect single fluorescence events, amplification of DNA templates is required. The three most common amplification methods are emulsion PCR (emPCR), rolling circle and solid-phase amplification. The final distribution of templates can be spatially random or on a grid.


Emulsion PCR

In emulsion PCR methods, a DNA library is first generated through random fragmentation of genomic DNA. Single-stranded DNA fragments (templates) are attached to the surface of beads with adaptors or linkers, such that each bead carries a single DNA fragment from the DNA library. The surface of the beads contains oligonucleotide probes with sequences that are complementary to the adaptors binding the DNA fragments. The beads are then compartmentalized into water-oil emulsion droplets. In the water-oil emulsion, each aqueous droplet capturing one bead is a PCR microreactor that produces amplified copies of the single DNA template.
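Why one template per bead matters can be sketched with a simple loading model. The example below assumes, purely for illustration, that the number of template molecules ending up with a bead in a droplet follows a Poisson distribution; this model, the function names, and the chosen mean values are assumptions, not details from the text.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def droplet_fractions(mean_templates: float) -> dict[str, float]:
    """Split bead-containing droplets by template count (assumed Poisson).

    empty:  no template, so nothing is amplified
    clonal: exactly one template, giving a clonally amplified bead
    mixed:  two or more templates, giving a mixed, unusable bead
    """
    empty = poisson_pmf(0, mean_templates)
    clonal = poisson_pmf(1, mean_templates)
    mixed = 1.0 - empty - clonal
    return {"empty": empty, "clonal": clonal, "mixed": mixed}


def clonal_share_of_amplified(mean_templates: float) -> float:
    """Among droplets that amplify anything, the share that is clonal."""
    f = droplet_fractions(mean_templates)
    return f["clonal"] / (f["clonal"] + f["mixed"])


for mean in (0.1, 0.5, 1.0):
    f = droplet_fractions(mean)
    print(mean, {name: round(value, 3) for name, value in f.items()},
          round(clonal_share_of_amplified(mean), 3))
```

In this model, diluting the library raises the clonal share of amplified beads at the price of leaving more droplets empty, which is the trade-off that a one-template-per-bead requirement implies.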


Gridded rolling circle nanoballs

Amplification of a population of single DNA molecules by rolling circle amplification in solution is followed by capture on a grid of spots sized to be smaller than the DNAs to be immobilized. Second-generation sequencing technologies like MGI Tech's DNBSEQ or Element Biosciences' AVITI use this approach for the preparation of the sample on the flow cell that is then imaged cycle by cycle.


DNA colony generation (Bridge amplification)

Forward and reverse primers are covalently attached at high density to the slide in a flow cell. The ratio of the primers to the template on the support defines the surface density of the amplified clusters. The flow cell is exposed to reagents for polymerase-based extension, and priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary oligo on the surface. Repeated denaturation and extension result in localized amplification of DNA fragments in millions of separate locations across the flow cell surface. Solid-phase amplification produces 100–200 million spatially separated template clusters, providing free ends to which a universal sequencing primer is then hybridized to initiate the sequencing reaction. A patent on this technology was filed in 1997 by Pascal Mayer, Eric Kawashima, and Laurent Farinelli of Glaxo Wellcome's Geneva Biomedical Research Institute (GBRI), and the technology was publicly presented for the first time in 1998. In 1994, Chris Adams and Steve Kron filed a patent on a similar, but non-clonal, surface amplification method named "bridge amplification", which was adapted for clonal amplification in 1997 by Church and Mitra.


Single-molecule templates

Protocols requiring DNA amplification are often cumbersome to implement and may introduce sequencing errors. AT-rich and GC-rich target sequences often show amplification bias, which results in their underrepresentation in genome alignments and assemblies. The preparation of single-molecule templates is more straightforward and does not require PCR, so it avoids these amplification errors. Single-molecule templates are usually immobilized on solid supports using one of at least three different approaches. In the first approach, spatially distributed individual primer molecules are covalently attached to the solid support. The template, which is prepared by randomly fragmenting the starting material into small sizes (for example, ~200–250 bp) and adding common adapters to the fragment ends, is then hybridized to the immobilized primer. In the second approach, spatially distributed single-molecule templates are covalently attached to the solid support by priming and extending single-stranded, single-molecule templates from immobilized primers. A common primer is then hybridized to the template. In either approach, DNA polymerase can bind to the immobilized primed template configuration to initiate the NGS reaction. Both of the above approaches are used by Helicos BioSciences. In the third approach, spatially distributed single polymerase molecules are attached to the solid support, and a primed template molecule is bound to each polymerase. This approach is used by Pacific Biosciences. Larger DNA molecules (up to tens of thousands of base pairs) can be used with this technique and, unlike the first two approaches, the third approach can be used with real-time methods, resulting in potentially longer read lengths.


Sequencing approaches


Sequencing by synthesis

The objective of sequential sequencing by synthesis (SBS) is to determine the sequence of a DNA sample by detecting the incorporation of a nucleotide by a DNA polymerase. An engineered polymerase is used to synthesize a copy of a single strand of DNA and the incorporation of each nucleotide is monitored. The principle of sequencing by synthesis was first described in 1993, with improvements published some years later. The key parts are highly similar for all embodiments of SBS and include (1) amplification of DNA to enhance the subsequent signal and to attach the DNA to be sequenced to a solid support, (2) generation of single-stranded DNA on the solid support, (3) incorporation of nucleotides using an engineered polymerase and (4) detection of the incorporated nucleotide. Steps 3 and 4 are then repeated and the sequence is assembled from the signals obtained in step 4. This principle of sequencing by synthesis has been used for almost all massive parallel sequencing instruments, including 454, PacBio, Ion Torrent, Illumina and MGI.


Pyrosequencing

The principle of pyrosequencing was first described in 1993 by combining a solid support with an engineered DNA polymerase lacking 3′-to-5′ exonuclease (proofreading) activity and real-time luminescence detection using firefly luciferase. All the key concepts of sequencing by synthesis were introduced, including (1) amplification of DNA to enhance the subsequent signal and attach the DNA to be sequenced (template) to a solid support, (2) generation of single-stranded DNA on the solid support, (3) incorporation of nucleotides using an engineered polymerase and (4) detection of the incorporated nucleotide by light detection in real time. In a follow-up article, the concept was further developed, and in 1998 an article was published in which the authors showed that non-incorporated nucleotides could be removed with a fourth enzyme (apyrase), allowing sequencing by synthesis to be performed without the need for washing away non-incorporated nucleotides.
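The incorporate-then-detect loop can be illustrated with an idealized, noise-free simulation. The sketch below assumes that one nucleotide type is dispensed at a time in a fixed cyclic order and that the light signal equals the number of bases incorporated in that flow; the flow order "TACG", the function names, and the neglect of strand orientation are simplifying assumptions for illustration only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flowgram(template: str, flow_order: str = "TACG",
             max_flows: int = 400) -> list[tuple[str, int]]:
    """Return (dispensed nucleotide, light signal) for each flow.

    The template is read in the order the polymerase traverses it.
    A dispensed nucleotide is incorporated across a whole run of
    identical template bases, so the idealized signal is the run length.
    """
    flows = []
    position = 0
    for i in range(max_flows):
        if position >= len(template):
            break
        nucleotide = flow_order[i % len(flow_order)]
        incorporated = 0
        # Extend while the next template base pairs with this nucleotide.
        while (position < len(template)
               and COMPLEMENT[template[position]] == nucleotide):
            incorporated += 1
            position += 1
        # Leftover nucleotide is assumed to be degraded before the next flow.
        flows.append((nucleotide, incorporated))
    return flows


def call_bases(flows: list[tuple[str, int]]) -> str:
    """Assemble the synthesized strand from the per-flow signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = flowgram("AACGT")
print(flows)
print(call_bases(flows))
```

Because a run of identical bases is incorporated within a single flow, its length has to be inferred from signal intensity alone; in real data that intensity is noisy, which this idealized model does not capture.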


Sequencing by reversible terminator chemistry

This approach uses reversible terminator-bound dNTPs in a cyclic method that comprises nucleotide incorporation, fluorescence imaging and cleavage. A fluorescently labeled terminator is imaged as each dNTP is added and then cleaved to allow incorporation of the next base. These nucleotides are chemically blocked such that each incorporation is a unique event. An imaging step follows each base incorporation step, then the blocking group is chemically removed to prepare each strand for the next incorporation by DNA polymerase. This series of steps continues for a specific number of cycles, as determined by user-defined instrument settings. The 3' blocking groups were originally conceived as reversible either enzymatically or chemically. The chemical method has been the basis for the Solexa and Illumina machines. Sequencing by reversible terminator chemistry can be a four-colour cycle, as used by Illumina/Solexa, or a one-colour cycle, as used by Helicos BioSciences. Helicos BioSciences used "virtual terminators", which are unblocked terminators with a second nucleoside analogue that acts as an inhibitor. These terminators carry terminating or inhibiting groups so that DNA synthesis is terminated after a single base addition.
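The cycle of incorporation, imaging and cleavage can be sketched as a small simulation of a four-colour run. The dye-to-base colour assignment and the function names below are hypothetical, and the model is idealized: every cycle succeeds and exactly one base is added per cycle.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye assignment for a four-colour cycle.
DYE_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOUR = {colour: base for base, colour in DYE_COLOUR.items()}


def image_cycles(template: str, num_cycles: int) -> list[str]:
    """Return the colour imaged in each cycle for one template strand."""
    colours = []
    position = 0
    for _ in range(num_cycles):
        if position >= len(template):
            break  # nothing left to extend
        # Incorporation: the blocking group limits extension to one base.
        incorporated = COMPLEMENT[template[position]]
        # Imaging: the dye on the terminator identifies the base.
        colours.append(DYE_COLOUR[incorporated])
        # Cleavage: dye and block are removed, freeing the next position.
        position += 1
    return colours


def decode(colours: list[str]) -> str:
    """Translate the imaged colours back into the synthesized bases."""
    return "".join(BASE_FOR_COLOUR[colour] for colour in colours)


colours = image_cycles("AACGT", num_cycles=3)
print(colours)
print(decode(colours))
```

In this model the read length is set by the number of cycles rather than by the template, and a run of identical bases is read one base per cycle instead of as a single stronger signal.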


Sequencing-by-ligation mediated by ligase enzymes

In this approach, the sequence extension reaction is not carried out by polymerases but rather by DNA ligase and either one-base-encoded probes or two-base-encoded probes. In its simplest form, a fluorescently labelled probe hybridizes to its complementary sequence adjacent to the primed template. DNA ligase is then added to join the dye-labelled probe to the primer. Non-ligated probes are washed away, followed by fluorescence imaging to determine the identity of the ligated probe. The cycle can be repeated either by using cleavable probes to remove the fluorescent dye and regenerate a 5′-PO4 group for subsequent ligation cycles (chained ligation) or by removing and hybridizing a new primer to the template (unchained ligation).


Phospholinked Fluorescent Nucleotides or Real-time sequencing

Pacific Biosciences is currently leading this method. Real-time sequencing involves imaging the continuous incorporation of dye-labelled nucleotides during DNA synthesis: single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide (ZMW) detectors, which can obtain sequence information while phospholinked nucleotides are being incorporated into the growing primer strand. Pacific Biosciences uses a unique DNA polymerase that better incorporates phospholinked nucleotides and enables the resequencing of closed circular templates. While single-read accuracy is 87%, consensus accuracy has been demonstrated at 99.999% with multi-kilobase read lengths. In 2015, Pacific Biosciences released a new sequencing instrument called the Sequel System, which increases capacity approximately 6.5-fold.
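A simplified way to see why repeated passes over a closed circular template raise consensus accuracy is a majority-vote model. The sketch below assumes, for illustration only, that each pass calls a given base correctly with the same probability and independently of the other passes, and that the consensus is correct when a strict majority of passes agree on the right base. Real consensus algorithms and error behaviour are more complex, and the function name is hypothetical.

```python
from math import comb


def majority_correct_probability(per_pass_accuracy: float, passes: int) -> float:
    """Probability that a strict majority of passes call a base correctly.

    Sums the binomial probabilities of k correct calls for every k
    greater than half the number of passes.
    """
    p = per_pass_accuracy
    return sum(
        comb(passes, k) * p ** k * (1 - p) ** (passes - k)
        for k in range(passes // 2 + 1, passes + 1)
    )


# 0.87 is the single-read accuracy quoted in the text.
for passes in (1, 3, 5, 9, 15):
    print(passes, majority_correct_probability(0.87, passes))
```

With an odd number of passes the modelled accuracy rises steadily as passes are added, which matches the qualitative gap between single-read and consensus accuracy even though the model is far too simple to reproduce the quoted figures.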


See also

* Clinical metagenomic sequencing
* First-generation sequencing
* Third-generation sequencing
* RNA Velocity

