
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.


Background

A typical human cell contains about 2 × 3.3 billion base pairs of DNA and 600 million mRNA bases. Traditional methods such as Sanger sequencing or next-generation sequencing usually work from a mix of millions of cells. By deep sequencing of DNA and RNA from a single cell, cellular functions can be investigated extensively. Like typical next-generation sequencing experiments, single-cell sequencing protocols generally contain the following steps: isolation of a single cell, nucleic acid extraction and amplification, sequencing library preparation, sequencing, and bioinformatic data analysis.

It is more challenging to perform single-cell sequencing than sequencing from cells in bulk. Because the amount of starting material from a single cell is minimal, degradation, sample loss, and contamination have pronounced effects on the quality of sequencing data. In addition, because only picogram quantities of nucleic acid are available, heavy amplification is often needed during sample preparation, resulting in uneven coverage, noise, and inaccurate quantification of sequencing data.

Recent technical improvements make single-cell sequencing a promising tool for approaching a set of seemingly inaccessible problems. For example, heterogeneous samples, rare cell types, cell lineage relationships, mosaicism of somatic tissues, analyses of microbes that cannot be cultured, and disease evolution can all be elucidated through single-cell sequencing. Single-cell sequencing was selected as the method of the year 2013 by Nature Publishing Group.
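The picogram scale of the starting material can be checked with a short calculation. The sketch below converts the base counts quoted above into mass. The average molar masses (650 g/mol per DNA base pair, 340 g/mol per RNA nucleotide) are commonly used approximations and are assumptions here, not figures from this article.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MOLAR_MASS = 650.0      # g/mol per DNA base pair (assumed average)
NT_MOLAR_MASS = 340.0      # g/mol per RNA nucleotide (assumed average)


def mass_pg(units: float, molar_mass: float) -> float:
    """Return the mass in picograms of `units` base pairs or nucleotides."""
    grams = units * molar_mass / AVOGADRO
    return grams * 1e12  # 1 g = 1e12 pg


# Figures from the text: 2 x 3.3 billion base pairs, 600 million mRNA bases.
dna_pg = mass_pg(2 * 3.3e9, BP_MOLAR_MASS)
mrna_pg = mass_pg(600e6, NT_MOLAR_MASS)

print(f"Genomic DNA per cell: about {dna_pg:.1f} pg")
print(f"mRNA per cell:        about {mrna_pg:.2f} pg")
```

Under these assumptions a single diploid cell carries roughly 7 pg of genomic DNA and well under 1 pg of mRNA, which is why amplification is needed before library preparation.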


Genome (DNA) sequencing

Single-cell DNA genome sequencing involves isolating a single cell, amplifying the whole genome or region of interest, constructing sequencing libraries, and then applying next-generation DNA sequencing (for example Illumina, Ion Torrent). Single-cell DNA sequencing has been widely applied in mammalian systems to study normal physiology and disease. Single-cell resolution can uncover the roles of genetic mosaicism or intra-tumor genetic heterogeneity in cancer development or treatment response.

In the context of microbiomes, a genome from a single unicellular organism is referred to as a single amplified genome (SAG). Advancements in single-cell DNA sequencing have enabled the collection of genomic data from uncultivated prokaryotic species present in complex microbiomes. Although SAGs are characterized by low completeness and significant bias, recent computational advances have achieved the assembly of near-complete genomes from composite SAGs. Data obtained from microorganisms might establish processes for culturing in the future. Some of the genome assembly tools used in single-cell sequencing include SPAdes, IDBA-UD, Cortex, and HyDA.


Methods

A list of more than 100 different single-cell omics methods has been published.

Multiple displacement amplification (MDA) is a widely used technique that can amplify femtograms of DNA from a single bacterium to micrograms for sequencing. Reagents required for MDA reactions include random primers and DNA polymerase from bacteriophage phi29. The DNA is amplified with these reagents in an isothermal reaction at 30 °C. As the polymerases manufacture new strands, a strand displacement reaction takes place, synthesizing multiple copies from each template DNA. At the same time, the strands that were extended previously are displaced. MDA products average about 12 kb in length and range up to around 100 kb, which makes them suitable for DNA sequencing. In 2017, a major improvement to this technique, called WGA-X, was introduced by taking advantage of a thermostable mutant of the phi29 polymerase, leading to better genome recovery from individual cells, in particular those with high G+C content. MDA has also been implemented in a microfluidic droplet-based system to achieve highly parallelized single-cell whole genome amplification. By encapsulating single cells in droplets for DNA capture and amplification, this method offers reduced bias and enhanced throughput compared to conventional MDA.

Another common method is MALBAC. As in MDA, this method begins with isothermal amplification, but the primers are flanked with a “common” sequence for downstream PCR amplification. As the preliminary amplicons are generated, the common sequence promotes self-ligation and the formation of “loops” to prevent further amplification. In contrast with MDA, the highly branched DNA network is not formed. Instead, the loops are denatured in another temperature cycle, allowing the fragments to be amplified with PCR. MALBAC has also been implemented in a microfluidic device, but the amplification performance was not significantly improved by encapsulation in nanoliter droplets.

Comparing MDA and MALBAC, MDA results in better genome coverage, but MALBAC provides more even coverage across the genome. MDA could be more effective for identifying SNPs, whereas MALBAC is preferred for detecting copy number variants. While performing MDA with a microfluidic device markedly reduces bias and contamination, the chemistry involved in MALBAC does not demonstrate the same potential for improved efficiency.

A method particularly suitable for the discovery of genomic structural variation is single-cell DNA template strand sequencing (a.k.a. Strand-seq). Using the principle of single-cell tri-channel processing, which uses joint modelling of read-orientation, read-depth, and haplotype-phase, Strand-seq enables discovery of the full spectrum of somatic structural variation classes ≥200 kb in size. Strand-seq overcomes limitations of whole genome amplification based methods for identification of somatic genetic variation classes in single cells, because it is not susceptible to read chimeras that lead to calling artefacts (discussed in detail in the section below), and is less affected by dropouts. The choice of method depends on the goal of the sequencing because each method presents different advantages.
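The trade-off between how much of the genome is covered and how evenly it is covered can be made concrete with two summary statistics computed from per-bin read depth. The following is a minimal sketch: the function name, the choice of coefficient of variation as the evenness measure, and both toy depth profiles are illustrative assumptions, not measurements from any real MDA or MALBAC experiment.

```python
from statistics import mean, pstdev


def coverage_summary(bin_depths):
    """Return (breadth, cv) for a list of per-bin read depths.

    breadth: fraction of bins with at least one read.
    cv:      coefficient of variation (std / mean); lower means more even.
    """
    if not bin_depths:
        raise ValueError("bin_depths must not be empty")
    breadth = sum(1 for d in bin_depths if d > 0) / len(bin_depths)
    avg = mean(bin_depths)
    cv = pstdev(bin_depths) / avg if avg > 0 else float("nan")
    return breadth, cv


# Hypothetical toy profiles: one broad but spiky, one narrower but flatter.
broad_uneven = [0, 40, 3, 55, 1, 30, 2, 60, 5, 4]
narrow_even = [0, 0, 22, 18, 25, 0, 20, 19, 24, 21]

for name, depths in [("broad/uneven", broad_uneven), ("narrow/even", narrow_even)]:
    breadth, cv = coverage_summary(depths)
    print(f"{name:13s} breadth={breadth:.2f}  cv={cv:.2f}")
```

A profile with high breadth favours finding point variants anywhere in the genome, while a low coefficient of variation makes depth-based copy number estimates more trustworthy, which matches the division of labour between the two methods described above.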


Limitations

MDA of individual cell genomes results in highly uneven genome coverage, i.e. relative overrepresentation and underrepresentation of various regions of the template, leading to loss of some sequences. There are two components to this process: a) stochastic over- and under-amplification of random regions; and b) systematic bias against high %GC regions. The stochastic component may be addressed by pooling single-cell MDA reactions from the same cell type, by employing fluorescent in situ hybridization (FISH) and/or post-sequencing confirmation. The bias of MDA against high %GC regions can be addressed by using thermostable polymerases, such as in the process called WGA-X.

Single-nucleotide polymorphisms (SNPs), which are a major part of genetic variation in the human genome, and copy number variation (CNV) pose problems in single-cell sequencing, as does the limited amount of DNA extracted from a single cell. Because DNA is scant, accurate analysis remains difficult even after amplification: coverage is low and susceptible to errors. With MDA, average genome coverage is less than 80%, and SNPs that are not covered by sequencing reads will be missed. In addition, MDA shows a high rate of allele dropout, failing to detect one of the alleles in heterozygous samples. Various SNP algorithms are currently in use but none are specific to single-cell sequencing. CNV calling on MDA data also risks identifying false CNVs that conceal the real CNVs. To solve this, when patterns can be generated from false CNVs, algorithms can detect and remove this noise to recover true variants.

Strand-seq overcomes limitations of methods based on whole genome amplification for genetic variant calling. Since Strand-seq does not require reads (or read pairs) traversing the boundaries (or breakpoints) of CNVs or copy-balanced structural variant classes, it is less susceptible to common artefacts of single-cell methods based on whole genome amplification, which include variant calling dropouts due to missing reads at the variant breakpoint and read chimeras. Strand-seq discovers the full spectrum of structural variation classes of at least 200 kb in size, including breakage-fusion-bridge cycles and chromothripsis events, as well as balanced inversions, and copy-number balanced or imbalanced translocations. Structural variant calls made by Strand-seq are resolved by chromosome-length haplotype, which provides additional variant calling specificity. As a current limitation, Strand-seq requires dividing cells for strand-specific labelling using bromodeoxyuridine (BrdU), and the method does not detect variants smaller than 200 kb in size, such as mobile element insertions.
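Allele dropout is usually quantified against a reference genotype, for example one obtained from bulk sequencing of the same individual. The sketch below shows one way to compute it. The data layout (a dict from site label to a pair of alleles), the site labels, and the decision to exclude uncovered sites from the denominator are assumptions made for illustration.

```python
def allele_dropout_rate(bulk_genotypes, cell_genotypes):
    """Fraction of bulk-heterozygous, cell-covered sites that look homozygous in the cell.

    Both arguments map a site label to a tuple of two alleles, e.g. ("A", "G").
    Sites missing from `cell_genotypes` are treated as uncovered and skipped.
    """
    het_covered = 0
    dropped = 0
    for site, bulk_gt in bulk_genotypes.items():
        if len(set(bulk_gt)) != 2:
            continue  # homozygous in bulk: dropout cannot be assessed here
        cell_gt = cell_genotypes.get(site)
        if cell_gt is None:
            continue  # no reads at this site in the single cell
        het_covered += 1
        if len(set(cell_gt)) == 1:
            dropped += 1  # only one of the two alleles was recovered
    return dropped / het_covered if het_covered else float("nan")


# Hypothetical toy genotypes.
bulk = {
    "chr1:100": ("A", "G"),
    "chr1:200": ("C", "C"),
    "chr2:50": ("T", "C"),
    "chr3:10": ("G", "A"),
    "chr4:5": ("A", "T"),
}
cell = {
    "chr1:100": ("A", "A"),
    "chr1:200": ("C", "C"),
    "chr2:50": ("T", "C"),
    "chr4:5": ("T", "T"),
}

print(f"Allele dropout rate: {allele_dropout_rate(bulk, cell):.2f}")
```

Keeping uncovered sites out of the denominator separates allele dropout from the coverage loss discussed above, so the two limitations can be reported independently.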


Applications

Microbiomes are among the main targets of single-cell genomics due to the difficulty of culturing the majority of microorganisms in most environments. Single-cell genomics is a powerful way to obtain microbial genome sequences without cultivation. This approach has been widely applied to marine, soil, subsurface, organismal, and other types of microbiomes in order to address a wide array of questions related to microbial ecology, evolution, public health and biotechnology potential.

Cancer sequencing is also an emerging application of scDNAseq. Fresh or frozen tumors may be analyzed and categorized quite well with respect to somatic copy number alterations (SCNAs), single-nucleotide variants (SNVs), and rearrangements using whole-genome DNA sequencing approaches. Cancer scDNAseq is particularly useful for examining the depth of complexity and compound mutations present in amplified therapeutic targets such as receptor tyrosine kinase genes (EGFR, PDGFRA etc.), where conventional population-level approaches applied to the bulk tumor cannot resolve the co-occurrence patterns of these mutations within single cells of the tumor. Such overlap may provide redundancy of pathway activation and tumor cell resistance.


DNA methylome sequencing

Single-cell DNA methylome sequencing quantifies DNA methylation. There are several known types of methylation that occur in nature, including 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 6-methyladenosine (6mA), and 4-methylcytosine (4mC). In eukaryotes, especially animals, 5mC is widespread along the genome and plays an important role in regulating gene expression by repressing transposable elements. Sequencing 5mC in individual cells can reveal how epigenetic changes across genetically identical cells from a single tissue or population give rise to cells with different phenotypes.


Methods

Bisulfite sequencing has become the gold standard in detecting and sequencing 5mC in single cells. Treatment of DNA with bisulfite converts cytosine residues to uracil, but leaves 5-methylcytosine residues unaffected. Therefore, DNA that has been treated with bisulfite retains only methylated cytosines. To obtain the methylome readout, the bisulfite-treated sequence is aligned to an unmodified genome.

Whole genome bisulfite sequencing was achieved in single cells in 2014. The method overcomes the loss of DNA associated with the typical procedure, where sequencing adapters are added prior to bisulfite fragmentation. Instead, the adapters are added after the DNA is treated and fragmented with bisulfite, allowing all fragments to be amplified by PCR. Using deep sequencing, this method captures ~40% of the total CpGs in each cell. With existing technology DNA cannot be amplified prior to bisulfite treatment, as the 5mC marks will not be copied by the polymerase.

Single-cell reduced representation bisulfite sequencing (scRRBS) is another method. This method leverages the tendency of methylated cytosines to cluster at CpG islands (CGIs) to enrich for areas of the genome with a high CpG content. This reduces the cost of sequencing compared to whole-genome bisulfite sequencing, but limits the coverage of this method. When RRBS is applied to bulk samples, the majority of the CpG sites in gene promoters are detected, but sites in gene promoters only account for 10% of CpG sites in the entire genome. In single cells, 40% of the CpG sites from the bulk sample are detected. To increase coverage, this method can also be applied to a small pool of single cells. In a sample of 20 pooled single cells, 63% of the CpG sites from the bulk sample were detected. Pooling single cells is one strategy to increase methylome coverage, but at the cost of obscuring the heterogeneity in the population of cells.
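The logic of bisulfite readout, converting unmethylated cytosines and then comparing against the unmodified reference, can be sketched in a few lines. This is a simplified illustration, not a real aligner or methylation caller: it handles only the forward strand, assumes the read is already aligned without gaps, and represents converted uracil as T (as it appears after PCR). The sequence and function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment: unmethylated C reads as T, 5mC stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read, offset=0):
    """Compare an aligned bisulfite read with the unmodified reference.

    Returns {reference_position: True/False} for every reference C covered by
    the read: True if the read still shows C (methylated), False if it shows T.
    Any other base at a reference C is treated as an error and gets no call.
    """
    calls = {}
    for i, base in enumerate(read):
        pos = offset + i
        if reference[pos] != "C":
            continue
        if base == "C":
            calls[pos] = True
        elif base == "T":
            calls[pos] = False
    return calls


reference = "ACGTCGACCG"          # hypothetical reference fragment
truth = {1, 8}                    # positions of methylated cytosines (toy data)
read = bisulfite_convert(reference, truth)

print(read)
print(call_methylation(reference, read))
```

In real data each cytosine is covered by few or no reads in a single cell, so calls are typically binary per site and missing at many sites, which is the coverage limitation quantified above.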


Limitations

While bisulfite sequencing remains the most widely used approach for 5mC detection, the chemical treatment is harsh and fragments and degrades the DNA. This effect is exacerbated when moving from bulk samples to single cells. Other methods to detect DNA methylation include methylation-sensitive restriction enzymes. Restriction enzymes also enable the detection of other types of methylation, such as 6mA with DpnI. Nanopore-based sequencing also offers a route for direct methylation sequencing without fragmentation or modification to the original DNA. Nanopore sequencing has been used to sequence the methylomes of bacteria, which are dominated by 6mA and 4mC (as opposed to 5mC in eukaryotes), but this technique has not yet been scaled down to single cells.


Applications

Single-cell DNA methylation sequencing has been widely used to explore epigenetic differences in genetically similar cells. To validate these methods during their development, the single-cell methylome data of a mixed population were successfully classified by hierarchical clustering to identify distinct cell types. Another application is studying single cells during the first few cell divisions in early development to understand how different cell types emerge from a single embryo. Single-cell whole-genome bisulfite sequencing has also been used to study rare but highly active cell types in cancer such as circulating tumor cells (CTCs).


Transposase-accessible chromatin sequencing (scATAC-seq)

Single cell transposase-accessible chromatin sequencing maps chromatin accessibility across the genome. A transposase inserts sequencing adapters directly into open regions of chromatin, allowing those regions to be amplified and sequenced.


Methods

The two methods for library preparation in scATAC-Seq are based on split-pool cellular indexing and microfluidics.


Transcriptome sequencing (scRNA-seq)

Standard methods such as microarrays and bulk RNA-seq analyze the RNA expression from large populations of cells. These measurements may obscure critical differences between individual cells in mixed-cell populations. Single-cell RNA sequencing (scRNA-seq) provides the expression profiles of individual cells and is considered the gold standard for defining cell states and phenotypes as of 2020. Although it is impossible to obtain complete information on every RNA expressed by each cell, due to the small amount of material available, gene expression patterns can be identified through gene clustering analyses. This can uncover rare cell types within a cell population that may never have been seen before. For example, one group of scientists performing scRNA-seq on neuroblastoma tumor tissue identified a rare pan-neuroblastoma cancer cell, which may be attractive for novel therapy approaches.


Methods

Current scRNA-seq protocols involve isolating single cells and their RNA, and then following the same steps as bulk RNA-seq: reverse transcription (RT), amplification, library generation and sequencing. Early methods separated individual cells into separate wells; more recent methods encapsulate individual cells in droplets in a microfluidic device, where the reverse transcription reaction takes place, converting RNAs to cDNAs. Each droplet carries a DNA "barcode" that uniquely labels the cDNAs derived from a single cell. Once reverse transcription is complete, the cDNAs from many cells can be mixed together for sequencing, because transcripts from a particular cell are identified by the unique barcode.

Challenges for scRNA-seq include preserving the initial relative abundance of mRNA in a cell and identifying rare transcripts. The reverse transcription step is critical, as the efficiency of the RT reaction determines how much of the cell's RNA population will eventually be analyzed by the sequencer. The processivity of reverse transcriptases and the priming strategies used may affect full-length cDNA production and the generation of libraries biased toward the 3′ or 5′ end of genes. In the amplification step, either PCR or in vitro transcription (IVT) is currently used to amplify cDNA. One of the advantages of PCR-based methods is the ability to generate full-length cDNA. However, differences in PCR efficiency between sequences (for instance, due to GC content and snapback structure) are amplified exponentially, producing libraries with uneven coverage. On the other hand, while libraries generated by IVT can avoid PCR-induced sequence bias, specific sequences may be transcribed inefficiently, thus causing sequence drop-out or generating incomplete sequences.

Several scRNA-seq protocols have been published: Tang et al., STRT, SMART-seq, SORT-seq, CEL-seq, RAGE-seq, Quartz-seq, and C1-CAGE. These protocols differ in terms of strategies for reverse transcription, cDNA synthesis and amplification, and the possibility to accommodate sequence-specific barcodes (i.e., UMIs) or the ability to process pooled samples. In 2017, two approaches, known as REAP-seq and CITE-seq, were introduced to simultaneously measure single-cell mRNA and protein expression through oligonucleotide-labeled antibodies. Collecting cellular contents following electrophysiological recording using patch-clamp has also allowed development of the Patch-Seq method, which is steadily gaining ground in neuroscience.
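The exponential nature of PCR bias is easy to see numerically. In the sketch below, each cycle multiplies the copy number by (1 + efficiency), a standard idealised model of PCR. The two efficiency values, the starting copy number, and the cycle count are hypothetical and chosen only to illustrate the effect.

```python
def pcr_yield(start_copies, efficiency, cycles):
    """Idealised PCR: each cycle multiplies copies by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


# Two cDNAs present at equal abundance before amplification (toy numbers).
start = 10
cycles = 20
easy_template = pcr_yield(start, 0.95, cycles)   # amplifies well
hard_template = pcr_yield(start, 0.80, cycles)   # e.g. GC-rich, amplifies worse

ratio = easy_template / hard_template
print(f"Apparent abundance ratio after {cycles} cycles: {ratio:.1f}x")
```

With these assumed values, a modest per-cycle difference grows into a roughly fivefold distortion of two transcripts that started at equal abundance, which is the kind of bias that unique molecular identifiers are intended to correct.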


Example of a droplet-based platform: 10X method

This single-cell RNA sequencing platform analyzes transcriptomes on a cell-by-cell basis, using microfluidic partitioning to capture single cells and prepare next-generation sequencing (NGS) cDNA libraries. The droplet-based platform enables massively parallel sequencing of mRNA in large numbers of individual cells by capturing each cell in an oil droplet. In a first stage, individual cells are captured separately and lysed; reverse transcription (RT) of mRNA is then performed and a cDNA library is obtained. To select mRNA, the RT is performed with a single-stranded deoxythymine (oligo dT) primer, which binds specifically to the poly(A) tail of mRNA molecules. The amplified cDNA library is then used for sequencing.

The first step of the method is single-cell encapsulation and library preparation. An automated instrument encapsulates cells into Gel Beads-in-emulsion (GEMs); to form these vesicles, it uses a microfluidic chip and combines all components with oil. Each functional GEM contains a single cell, a single Gel Bead, and RT reagents. Oligonucleotides composed of four distinct parts are bound to the Gel Bead: a PCR primer (essential for the sequencing); a 10X barcode; a Unique Molecular Identifier (UMI) sequence; and a poly(dT) sequence (which enables capture of polyadenylated mRNA molecules). Within each GEM reaction vesicle, a single cell is lysed and its mRNA undergoes reverse transcription. cDNAs from the same cell are identified by their common 10X barcode. In addition, the number of UMIs reflects the gene expression level, and its analysis allows highly variable genes to be detected. These data are often used for either cellular phenotype classification or identification of new subpopulations.

The final step of the platform is the sequencing. The libraries generated can be used directly for single-cell whole transcriptome sequencing or targeted sequencing workflows. The sequencing is performed using the Illumina dye sequencing method, which is based on the sequencing by synthesis (SBS) principle and uses reversible dye-terminators that enable the identification of each single nucleotide. In order to read the transcript sequences on one end, and the barcode and UMI on the other end, paired-end sequencing reads are required.

The droplet-based platform allows the detection of rare cell types thanks to its high throughput: 500 to 20,000 cells are captured per sample from a single-cell suspension. The protocol is easy to perform and allows a high cell recovery rate of up to 65%. The global workflow of the droplet-based platform takes 8 hours and so is faster than the microwell-based method (BD Rhapsody), which takes 10 hours. However, it has some limitations, such as the need for fresh samples and the final detection of only 10% of mRNA. The major difference between the droplet-based method and the microwell-based method is the technique used for partitioning cells.
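The way barcodes and UMIs turn a pooled set of reads into per-cell expression counts can be sketched as follows. This is an illustrative toy, not the vendor's pipeline: the barcode and UMI lengths are shortened for readability, the record layout (barcode-plus-UMI read paired with an already assigned gene name) is an assumption, and no error correction of barcodes or UMIs is attempted.

```python
from collections import defaultdict

BARCODE_LEN = 4   # toy length; real kits use longer barcodes (assumption)
UMI_LEN = 3       # toy length; real kits use longer UMIs (assumption)


def count_umis(records):
    """Count unique UMIs per (cell barcode, gene).

    `records` is an iterable of (barcode_umi_read, gene) pairs, where the first
    element is the read carrying barcode followed by UMI and `gene` is the gene
    to which the paired transcript read was assigned.
    Returns {barcode: {gene: unique_umi_count}}.
    """
    seen = defaultdict(set)
    for tag_read, gene in records:
        barcode = tag_read[:BARCODE_LEN]
        umi = tag_read[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        if len(barcode) < BARCODE_LEN or len(umi) < UMI_LEN:
            continue  # read too short to carry both tags
        seen[(barcode, gene)].add(umi)  # PCR duplicates collapse here

    counts = defaultdict(dict)
    for (barcode, gene), umis in seen.items():
        counts[barcode][gene] = len(umis)
    return dict(counts)


# Hypothetical reads: the first two are PCR duplicates of one molecule.
toy_records = [
    ("AAAACCC", "GAPDH"),
    ("AAAACCC", "GAPDH"),
    ("AAAAGGT", "GAPDH"),
    ("AAAAACG", "ACTB"),
    ("TTTTCCC", "ACTB"),
]

for barcode, genes in count_umis(toy_records).items():
    print(barcode, genes)
```

Counting distinct UMIs rather than reads means that a molecule amplified many times still contributes a single count, so the resulting matrix reflects captured molecules per cell rather than amplification yield.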


Limitations

Most RNA-seq methods depend on poly(A) tail capture to enrich mRNA and deplete abundant and uninformative rRNA. Thus, they are often restricted to sequencing polyadenylated mRNA molecules. However, recent studies are now starting to appreciate the importance of non-poly(A) RNA, such as long non-coding RNA and microRNAs, in gene expression regulation. Small-seq is a single-cell method that captures small RNAs (<300 nucleotides) such as microRNAs, fragments of tRNAs and small nucleolar RNAs in mammalian cells. This method uses a combination of “oligonucleotide masks” (which inhibit the capture of highly abundant 5.8S rRNA molecules) and size selection to exclude large RNA species such as other highly abundant rRNA molecules. To target larger non-poly(A) RNAs, such as long non-coding RNA, histone mRNA, circular RNA, and enhancer RNA, size selection is not applicable for depleting the highly abundant ribosomal RNA molecules (18S and 28S rRNA). Single-cell RamDA-Seq is a method that achieves this by performing reverse transcription with random priming (random displacement amplification) in the presence of “not so random” (NSR) primers specifically designed to avoid priming on rRNA molecules. While this method successfully captures full-length total RNA transcripts for sequencing and detects a variety of non-poly(A) RNAs with high sensitivity, it has some limitations. The NSR primers were carefully designed according to rRNA sequences in the specific organism (mouse), and designing new primer sets for other species would take considerable effort. Recently, a CRISPR-based method named scDASH (single-cell depletion of abundant sequences by hybridization) demonstrated another approach to depleting rRNA sequences from single-cell total RNA-seq libraries. Bacteria and other prokaryotes are currently not amenable to single-cell RNA-seq due to the lack of polyadenylated mRNA.
Thus, the development of single-cell RNA-seq methods that do not depend on poly(A) tail capture will also be instrumental in enabling single-cell resolution microbiome studies. Bulk bacterial studies typically apply general rRNA depletion to overcome the lack of polyadenylated mRNA in bacteria, but at the single-cell level, the amount of total RNA found in one cell is too small. The lack of polyadenylated mRNA and the scarcity of total RNA in single bacterial cells are two important barriers limiting the deployment of scRNA-seq in bacteria. Limitations due to partial and biased scRNA-seq sampling also arise, as scRNA-seq only captures a snapshot of cellular activity. For example, in large, ramified cell types like neurons, single-cell isolation only captures RNA from the central cell bodies, which are separated from their processes during trituration. In the brain, it is estimated that over 40% of total RNA is in cellular processes such as axons, dendrites, and astrocyte end-feet, and is thus not visible to scRNA-seq methods.
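The idea behind the “not so random” (NSR) primers mentioned above can be sketched as a simple filter: enumerate every primer of a given length and discard those that would anneal perfectly to an rRNA sequence. The Python sketch below is an illustration under stated assumptions, not the published primer design procedure. The function names, the perfect-match criterion and the toy rRNA fragment are all hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def not_so_random_primers(rrna_seq, k=6):
    """List k-mer primers with no perfect annealing site on `rrna_seq`.

    Assumption: a primer anneals where its reverse complement occurs
    in the rRNA sequence (written here in the DNA alphabet).
    """
    # Every k-mer window present in the rRNA is a potential priming site.
    rrna_sites = {rrna_seq[i:i + k] for i in range(len(rrna_seq) - k + 1)}

    primers = []
    for letters in product("ACGT", repeat=k):
        primer = "".join(letters)
        if reverse_complement(primer) not in rrna_sites:
            primers.append(primer)
    return primers


# Toy rRNA fragment (invented) and short primers to keep the output small.
toy_rrna = "ACGTTGCA"
primers = not_so_random_primers(toy_rrna, k=3)
print(len(primers), "of", 4 ** 3, "possible 3-mers retained")
```

Because the filter is built from the rRNA sequence of one organism, the retained primer set is species-specific, which is consistent with the effort noted above for designing primer sets for other species.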


Applications

scRNA-Seq is becoming widely used across biological disciplines including developmental biology, neurology, oncology, immunology, cardiovascular research and infectious disease. Using machine learning methods, data from bulk RNA-Seq has been used to increase the signal-to-noise ratio in scRNA-Seq. Specifically, scientists have used gene expression profiles from pan-cancer datasets to build coexpression networks, and have then applied these to single-cell gene expression profiles, obtaining a more robust method to detect the presence of mutations in individual cells using transcript levels. Some scRNA-seq methods have also been applied to single-cell microorganisms. SMART-seq2 has been used to analyze single-cell eukaryotic microbes, but since it relies on poly(A) tail capture, it has not been applied to prokaryotic cells. Microfluidic approaches such as Drop-seq and the Fluidigm IFC-C1 devices have been used to sequence single malaria parasites or single yeast cells. The single-cell yeast study sought to characterize the heterogeneous stress tolerance in isogenic yeast cells before and after the yeast were exposed to salt stress. Single-cell analysis of several transcription factors by scRNA-seq revealed heterogeneity across the population. These results suggest that regulation varies among members of a population to increase the chances of survival for a fraction of the population. The first single-cell transcriptome analysis in a prokaryotic species was accomplished using the terminator exonuclease enzyme to selectively degrade rRNA and rolling circle amplification (RCA) of mRNA. In this method, the ends of single-stranded DNA were ligated together to form a circle, and the resulting loop was then used as a template for linear RNA amplification. The final product library was then analyzed by microarray, with low bias and good coverage. However, RCA has not been tested with RNA-seq, which typically employs next-generation sequencing. Single-cell RNA-seq for bacteria would be highly useful for studying microbiomes.
It would address issues encountered in conventional bulk metatranscriptomics approaches, such as failing to capture species present in low abundance and failing to resolve heterogeneity among cell populations. scRNA-Seq has provided considerable insight into the development of embryos and organisms, including the worm ''Caenorhabditis elegans'', the regenerative planarian ''Schmidtea mediterranea'' and the axolotl ''Ambystoma mexicanum''. The first vertebrate animals to be mapped in this way were zebrafish and ''Xenopus laevis''. In each case, multiple stages of the embryo were studied, allowing the entire process of development to be mapped on a cell-by-cell basis. Science recognized these advances as the 2018 Breakthrough of the Year. A molecular cell atlas of mouse testes was established using scRNA-seq to define BDE47-induced prepubertal testicular toxicity, providing novel insight into the mechanisms and pathways involved in BDE47-associated testicular injury at single-cell resolution.


Considerations


Isolation of single cells

There are several ways to isolate individual cells prior to whole genome amplification and sequencing. Fluorescence-activated cell sorting (FACS) is a widely used approach. Individual cells can also be collected by micromanipulation, for example by serial dilution or by using a patch pipette or nanotube to harvest a single cell. The advantages of micromanipulation are ease and low cost, but the technique is laborious and susceptible to misidentification of cell types under the microscope. Laser-capture microdissection (LCM) can also be used for collecting single cells. Although LCM preserves the knowledge of the spatial location of a sampled cell within a tissue, it is hard to capture a whole single cell without also collecting material from neighboring cells. High-throughput methods for single-cell isolation also include microfluidics. Both FACS and microfluidics are accurate, automatic and capable of isolating unbiased samples. However, both methods require detaching cells from their microenvironments first, thereby perturbing the transcriptional profiles in RNA expression analysis.


Number of cells to be sequenced and analyzed


scRNA-Seq

Single-cell RNA-Seq protocols vary in their efficiency of RNA capture, which results in differences in the number of transcripts generated from each single cell. Single-cell libraries are usually sequenced to a depth of 1,000,000 reads because a large majority of genes are detected with 500,000 reads. Increasing the number of cells and decreasing the read depth increases the power to identify major cell populations. However, low read depths may not always provide the necessary information about the genes, and the observed difference in their expression between cell populations depends on the stability and detection of the mRNA molecules. Quality control covariates serve as a strategy for deciding which cells to analyze. These covariates mainly include the count depth, the number of genes, and the fraction of counts from mitochondrial genes; filtering on them supports the interpretation of cellular signals.
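The three quality control covariates named above can be computed per cell and used as a filter. The following Python sketch is illustrative only: the thresholds are hypothetical toy values rather than recommended cut-offs, the function names are invented, and the `MT-` prefix for mitochondrial genes is assumed from human gene naming conventions.

```python
def qc_covariates(cell_counts, mito_prefix="MT-"):
    """Return (count depth, genes detected, mitochondrial fraction).

    `cell_counts` maps gene name -> count for one cell. The "MT-" prefix
    for mitochondrial genes is an assumption (human gene symbols).
    """
    total = sum(cell_counts.values())
    n_genes = sum(1 for count in cell_counts.values() if count > 0)
    mito = sum(count for gene, count in cell_counts.items()
               if gene.startswith(mito_prefix))
    mito_fraction = mito / total if total else 0.0
    return total, n_genes, mito_fraction


def passes_qc(cell_counts, min_counts=5, min_genes=2, max_mito_fraction=0.2):
    """Apply hypothetical thresholds; real cut-offs depend on the dataset."""
    total, n_genes, mito_fraction = qc_covariates(cell_counts)
    return (total >= min_counts
            and n_genes >= min_genes
            and mito_fraction <= max_mito_fraction)


# Toy cells (invented for illustration).
cells = {
    "cell_1": {"GAPDH": 12, "ACTB": 7, "MT-CO1": 1},  # passes all filters
    "cell_2": {"GAPDH": 1, "MT-CO1": 1},              # too few counts
    "cell_3": {"GAPDH": 3, "ACTB": 2, "MT-CO1": 15},  # mostly mitochondrial
}
kept = [name for name, counts in cells.items() if passes_qc(counts)]
print(kept)
```

Only `cell_1` survives here: `cell_2` has a count depth of 2, and 75% of the counts in `cell_3` come from a mitochondrial gene.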


See also

* Single-cell analysis
* Single-cell transcriptomics
* Single cell epigenomics
* Tcr-seq
* DNA sequencing
* Whole genome sequencing

