Perturb-seq (also known as CRISP-seq and CROP-seq) refers to a high-throughput method of performing single-cell RNA sequencing (scRNA-seq) on pooled genetic perturbation screens.
Perturb-seq combines multiplexed CRISPR-mediated gene inactivation with single-cell RNA sequencing to assess comprehensive gene expression phenotypes for each perturbation. Inferring a gene’s function by applying genetic perturbations to knock down or knock out a gene and studying the resulting phenotype is known as reverse genetics. Perturb-seq is a reverse genetics approach that allows phenotypes to be investigated at the level of the transcriptome, elucidating gene functions in many cells in a massively parallel fashion.
The Perturb-seq protocol uses CRISPR technology to inactivate specific genes and DNA barcoding of each guide RNA to allow all perturbations to be pooled together and later deconvoluted, with each phenotype assigned to a specific guide RNA.
Droplet-based microfluidics platforms (or other cell sorting and separation techniques) are used to isolate individual cells, and scRNA-seq is then performed to generate gene expression profiles for each cell. Upon completion of the protocol, bioinformatics analyses are conducted to associate each cell and its perturbation with a transcriptomic profile that characterizes the consequences of inactivating each gene.
History
In the December 2016 issue of the journal ''Cell'', two companion papers were published that each introduced and described this technique.
A third paper describing a conceptually similar approach (termed CRISP-seq) was also published in the same issue.
In October 2016, the CROP-seq method for single-cell CRISPR screening was presented in a preprint on bioRxiv and later published in the journal ''Nature Methods''.
While each paper shared the core principle of combining CRISPR-mediated perturbation with scRNA-seq, their experimental, technological and analytical approaches differed in several respects, reflecting the distinct biological questions they explored and demonstrating the broad utility of this methodology. For example, the CRISP-seq paper demonstrated the feasibility of ''in vivo'' studies using this technology, and the CROP-seq protocol facilitates large screens by providing a vector that makes the guide RNA itself readable in the single-cell transcriptome (rather than relying on separately expressed barcodes), which allows single-step guide RNA cloning. A June 2022 paper in ''Cell'' reported results from one of the first genome-scale Perturb-seq screens, which uncovered new perturbations that promote chromosomal instability as well as variations in the expression of mitochondrially encoded transcripts in response to different forms of mitochondrial stress.
Experimental workflow
CRISPR single guide RNA library design and selection

Pooled CRISPR libraries that enable gene inactivation come in two forms: knockout libraries and interference libraries. Knockout libraries perturb genes through double-stranded breaks that prompt the error-prone non-homologous end joining repair pathway to introduce disruptive insertions or deletions.
CRISPR interference (CRISPRi), on the other hand, uses a catalytically inactive nuclease to physically block RNA polymerase, effectively preventing or halting transcription. Perturb-seq has been used with the knockout and CRISPRi approaches in the Dixit et al. and Adamson et al. papers, respectively.
Pooling all guide RNAs into a single screen relies on DNA barcodes that act as identifiers for each unique guide RNA. There are several commercially available pooled CRISPR libraries including the guide barcode library used in the study by Adamson et al.
CRISPR libraries can also be custom-made using sgRNA design tools, many of which are listed on the CRISPR/Cas9 tools Wikipedia page.
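Design tools typically begin by enumerating candidate guide sequences in the target region. The Python sketch below illustrates that first step for SpCas9 and assumes the common 20-nt spacer length and an NGG PAM. The function name `find_protospacers` is hypothetical. Real design tools also score on-target activity and genome-wide off-target matches, and the sketch omits both. Other nucleases such as Cpf1 (Cas12a) recognize different PAMs.

```python
import re

# Complement table for reverse-complementing uppercase DNA (A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_protospacers(seq: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """Return candidate SpCas9 spacers on both strands of `seq`.

    A candidate is any `spacer_len`-nt window immediately followed by an NGG PAM.
    Each hit is (strand, start, spacer), with `start` 0-based on the scanned strand.
    Only the PAM rule is applied; no activity or off-target scoring is done.
    """
    seq = seq.upper()
    strands = {"+": seq, "-": seq.translate(_COMPLEMENT)[::-1]}
    # Zero-width lookahead so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    hits = []
    for strand, s in strands.items():
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1)))
    return hits


if __name__ == "__main__":
    target = "ATGCTGACCTTGGAGCAGTACCGGATCAAGTGGCTGAAG"  # made-up sequence
    for hit in find_protospacers(target):
        print(hit)
```

The sketch scans the reverse complement as well because a guide can target either strand. An NGG PAM on the minus strand appears as CCN on the plus strand.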
Lentiviral vectors
The sgRNA expression vector design will depend largely on the experiment performed but requires the following central components:
# Promoter
# Restriction sites
# Primer binding sites
# sgRNA
# Guide barcode
# Reporter gene:
#* Fluorescent gene: vectors are often constructed to include a gene encoding a fluorescent protein, so that successfully transduced cells can be assessed visually and quantitatively by its expression.
#* Antibiotic resistance gene: like fluorescent markers, antibiotic resistance genes are often incorporated into vectors to allow selection of successfully transduced cells.
# CRISPR-associated endonuclease: Cas9 or other CRISPR-associated endonucleases such as Cpf1 (Cas12a) must be introduced into cells that do not express them endogenously. Because these genes are large, a two-vector system can be used to express the endonuclease separately from the sgRNA expression vector.
Transduction and selection
Cells are typically transduced at a multiplicity of infection (MOI) of 0.4 to 0.6 lentiviral particles per cell. This range is chosen so that most transduced cells carry a single guide RNA while a useful fraction of cells is still transduced. If the effects of simultaneous perturbations are of interest, a higher MOI may be used to increase the number of transduced cells carrying more than one guide RNA. Successfully transduced cells are then selected using a fluorescence assay or an antibiotic assay, depending on the reporter gene in the expression vector.
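The trade-off behind a low MOI can be illustrated with the Poisson model commonly used for viral transduction. In that model, the number of integrations per cell is Poisson-distributed with a mean equal to the MOI. The Python sketch below is a simplification under that assumption and ignores, for example, cell-to-cell variation in susceptibility. The function name `transduction_summary` is hypothetical.

```python
import math


def transduction_summary(moi: float) -> dict[str, float]:
    """Expected fractions of cells by number of integrated guide constructs.

    Assumes integrations per cell follow a Poisson distribution with mean `moi`.
    """
    if moi <= 0:
        raise ValueError("MOI must be positive")
    p0 = math.exp(-moi)          # P(0 integrations): untransduced
    p1 = moi * math.exp(-moi)    # P(exactly 1 integration)
    transduced = 1.0 - p0
    return {
        "untransduced": p0,
        "single": p1,
        "multiple": transduced - p1,
        # Among cells that survive selection, the share carrying exactly one guide.
        "single_among_transduced": p1 / transduced,
    }


if __name__ == "__main__":
    for moi in (0.4, 0.5, 0.6, 2.0):
        s = transduction_summary(moi)
        print(f"MOI {moi}: {s['single']:.2f} single, {s['multiple']:.2f} multiple, "
              f"{s['single_among_transduced']:.0%} of transduced cells carry one guide")
```

Under this model, an MOI of 0.5 leaves roughly 61% of cells untransduced, and about 77% of the transduced cells carry exactly one guide. At an MOI of 2 that share drops to about 31%, which is why higher MOIs are reserved for experiments that want combinatorial perturbations.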
Single-cell library preparation
After successfully transduced cells have been selected, single cells must be isolated to conduct scRNA-seq. Perturb-seq and CROP-seq have been performed using droplet-based technology for single-cell isolation, while the closely related CRISP-seq was performed with a microwell-based approach.
Once cells have been isolated, reverse transcription, amplification and sequencing take place to produce gene expression profiles for each cell. Many scRNA-seq approaches incorporate unique molecular identifiers (UMIs) and cell barcodes during the reverse transcription step to index individual RNA molecules and cells, respectively. These additional barcodes help to quantify RNA transcripts and to associate each sequence with its cell of origin.
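The role of UMIs in quantification can be sketched as follows. Reads that share a cell barcode, UMI and gene are treated as copies of one original molecule, so counting distinct UMIs rather than reads removes amplification duplicates. This Python sketch assumes the reads have already been parsed into `(cell_barcode, umi, gene)` tuples and uses exact matching only. Production pipelines also correct sequencing errors in barcodes and UMIs. The function name and the toy barcodes are hypothetical.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct molecules per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples. Reads that share
    all three values are collapsed into one molecule (PCR duplicates).
    """
    molecules = defaultdict(set)
    for cell, umi, gene in reads:
        molecules[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


if __name__ == "__main__":
    reads = [
        ("AAACGT", "TTGCA", "XBP1"),
        ("AAACGT", "TTGCA", "XBP1"),  # amplification duplicate of the read above
        ("AAACGT", "GGATC", "XBP1"),
        ("CCGTTA", "TTGCA", "HSPA5"),
    ]
    print(count_umis(reads))  # XBP1 in AAACGT: 2 molecules from 3 reads
```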
Bioinformatics analysis
Read alignment and processing are performed to map high-quality reads to a reference genome. Deconvolution of cell barcodes, guide barcodes and UMIs associates each guide RNA with the cells that contain it, so that each cell's gene expression profile can be assigned to a particular perturbation. Further downstream analyses of the transcriptional profiles depend on the biological question of interest.
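The guide-assignment step can be sketched in Python as follows, assuming guide-barcode UMIs have already been counted per cell. A cell is assigned to its most abundant guide only when that guide is both well supported and clearly dominant. Otherwise the cell is flagged, and depending on the design it is either excluded or analyzed as a multi-guide cell. The thresholds `min_umis` and `min_fraction` and the guide names are hypothetical, and published pipelines differ in how they make this call.

```python
def assign_guides(guide_umis, min_umis=3, min_fraction=0.8):
    """Assign each cell to one guide RNA from its guide-barcode UMI counts.

    guide_umis maps cell barcode -> {guide_id: UMI count}. A cell is assigned to
    its top guide when that guide has at least `min_umis` UMIs and makes up at
    least `min_fraction` of the cell's guide UMIs. Otherwise the cell is labelled
    "unassigned" (too little guide signal) or "multiplet" (no dominant guide).
    """
    assignments = {}
    for cell, counts in guide_umis.items():
        total = sum(counts.values())
        if total == 0:
            assignments[cell] = "unassigned"
            continue
        guide, top = max(counts.items(), key=lambda item: item[1])
        if top < min_umis:
            assignments[cell] = "unassigned"
        elif top / total >= min_fraction:
            assignments[cell] = guide
        else:
            assignments[cell] = "multiplet"
    return assignments


if __name__ == "__main__":
    guide_umis = {
        "AAACGT": {"sgXBP1_1": 25, "sgNT_1": 1},     # one dominant guide
        "CCGTTA": {"sgXBP1_1": 12, "sgATF6_2": 10},  # two guides at similar levels
        "GGTACA": {"sgNT_1": 1},                     # too few guide UMIs
    }
    print(assign_guides(guide_umis))
```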
The machine learning algorithm t-distributed stochastic neighbor embedding (t-SNE) is commonly used to visualize the high-dimensional data produced by scRNA-seq as a two-dimensional scatterplot.
The authors who first performed Perturb-seq developed an in-house computational framework called MIMOSCA that predicts the effects of each perturbation using a linear model and is available on an open software repository.
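MIMOSCA itself is not reproduced here. As a minimal sketch of the general idea, consider the simplest linear model: each cell carries exactly one perturbation, non-targeting control cells serve as the baseline, and there are no covariates and no regularization. If expression is fitted with an intercept plus one indicator variable per perturbation by ordinary least squares, each coefficient equals the mean expression in the perturbed cells minus the mean in the control cells. The Python sketch below computes those coefficients directly. The function name, the `non-targeting` label and the toy values are hypothetical.

```python
from statistics import mean


def perturbation_effects(expression, assignments, control="non-targeting"):
    """Least-squares perturbation effects in a one-guide-per-cell linear model.

    expression: cell -> {gene: normalized expression}
    assignments: cell -> perturbation label (exactly one per cell)
    Returns perturbation -> {gene: coefficient}, where each coefficient equals
    mean(perturbed cells) - mean(control cells), the OLS solution for a model
    with an intercept and one indicator per non-control perturbation.
    """
    groups = {}
    for cell, label in assignments.items():
        groups.setdefault(label, []).append(expression[cell])
    if control not in groups:
        raise ValueError(f"no cells labelled {control!r}")
    genes = list(groups[control][0])  # assumes every cell reports the same genes
    baseline = {g: mean(cell[g] for cell in groups[control]) for g in genes}
    return {
        label: {g: mean(cell[g] for cell in cells) - baseline[g] for g in genes}
        for label, cells in groups.items()
        if label != control
    }


if __name__ == "__main__":
    expression = {
        "c1": {"XBP1": 1.0, "HSPA5": 2.0},
        "c2": {"XBP1": 3.0, "HSPA5": 2.0},
        "c3": {"XBP1": 0.5, "HSPA5": 4.0},
        "c4": {"XBP1": 1.5, "HSPA5": 5.0},
    }
    assignments = {"c1": "non-targeting", "c2": "non-targeting",
                   "c3": "sgXBP1", "c4": "sgXBP1"}
    print(perturbation_effects(expression, assignments))
    # {'sgXBP1': {'XBP1': -1.0, 'HSPA5': 2.5}}
```

A more general linear model can allow several guides per cell, additional covariates and regularization. In that case the coefficients no longer reduce to simple differences of means.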
Advantages and limitations
Perturb-seq makes use of current technologies in molecular biology to integrate a multi-step workflow that couples high-throughput screening with complex phenotypic outputs. Compared with alternative methods for gene knockdown or knockout, such as RNAi, zinc finger nucleases or transcription activator-like effector nucleases (TALENs), CRISPR-based perturbations offer greater specificity, efficiency and ease of use.
Another advantage of this protocol is that, while most screening approaches can only assay simple phenotypes such as cellular viability, scRNA-seq provides a much richer phenotypic readout, with quantitative measurements of gene expression in many cells simultaneously. Perturb-seq can therefore combine the high throughput of forward genetics, in terms of the number of genetic perturbations, with the rich phenotypic dimension of reverse genetics.
However, while a large and comprehensive dataset can be a benefit, it can also present a major challenge. Single-cell RNA expression readouts are known to produce ‘noisy’ data, with a significant number of false positives. Both the size of scRNA-seq datasets and their noise will likely require new and powerful computational methods and bioinformatics pipelines to make sense of the resulting data. Another challenge is the creation of large-scale CRISPR libraries. Screening such extensive libraries requires a commensurate increase in the resources needed to culture the large numbers of cells that a successful screen of many perturbations demands.
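The scaling problem can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes Poisson-distributed integrations and lumps all downstream losses (selection, cell capture, guide assignment) into a single `recovery` fraction. The function name and the default values of `moi` and `recovery` are hypothetical placeholders, not figures from the studies described here.

```python
import math


def cells_to_transduce(n_guides: int, cells_per_guide: int,
                       moi: float = 0.5, recovery: float = 0.5) -> int:
    """Rough number of cells to transduce so that each guide ends up in about
    `cells_per_guide` usable, singly transduced cells.

    Assumes Poisson-distributed integrations with mean `moi` and a single
    overall `recovery` fraction for all downstream losses (illustrative).
    """
    single_fraction = moi * math.exp(-moi)  # P(exactly one integration)
    return math.ceil(n_guides * cells_per_guide / (single_fraction * recovery))


if __name__ == "__main__":
    for n_guides in (1_000, 10_000, 100_000):
        print(n_guides, cells_to_transduce(n_guides, cells_per_guide=100))
```

Under these assumptions, the required number of cells grows linearly with the number of guides. Moving from a focused library to a genome-scale one therefore multiplies the culture burden by roughly the same factor.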
In parallel to these single-cell methods, other approaches have been developed to reconstruct genetic pathways using whole-organism RNA-sequencing. These methods use a single aggregate statistic, called the transcriptome-wide epistasis coefficient, to guide pathway reconstruction. In contrast with the statistical framework of the methods described above, this coefficient may be more robust to noise and is intuitively interpretable in terms of Batesonian epistasis. This approach was used to identify a new state in the life cycle of the nematode ''C. elegans''.
When the pooled library is phenotyped by microscopy rather than RNA sequencing, the method is referred to as optical pooled screening (OPS). OPS allows investigation of spatial, morphological or dynamic cellular features that are not easily captured by RNA sequencing. Like Perturb-seq, OPS uses CRISPR technology to inactivate specific genes and DNA barcoding of each guide RNA to allow all perturbations to be pooled together for phenotyping and later deconvoluted, with each phenotype assigned to a specific guide RNA. In OPS, the barcodes are read out by in situ sequencing.
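Barcode reads contain errors whether they are obtained by conventional or in situ sequencing. Libraries can therefore be designed with barcodes that differ from one another at several positions, so that a read with a small number of errors can still be matched unambiguously. The Python sketch below illustrates that idea with a Hamming-distance lookup. It is a generic sketch, not the decoding procedure of any particular OPS or Perturb-seq pipeline, and the barcodes and guide names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def decode_barcode(read: str, library: dict[str, str], max_mismatches: int = 1):
    """Return the guide whose barcode is within `max_mismatches` of `read`.

    Returns None if no barcode is close enough or if the match is ambiguous.
    """
    hits = [guide for guide, barcode in library.items()
            if len(barcode) == len(read) and hamming(read, barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    library = {"sgXBP1_1": "ACGTAC", "sgATF6_2": "TTGCAG", "sgNT_1": "GACTTG"}
    print(decode_barcode("ACGTAA", library))  # one mismatch -> sgXBP1_1
    print(decode_barcode("AAAAAA", library))  # too far from every barcode -> None
```

Rejecting ambiguous matches rather than picking one keeps misassigned cells out of downstream analyses, at the cost of discarding some reads.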
Applications
Perturb-seq and conceptually similar protocols can be used to address a broad range of biological questions, and the applications of this technology will likely grow over time. Three papers on this topic, published in the December 2016 issue of the journal ''Cell'', demonstrated the utility of this method by applying it to several distinct biological functions. In the paper “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens”, the authors used Perturb-seq to knock out transcription factors related to the immune response in hundreds of thousands of cells and investigated the cellular consequences of their inactivation.
They also explored the effects of transcription factors on cell states in the context of the cell cycle. In the UCSF-led study “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response”, the researchers suppressed multiple genes in each cell to study the unfolded protein response (UPR) pathway.
With a similar methodology, but using the term CRISP-seq instead of Perturb-seq, the paper "Dissecting Immune Circuits by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq" performed a proof-of-concept experiment, using the technique to probe regulatory pathways related to innate immunity in mice.
These papers also investigated the lethality of each perturbation and epistasis in cells with multiple perturbations. These early studies used relatively few perturbations per experiment, but the approach can in principle be scaled up to the whole genome, as later genome-scale screens have shown. Finally, the October 2016 preprint and subsequent paper demonstrated the bioinformatic reconstruction of the T cell receptor signaling pathway in Jurkat cells based on CROP-seq data.
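One simple way to frame epistasis analysis in cells carrying two perturbations is to compare the observed double-perturbation effect with the sum of the single-perturbation effects. The Python sketch below assumes an additive null model on an appropriate scale (for example, log expression), with all effects measured relative to the same control cells. It illustrates the concept and is not the analysis used in the papers above. The function name and values are hypothetical.

```python
def interaction_scores(effect_a, effect_b, effect_ab):
    """Per-gene deviation of a double perturbation from the additive expectation.

    Each argument maps gene -> effect relative to control cells (e.g. change in
    log expression). Under an additive null model the expected double effect is
    effect_a + effect_b; scores far from zero suggest a genetic interaction.
    """
    return {g: effect_ab[g] - (effect_a[g] + effect_b[g]) for g in effect_ab}


if __name__ == "__main__":
    a = {"G1": -1.0, "G2": 0.5}
    b = {"G1": -0.5, "G2": 0.0}
    ab = {"G1": -1.5, "G2": 2.0}
    print(interaction_scores(a, b, ab))  # {'G1': 0.0, 'G2': 1.5}
```

In this toy example, G1 behaves additively, while G2 responds much more strongly to the combined perturbation than the single perturbations predict.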
Recently, the Perturb-seq (CROP-seq) workflow has been adapted to enable genome-scale CRISPRi (CRISPR interference) screens in Jurkat cells at single-cell resolution. This genome-scale CRISPRi screen, the first of its kind, was conducted to verify factors involved in TCR signaling pathways. A guide RNA library targeting 18,595 human genes was used for CRISPR-based gene knockdowns in Jurkat cells expressing the dCas9-KRAB fusion protein. In total, one million Jurkat cells were processed for single-cell RNA sequencing, providing transcriptomic readouts for a final list of 374 marker genes involved in TCR signaling. The bioinformatic analysis confirmed more than 70 known activators and repressors of TCR signaling cascades, showcasing the potential of Perturb-seq (CROP-seq) screens to support translational research.
While these publications used these protocols to answer complex biological questions, the technology can also serve as a validation assay to check the specificity of any CRISPR-based knockdown or knockout. The expression levels of the target genes, as well as other genes, can be measured in parallel with single-cell resolution to detect whether the perturbation was successful and to assess the experiment for off-target effects. Furthermore, these protocols make it possible to perform perturbation screens in heterogeneous tissues while obtaining cell type-specific gene expression responses.
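The on-target part of such a validation can be sketched as a comparison of the target gene's expression between cells carrying the targeting guide and control cells. The Python sketch below assumes expression values on a linear (not log) scale. The function name and the 50% pass threshold are hypothetical, and assessing off-target effects would also require examining the other measured genes.

```python
from statistics import mean


def knockdown_efficiency(target_in_perturbed, target_in_control):
    """Fractional reduction in the target gene's mean expression.

    Both arguments are per-cell expression values of the target gene on a
    linear scale; returns 1 - mean(perturbed) / mean(control).
    """
    control_mean = mean(target_in_control)
    if control_mean == 0:
        raise ValueError("target gene not detected in control cells")
    return 1.0 - mean(target_in_perturbed) / control_mean


if __name__ == "__main__":
    efficiency = knockdown_efficiency([0.5, 1.0, 1.5], [4.0, 3.0, 5.0])
    print(f"knockdown: {efficiency:.0%}")  # knockdown: 75%
    print("passes" if efficiency >= 0.5 else "fails")  # hypothetical 50% cut-off
```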