
CRISPR (acronym of clustered regularly interspaced short palindromic repeats) is a family of
DNA
sequences found in the
genome
s of
prokaryotic
organisms such as
bacteria
and
archaea
.
Each sequence within an individual prokaryotic CRISPR is derived from a DNA fragment of a
bacteriophage
that had previously infected the prokaryote or one of its ancestors.
These sequences are used to detect and destroy DNA from similar bacteriophages during subsequent infections. Hence these sequences play a key role in the antiviral (i.e. anti-
phage
) defense system of prokaryotes and provide a form of heritable,
acquired immunity
.
CRISPR is found in approximately 50% of sequenced
bacterial genomes and nearly 90% of sequenced archaea.
Cas9
(or "CRISPR-associated protein 9") is an
enzyme
that uses CRISPR sequences as a guide to recognize and open up specific strands of DNA that are complementary to the CRISPR sequence. Cas9 enzymes together with CRISPR sequences form the basis of a technology known as
CRISPR-Cas9 that can be used to edit genes within living organisms.
This editing process has a wide variety of applications including basic
biological
research, development of
biotechnological
products, and treatment of diseases.
[CRISPR-CAS9, TALENS and ZFNS – the battle in gene editing https://www.ptglab.com/news/blog/crispr-cas9-talens-and-zfns-the-battle-in-gene-editing/ ] The development of the CRISPR-Cas9 genome editing technique was recognized by the
Nobel Prize in Chemistry
in 2020 awarded to
Emmanuelle Charpentier
and
Jennifer Doudna
.
History
Repeated sequences
The discovery of clustered DNA repeats took place independently in three parts of the world. The first description of what would later be called CRISPR is from
Osaka University
researcher
Yoshizumi Ishino
and his colleagues in 1987. They accidentally cloned part of a CRISPR sequence together with the ''iap'' gene (isozyme conversion of alkaline phosphatase) from their target genome, that of ''
Escherichia coli
''.
The organization of the repeats was unusual. Repeated sequences are typically arranged consecutively, without interspersing different sequences.
They did not know the function of the interrupted clustered repeats.
In 1993, researchers of ''
Mycobacterium tuberculosis
'' in the Netherlands published two articles about a cluster of interrupted
direct repeats (DR) in that bacterium. They recognized the diversity of the sequences intervening between the direct repeats among different strains of ''M. tuberculosis''
and used this property to design a typing method called ''
spoligotyping'', still in use today.
Francisco Mojica
at the
University of Alicante
in Spain studied the function of repeats in the archaeal species ''
Haloferax'' and ''
Haloarcula
''. Mojica's supervisor surmised that the clustered repeats had a role in correctly segregating replicated DNA into daughter cells during cell division, because plasmids and chromosomes with identical repeat arrays could not coexist in ''
Haloferax volcanii''. Transcription of the interrupted repeats was also noted for the first time; this was the first full characterization of CRISPR.
By 2000, Mojica and his students, after an automated search of published genomes, identified interrupted repeats in 20 species of microbes as belonging to the same family.
Because those sequences were interspaced, Mojica initially called these sequences "short regularly spaced repeats" (SRSR).
In 2001, Mojica and
Ruud Jansen, who were searching for additional interrupted repeats, proposed the acronym CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) to unify the numerous acronyms used to describe these sequences.
In 2002, Tang, et al. showed evidence that CRISPR repeat regions from the genome of ''
Archaeoglobus fulgidus'' were transcribed into long RNA molecules subsequently processed into unit-length small RNAs, plus some longer forms of 2, 3, or more spacer-repeat units.
In 2005,
yogurt
researcher
Rodolphe Barrangou discovered that ''
Streptococcus thermophilus
'', after iterative phage infection challenges, develops increased phage resistance due to the incorporation of additional CRISPR spacer sequences.
Barrangou's employer, the Danish food company Danisco, then developed phage-resistant ''S. thermophilus'' strains for yogurt production. Danisco was later bought by
DuPont
, which owns about 50 percent of the global dairy culture market, and the technology spread widely.
CRISPR-associated systems
A major advance in understanding CRISPR came with Jansen's observation that the prokaryote repeat cluster was accompanied by four homologous genes that make up CRISPR-associated systems, ''cas'' 1–4. The Cas proteins showed
helicase
and
nuclease
motifs, suggesting a role in the dynamic structure of the CRISPR loci.
In this publication, the acronym CRISPR was used as the universal name of this pattern, but its function remained enigmatic.

In 2005, three independent research groups showed that some CRISPR spacers are derived from
phage
DNA and
extrachromosomal DNA
such as
plasmid
s.
In effect, the spacers are fragments of DNA gathered from viruses that previously attacked the cell. The source of the spacers was a sign that the CRISPR-''cas'' system could have a role in adaptive immunity in
bacteria
.
All three studies proposing this idea were initially rejected by high-profile journals, but eventually appeared in other journals.
The first publication
proposing a role of CRISPR-Cas in microbial immunity, by Mojica and collaborators at the
University of Alicante
, predicted a role for the RNA transcript of spacers in target recognition in a mechanism that could be analogous to the
RNA interference
system used by eukaryotic cells. Koonin and colleagues extended this RNA interference hypothesis by proposing mechanisms of action for the different CRISPR-Cas subtypes according to the predicted function of their proteins.
Experimental work by several groups revealed the basic mechanisms of CRISPR-Cas immunity. In 2007, the first experimental evidence that CRISPR was an adaptive immune system was published.
A CRISPR region in ''
Streptococcus thermophilus
'' acquired spacers from the DNA of an infecting
bacteriophage
. The researchers manipulated the resistance of ''S. thermophilus'' to different types of phages by adding and deleting spacers whose sequence matched those found in the tested phages.
In 2008, Brouns and Van der Oost identified a complex of Cas proteins called Cascade, that in ''E. coli'' cut the CRISPR RNA precursor within the repeats into mature spacer-containing RNA molecules called
CRISPR RNA
(crRNA), which remained bound to the protein complex.
Moreover, it was found that Cascade, crRNA and a helicase/nuclease (
Cas3
) were required to provide a bacterial host with immunity against infection by a
DNA virus
. By designing an anti-virus CRISPR, they demonstrated that two orientations of the crRNA (sense/antisense) provided immunity, indicating that the crRNA guides were targeting
dsDNA. That year Marraffini and Sontheimer confirmed that a CRISPR sequence of ''
S. epidermidis'' targeted DNA and not RNA to prevent
conjugation
. This finding was at odds with the proposed RNA-interference-like mechanism of CRISPR-Cas immunity, although a CRISPR-Cas system that targets foreign RNA was later found in ''
Pyrococcus furiosus''.
A 2010 study showed that CRISPR-Cas cuts strands of both phage and plasmid DNA in ''S. thermophilus''.
Cas9

A simpler CRISPR system from ''
Streptococcus pyogenes
'' uses the protein
Cas9
, an endonuclease functioning with two small RNAs—crRNA and tracrRNA—to form a four-component complex.
In 2012,
Jennifer Doudna
and
Emmanuelle Charpentier
simplified this into a two-component system by fusing the RNAs into a "
single-guide RNA", enabling Cas9 to target and cut specific DNA sequences—a breakthrough that earned them the
Nobel Prize in Chemistry
in 2020.
Parallel work showed the ''S. thermophilus'' Cas9 could be similarly reprogrammed by altering the crRNA sequence.
These developments spurred genome editing efforts, including demonstrations by groups led by
Feng Zhang and
George Church showing genome editing in human cells using CRISPR-Cas9.
Cas12a
Cas12a, a class 2, type V CRISPR-associated nuclease, was characterized in 2015 and was formerly known as Cpf1.
This nuclease is found in the CRISPR-Cpf1 system of bacteria such as ''Francisella novicida''.
The initial designation, derived from a TIGRFAMs protein family definition established in 2012, reflected the prevalence of this CRISPR-Cas subtype in the ''Prevotella'' and ''Francisella'' lineages. Cas12a exhibits several key distinctions from Cas9: it generates staggered cuts in double-stranded DNA, in contrast to the blunt ends produced by Cas9;
[CRISPR-CAS9, TALENS and ZFNS – the battle in gene editing https://www.ptglab.com/news/blog/crispr-cas9-talens-and-zfns-the-battle-in-gene-editing/ ] it relies on a 'T-rich' protospacer adjacent motif (PAM) (typically 5'-TTTV-3', where V is A, C, or G), offering alternative targeting sites compared to the 'G-rich' PAMs (typically 5'-NGG-3') favored by Cas9;
and it requires only a CRISPR RNA (crRNA) for effective targeting, whereas Cas9 necessitates both a crRNA and a ''trans''-activating crRNA (tracrRNA).
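The PAM difference described above can be illustrated with a small search over one DNA strand. This is a minimal sketch, not a guide-design tool: the function name `find_protospacers`, the 20-nt protospacer length, and the single-strand search are assumptions made for illustration; only the PAM patterns (5'-NGG-3' downstream of the target for Cas9, 5'-TTTV-3' upstream for Cas12a) come from the text.

```python
import re

# IUPAC nucleotide codes needed for the two PAMs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}


def find_protospacers(seq, pam, pam_side, length=20):
    """Return (start, protospacer) pairs found on the given strand only.

    pam_side is "3prime" when the PAM follows the protospacer (Cas9-like)
    and "5prime" when it precedes it (Cas12a-like). The default length of
    20 nt is an illustrative assumption.
    """
    pam_pattern = "".join(IUPAC[base] for base in pam.upper())
    body = "([ACGT]{%d})" % length
    if pam_side == "3prime":
        pattern = "(?=" + body + pam_pattern + ")"
    elif pam_side == "5prime":
        pattern = "(?=" + pam_pattern + body + ")"
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    # A lookahead is used so that overlapping candidate sites are all reported.
    return [(m.start(1), m.group(1)) for m in re.finditer(pattern, seq.upper())]


if __name__ == "__main__":
    # Toy sequence: a TTTC PAM, a made-up 20-nt target, then a TGG PAM.
    toy = "TTTC" + "GATTACAGATTACAGATTAC" + "TGG"
    print(find_protospacers(toy, "NGG", "3prime"))
    print(find_protospacers(toy, "TTTV", "5prime"))
```

A real search would also scan the reverse complement; that step is omitted here to keep the sketch short.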
Cas13a
In 2016, the nuclease Cas13a (formerly known as C2c2) from the bacterium ''Leptotrichia shahii'' was characterized by researchers in
Feng Zhang's group at
MIT
and the
Broad Institute
. Cas13 is an RNA-guided RNA endonuclease, which means that it does not cleave DNA, but only single-stranded RNA (ssRNA). Cas13 is guided by its crRNA to an ssRNA target, which it binds and cleaves. Like Cas12a, Cas13 remains bound to the target and then cleaves other ssRNA molecules indiscriminately. This collateral cleavage property has been exploited for the development of various diagnostic technologies.
Locus structure
Repeats and spacers
The CRISPR array is made up of an AT-rich leader sequence followed by short repeats that are separated by unique spacers.
CRISPR repeats typically range in size from 28 to 37
base pair
s (bps), though there can be as few as 23 bp and as many as 55 bp.
Some show
dyad symmetry
, implying the formation of a
secondary structure
such as a
stem-loop
('hairpin') in the RNA, while others are predicted to be unstructured. The size of spacers in different CRISPR arrays is typically 32 to 38 bp (range 21 to 72 bp).
New spacers can appear rapidly as part of the immune response to phage infection.
There are usually fewer than 50 units of the repeat-spacer sequence in a CRISPR array.
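The leader, repeat, and spacer layout described above can be made concrete with a short sketch that splits an array into its spacers when the repeat sequence is already known. The function names and the toy sequences in the usage example are hypothetical; real arrays often carry slightly degenerate repeats, which this exact-match approach would not handle.

```python
# Typical sizes quoted in the text (inclusive bounds, in base pairs).
TYPICAL_REPEAT_BP = range(28, 38)  # 28 to 37 bp
TYPICAL_SPACER_BP = range(32, 39)  # 32 to 38 bp


def split_array(array, repeat):
    """Split a CRISPR array into (leader, spacers) using an exact repeat.

    Everything before the first repeat is treated as the leader; the text
    after the final repeat is discarded as trailing sequence.
    """
    leader, *rest = array.split(repeat)
    spacers = rest[:-1]  # drop whatever follows the last repeat
    return leader, spacers


def atypical_spacers(spacers):
    """Return the spacers whose length falls outside the typical range."""
    return [s for s in spacers if len(s) not in TYPICAL_SPACER_BP]


if __name__ == "__main__":
    # Made-up sequences, chosen only so the example is easy to follow.
    repeat = "GTTCCAATAAGACTAAAATAGAATTGAAAG"  # 30 bp
    spacer_a = "ACGT" * 8                      # 32 bp
    spacer_b = "TTGCA" * 7                     # 35 bp
    array = "ATATATTTAA" + repeat + spacer_a + repeat + spacer_b + repeat
    leader, spacers = split_array(array, repeat)
    print(leader, [len(s) for s in spacers])
    print(len(repeat) in TYPICAL_REPEAT_BP, atypical_spacers(spacers))
```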
CRISPR RNA structures
CRISPR-DR2: secondary structure taken from the Rfam database, family RF01315.
CRISPR-DR5: secondary structure taken from the Rfam database, family RF01318.
CRISPR-DR6: secondary structure taken from the Rfam database, family RF01319.
CRISPR-DR8: secondary structure taken from the Rfam database, family RF01321.
CRISPR-DR9: secondary structure taken from the Rfam database, family RF01322.
CRISPR-DR19: secondary structure taken from the Rfam database, family RF01332.
CRISPR-DR41: secondary structure taken from the Rfam database, family RF01350.
CRISPR-DR52: secondary structure taken from the Rfam database, family RF01365.
CRISPR-DR57: secondary structure taken from the Rfam database, family RF01370.
CRISPR-DR65: secondary structure taken from the Rfam database, family RF01378.
Cas genes and CRISPR subtypes
Small clusters of ''cas'' genes are often located next to CRISPR repeat-spacer arrays. Collectively the 93 ''cas'' genes are grouped into 35 families based on sequence similarity of the encoded proteins. 11 of the 35 families form the ''cas'' core, which includes the protein families Cas1 through Cas9. A complete CRISPR-Cas locus has at least one gene belonging to the ''cas'' core.
CRISPR-Cas systems fall into two classes. Class 1 systems use a complex of multiple Cas proteins to degrade foreign nucleic acids. Class 2 systems use a single large Cas protein for the same purpose. Class 1 is divided into types I, III, and IV; class 2 is divided into types II, V, and VI.
The 6 system types are divided into 33 subtypes.
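The two-class scheme above maps naturally onto a small lookup table. The sketch below encodes only what the text states (which types belong to which class, and the kind of effector each class uses); the names `CLASSES` and `class_of` are illustrative, and subtype labels such as "I-E" are assumed to follow the "type-letter" convention used elsewhere in this article.

```python
# Class -> effector architecture and member types, as listed in the text.
CLASSES = {
    1: {"effector": "complex of multiple Cas proteins", "types": ("I", "III", "IV")},
    2: {"effector": "single large Cas protein", "types": ("II", "V", "VI")},
}


def class_of(name):
    """Return the class (1 or 2) for a type ("V") or subtype ("I-E") label."""
    type_name = name.split("-")[0]  # "I-E" -> "I"
    for cls, info in CLASSES.items():
        if type_name in info["types"]:
            return cls
    raise KeyError("unknown CRISPR-Cas type: %r" % name)


if __name__ == "__main__":
    for label in ("I-E", "I-F", "V"):
        cls = class_of(label)
        print(label, "-> class", cls, "(" + CLASSES[cls]["effector"] + ")")
```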
Each type and most subtypes are characterized by a "signature gene" found almost exclusively in the category. Classification is also based on the complement of ''cas'' genes that are present. Most CRISPR-Cas systems have a Cas1 protein. The
phylogeny
of Cas1 proteins generally agrees with the classification system,
but exceptions exist due to module shuffling.
Many organisms contain multiple CRISPR-Cas systems suggesting that they are compatible and may share components.
The sporadic distribution of the CRISPR-Cas subtypes suggests that the CRISPR-Cas system is subject to
horizontal gene transfer
during microbial
evolution
.
Mechanism

CRISPR-Cas immunity is a natural process of bacteria and archaea.
CRISPR-Cas prevents bacteriophage infection,
conjugation
and
natural transformation
by degrading foreign nucleic acids that enter the cell.
Spacer acquisition
When a
microbe
is invaded by a
bacteriophage
, the first stage of the immune response is to capture phage DNA and insert it into a CRISPR locus in the form of a spacer.
Cas1
and
Cas2
are found in both classes of CRISPR-Cas immune systems, which indicates that they are involved in spacer acquisition. Mutation studies confirmed this hypothesis, showing that removal of Cas1 or Cas2 stopped spacer acquisition without affecting the CRISPR immune response.
Multiple Cas1 proteins have been characterised and their structures resolved.
Cas1 proteins have diverse
amino acid
sequences. However, their crystal structures are similar and all purified Cas1 proteins are metal-dependent nucleases/
integrases
that bind to DNA in a sequence-independent manner.
Representative Cas2 proteins have been characterised and possess either single-stranded RNA (ssRNA)- or double-stranded DNA (dsDNA)-specific endonuclease activity.
In the I-E system of ''E. coli'', Cas1 and Cas2 form a complex where a Cas2 dimer bridges two Cas1 dimers.
In this complex, Cas2 performs a non-enzymatic scaffolding role,
binding double-stranded fragments of invading DNA, while Cas1 binds the single-stranded flanks of the DNA and catalyses their integration into CRISPR arrays.
New spacers are usually added at the beginning of the CRISPR array, next to the leader sequence, creating a chronological record of viral infections. In ''E. coli'', a histone-like protein called integration host factor (IHF), which binds to the leader sequence, is responsible for the accuracy of this integration. IHF also enhances integration efficiency in the type I-F system of ''Pectobacterium atrosepticum'', but in other systems, different host factors may be required.
Protospacer adjacent motifs (PAM)
Bioinformatic analysis of regions of phage genomes that were excised as spacers (termed protospacers) revealed that they were not randomly selected but instead were found adjacent to short (3–5 bp) DNA sequences termed
protospacer adjacent motifs (PAM). Analysis of CRISPR-Cas systems showed PAMs to be important for type I and type II, but not type III systems during acquisition.
In type I and type II systems, protospacers are excised at positions adjacent to a PAM sequence, with the other end of the spacer cut using a ruler mechanism, thus maintaining the regularity of the spacer size in the CRISPR array.
The conservation of the PAM sequence differs between CRISPR-Cas systems and appears to be evolutionarily linked to Cas1 and the
leader sequence.
New spacers are added to a CRISPR array in a directional manner, occurring preferentially, but not exclusively, adjacent to the leader sequence. Analysis of the type I-E system from ''E. coli'' demonstrated that the first direct repeat adjacent to the leader sequence is copied, with the newly acquired spacer inserted between the first and second direct repeats.
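The leader-proximal insertion described above can be modelled as a list that grows at its front. This is a toy model under stated assumptions: the class name `CrisprArray` and its methods are hypothetical, sequences are treated as opaque strings, and the "copying" of the first repeat is represented implicitly by always rendering one more repeat than there are spacers.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    leader: str
    repeat: str
    # Spacers are stored leader-proximal first, so index 0 is the newest.
    spacers: list = field(default_factory=list)

    def acquire(self, new_spacer):
        """Insert a spacer next to the leader (the canonical position)."""
        self.spacers.insert(0, new_spacer)

    def render(self):
        """Leader, then repeat, then (spacer + repeat) for each spacer."""
        return self.leader + self.repeat + "".join(
            spacer + self.repeat for spacer in self.spacers
        )


if __name__ == "__main__":
    array = CrisprArray(leader="[leader]", repeat="[R]")
    for phage in ("[phage-1]", "[phage-2]", "[phage-3]"):
        array.acquire(phage)
    # Reading away from the leader walks backwards through infection history.
    print(array.render())
```

Because each acquisition adds one spacer and, in effect, one repeat, the rendered array always starts and ends with a repeat, and its order mirrors the chronological record mentioned earlier in this section.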
The PAM sequence appears to be important during spacer insertion in type I-E systems. That sequence contains a strongly conserved final nucleotide (nt) adjacent to the first nt of the protospacer. This nt becomes the final base in the first direct repeat.
This suggests that the spacer acquisition machinery generates single-stranded overhangs in the second-to-last position of the direct repeat and in the PAM during spacer insertion. However, not all CRISPR-Cas systems appear to share this mechanism as PAMs in other organisms do not show the same level of conservation in the final position.
It is likely that in those systems, a blunt end is generated at the very end of the direct repeat and the protospacer during acquisition.
Insertion variants
Analysis of ''
Sulfolobus solfataricus
'' CRISPRs revealed further complexities to the canonical model of spacer insertion, as one of its six CRISPR loci inserted new spacers randomly throughout its CRISPR array, as opposed to inserting closest to the leader sequence.
Multiple CRISPRs contain many spacers targeting the same phage. The mechanism that causes this phenomenon was discovered in the type I-E system of ''E. coli''. A significant enhancement in spacer acquisition was detected where existing spacers already target the phage, even when they carry mismatches to the protospacer. This 'priming' requires the Cas proteins involved in both acquisition and interference to interact with each other. Newly acquired spacers that result from the priming mechanism are always found on the same strand as the priming spacer.
This observation led to the hypothesis that the acquisition machinery slides along the foreign DNA after priming to find a new protospacer.
Biogenesis
CRISPR-RNA (crRNA), which later guides the Cas nuclease to the target during the interference step, must be generated from the CRISPR sequence. The crRNA is initially transcribed as part of a single long transcript encompassing much of the CRISPR array.
This transcript is then cleaved by Cas proteins to form crRNAs. The mechanism to produce crRNAs differs among CRISPR-Cas systems. In type I-E and type I-F systems, the proteins Cas6e and Cas6f, respectively, recognise stem-loops
created by the pairing of identical repeats that flank the crRNA.
These Cas proteins cleave the longer transcript at the edge of the paired region, leaving a single crRNA along with a small remnant of the paired repeat region.
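The cleavage pattern just described, one cut per repeat yielding a spacer flanked by repeat remnants, can be sketched as string slicing. The function name `process_pre_crrna`, the exact-match repeat search, and the cut position (set by the hypothetical `handle_len` parameter) are assumptions for illustration; real cut sites depend on the system.

```python
def process_pre_crrna(transcript: str, repeat: str, handle_len: int = 8):
    """Cut a pre-crRNA once inside every repeat and return the fragments.

    Assumption: the cut falls handle_len nucleotides before the end of each
    repeat, so every mature crRNA starts with a short repeat remnant.
    """
    cut_offset = len(repeat) - handle_len
    cut_sites = []
    pos = transcript.find(repeat)
    while pos != -1:
        cut_sites.append(pos + cut_offset)
        pos = transcript.find(repeat, pos + len(repeat))
    # Fragments between consecutive cuts are the mature crRNAs.
    return [transcript[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


toy_repeat = "GGGGCCCCAAAA"
toy_transcript = (toy_repeat + "ACGUACGU" + toy_repeat
                  + "UGCAUGCA" + toy_repeat)
crrnas = process_pre_crrna(toy_transcript, toy_repeat, handle_len=4)
```

Each element of `crrnas` contains one full spacer with part of a repeat on either side, which is the "spacer plus remnant" product described in the text.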
Type III systems also use Cas6; however, their repeats do not produce stem-loops. Instead, the longer transcript wraps around Cas6, allowing cleavage just upstream of the repeat sequence.
Type II systems lack the Cas6 gene and instead utilize RNaseIII for cleavage. Functional type II systems encode an extra small RNA that is complementary to the repeat sequence, known as a
trans-activating crRNA
(tracrRNA).
Transcription of the tracrRNA and the primary CRISPR transcript results in base pairing and the formation of dsRNA at the repeat sequence, which is subsequently targeted by RNaseIII to produce crRNAs. Unlike in the other two systems, the type II crRNA does not contain the full spacer, which is instead truncated at one end.
CrRNAs associate with Cas proteins to form ribonucleoprotein complexes that recognize foreign nucleic acids. CrRNAs show no preference between the coding and non-coding strands, which is indicative of an RNA-guided DNA-targeting system.
The type I-E complex (commonly referred to as Cascade) requires five Cas proteins bound to a single crRNA.
Interference
During the interference stage in type I systems, the PAM sequence is recognized on the crRNA-complementary strand and is required along with crRNA annealing. In type I systems correct base pairing between the crRNA and the protospacer signals a conformational change in Cascade that recruits
Cas3
for DNA degradation.
Type II systems rely on a single multifunctional protein,
Cas9
, for the interference step.
Cas9 requires both the crRNA and the tracrRNA to function and cleaves DNA using its dual HNH and RuvC/RNaseH-like endonuclease domains. Base pairing between the PAM and the phage genome is required in type II systems. However, the PAM is recognized on the same strand as the crRNA (the opposite strand to type I systems).
Type III systems, like type I, require six or seven Cas proteins binding to crRNAs.
The type III systems analysed from ''S. solfataricus'' and ''P. furiosus'' both target the mRNA of phages rather than the phage DNA genome,
which may make these systems uniquely capable of targeting RNA-based phage genomes.
Type III systems were also found to target DNA in addition to RNA using a different Cas protein in the complex, Cas10.
The DNA cleavage was shown to be transcription dependent.
The mechanism for distinguishing self from foreign DNA during interference is built into the crRNAs and is therefore likely common to all three systems. Throughout the distinctive maturation process of each major type, all crRNAs contain a spacer sequence and some portion of the repeat at one or both ends. It is the partial repeat sequence that prevents the CRISPR-Cas system from targeting the chromosome as base pairing beyond the spacer sequence signals self and prevents DNA cleavage.
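The self/non-self rule above can be expressed as a simple decision function. This is a deliberately simplified sketch: all sequences are written in the same orientation so that string identity stands in for base pairing, only one flank is checked, and the function and parameter names are hypothetical.

```python
def classify_target(upstream_flank: str, protospacer: str,
                    crrna_repeat_part: str, crrna_spacer: str) -> str:
    """Classify a candidate target as 'no match', 'self' or 'foreign'.

    Simplification: identity between strings is used as a stand-in for
    complementarity between the crRNA and the target strand.
    """
    if protospacer != crrna_spacer:
        return "no match"
    if upstream_flank == crrna_repeat_part:
        # Pairing continues past the spacer into the repeat-derived part:
        # this is what the host's own CRISPR array looks like.
        return "self"
    return "foreign"


on_array = classify_target("GAAAC", "ACGTACGT", "GAAAC", "ACGTACGT")
on_phage = classify_target("TTGCA", "ACGTACGT", "GAAAC", "ACGTACGT")
```

The spacer in the chromosomal array is always flanked by repeats, so it is classified as self, while the same sequence in a phage genome has unrelated flanks and is treated as foreign.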
RNA-guided CRISPR enzymes are classified as
type V restriction enzymes.
Evolution
The cas genes in the adaptor and effector modules of the CRISPR-Cas system are believed to have evolved from two different ancestral modules. A
transposon
-like element called
casposon encoding the Cas1-like integrase and potentially other components of the adaptation module was inserted next to the ancestral effector module, which likely functioned as an independent innate immune system. The highly conserved cas1 and cas2 genes of the adaptor module evolved from the ancestral module while a variety of class 1 effector cas genes evolved from the ancestral effector module. The evolution of these various class 1 effector module cas genes was guided by various mechanisms, such as duplication events. On the other hand, each type of class 2 effector module arose from subsequent independent insertions of mobile genetic elements.
These mobile genetic elements took the place of the multiple gene effector modules to create single gene effector modules that produce large proteins which perform all the necessary tasks of the effector module.
The spacer regions of CRISPR-Cas systems are taken directly from foreign mobile genetic elements and thus their long-term evolution is hard to trace. The non-random evolution of these spacer regions has been found to be highly dependent on the environment and the particular foreign mobile genetic elements it contains.
CRISPR-Cas can immunize bacteria against certain phages and thus halt transmission. For this reason,
Koonin described CRISPR-Cas as a
Lamarckian inheritance mechanism.
However, this was disputed by a critic who noted, "We should remember
Lamarck for the good he contributed to science, not for things that resemble his theory only superficially. Indeed, thinking of CRISPR and other phenomena as Lamarckian only obscures the simple and elegant way evolution really works". But as more recent studies have been conducted, it has become apparent that the acquired spacer regions of CRISPR-Cas systems are indeed a form of Lamarckian evolution because they are genetic mutations that are acquired and then passed on.
On the other hand, the Cas gene machinery that facilitates the system evolves through classic Darwinian evolution.
Coevolution
Analysis of CRISPR sequences revealed
coevolution
of host and viral genomes.
In the basic model of CRISPR evolution, newly incorporated spacers drive phages to mutate their genomes to avoid the bacterial immune response, creating diversity in both the phage and host populations. To resist a phage infection, the sequence of the CRISPR spacer must correspond perfectly to the sequence of the target phage gene. Phages can continue to infect their hosts given point mutations in the sequence matched by the spacer.
Similar stringency is required in the PAM; otherwise the bacterial strain remains phage-sensitive.
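The stringency described above implies that a single point mutation in either the PAM or the targeted sequence is enough for a phage to escape. The sketch below enumerates such escape mutants under a strict exact-match model; the function names, the toy sequences, and the assumption that the PAM directly precedes the target on the strand shown are illustrative only.

```python
def is_targeted(genome: str, spacer: str, pam: str) -> bool:
    """Strict model: immunity requires an exact PAM + spacer match."""
    return (pam + spacer) in genome


def escape_mutants(genome: str, spacer: str, pam: str):
    """List single-nucleotide variants of genome that are no longer targeted."""
    site = genome.find(pam + spacer)
    mutants = []
    if site == -1:
        return mutants
    for i in range(site, site + len(pam) + len(spacer)):
        for base in "ACGT":
            if base != genome[i]:
                mutant = genome[:i] + base + genome[i + 1:]
                if not is_targeted(mutant, spacer, pam):
                    mutants.append(mutant)
    return mutants


toy_phage = "TT" + "AAG" + "ACGTAC" + "GG"
escapers = escape_mutants(toy_phage, spacer="ACGTAC", pam="AAG")
```

In this toy case every one of the three possible substitutions at each of the nine PAM and target positions produces an escaper, which illustrates why newly acquired spacers exert strong selection on phage genomes.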
Rates
A study of 124 ''S. thermophilus'' strains showed that 26% of all spacers were unique and that different CRISPR loci showed different rates of spacer acquisition.
Some CRISPR loci evolve more rapidly than others, which allowed the strains' phylogenetic relationships to be determined. A
comparative genomic analysis showed that ''E. coli'' and ''
S. enterica'' evolve much more slowly than ''S. thermophilus''. Strains of the slower-evolving species that diverged 250,000 years ago still contained the same spacer complement.
Metagenomic analysis of two acid-mine-drainage
biofilm
s showed that one of the analyzed CRISPRs contained extensive deletions and spacer additions versus the other biofilm, suggesting a higher phage activity/prevalence in one community than the other.
In the oral cavity, a temporal study determined that 7–22% of spacers were shared over 17 months within an individual while less than 2% were shared across individuals.
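A figure such as "percentage of spacers shared" can be computed with set operations once spacers have been extracted. The source does not state the exact definition used in the study, so the choice below (shared spacers as a fraction of the distinct spacers in the first sample) and the function name `shared_fraction` are assumptions.

```python
def shared_fraction(sample_a, sample_b) -> float:
    """Fraction of distinct spacers in sample_a that also occur in sample_b."""
    a, b = set(sample_a), set(sample_b)
    if not a:
        return 0.0
    return len(a & b) / len(a)


# Hypothetical spacer identifiers from one individual at two time points.
month_0 = ["s1", "s2", "s3", "s4", "s5"]
month_17 = ["s4", "s5", "s6"]
within_individual = shared_fraction(month_0, month_17)
```

Other denominators (for example the union of both samples) would give different percentages, so the definition should be stated whenever such numbers are compared.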
From the same environment, a single strain was tracked using
PCR primers specific to its CRISPR system. Broad-level results of spacer presence/absence showed significant diversity. However, this CRISPR added three spacers over 17 months,
suggesting that even in an environment with significant CRISPR diversity some loci evolve slowly.
CRISPRs were analysed from the metagenomes produced for the
Human Microbiome Project
.
Although most were body-site specific, some within a body site are widely shared among individuals. One of these loci originated from
streptococcal species and contained ≈15,000 spacers, 50% of which were unique. Similar to the targeted studies of the oral cavity, some showed little evolution over time.
CRISPR evolution was studied in
chemostat
s using ''S. thermophilus'' to directly examine spacer acquisition rates. In one week, ''S. thermophilus'' strains acquired up to three spacers when challenged with a single phage.
During the same interval, the phage developed
single-nucleotide polymorphism
s that became fixed in the population, suggesting that targeting had prevented phage replication absent these mutations.
Another ''S. thermophilus'' experiment showed that phages can infect and replicate in hosts that have only one targeting spacer. Yet another showed that sensitive hosts can exist in environments with high phage titres.
The chemostat and observational studies suggest many nuances to CRISPR and phage (co)evolution.
Identification
CRISPRs are widely distributed among bacteria and archaea
and show some sequence similarities.
Their most notable characteristic is their repeating spacers and direct repeats. This characteristic makes CRISPRs easily identifiable in long sequences of DNA, since the number of repeats decreases the likelihood of a false positive match.
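The idea that regularly spaced repeats make CRISPRs easy to spot can be illustrated with a naive k-mer scan. This is a toy sketch, not the algorithm of any published CRISPR finder: it requires exact repeat copies, and the function name and the thresholds `k`, `min_copies`, `min_spacer` and `max_spacer` are hypothetical.

```python
from collections import defaultdict


def candidate_repeats(seq: str, k: int = 8, min_copies: int = 3,
                      min_spacer: int = 4, max_spacer: int = 10):
    """Return {k-mer: positions} for k-mers recurring at spacer-like intervals."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    hits = {}
    for kmer, pos in positions.items():
        if len(pos) < min_copies:
            continue
        # Distance between the end of one copy and the start of the next.
        gaps = [b - a - k for a, b in zip(pos, pos[1:])]
        if all(min_spacer <= g <= max_spacer for g in gaps):
            hits[kmer] = pos
    return hits


toy_dr = "GTTCCATC"
toy_locus = "AT" + toy_dr + "AAACCC" + toy_dr + "TTTGGA" + toy_dr + "CG"
found = candidate_repeats(toy_locus)
```

Requiring several copies at consistent spacing is what keeps the false-positive rate low: a single chance recurrence of a short sequence is common, but three or more evenly spaced copies are not.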
Analysis of CRISPRs in metagenomic data is more challenging, as CRISPR loci do not typically assemble, due to their repetitive nature or through strain variation, which confuses assembly algorithms. Where many reference genomes are available,
polymerase chain reaction
(PCR) can be used to amplify CRISPR arrays and analyse spacer content.
However, this approach yields information only for specifically targeted CRISPRs and for organisms with sufficient representation in public databases to design reliable PCR primers. Degenerate repeat-specific primers can be used to amplify CRISPR spacers directly from environmental samples; amplicons containing two or three spacers can then be computationally assembled to reconstruct long CRISPR arrays.
The alternative is to extract and reconstruct CRISPR arrays from shotgun metagenomic data. This is computationally more difficult, particularly with second generation sequencing technologies (e.g. 454, Illumina), as the short read lengths prevent more than two or three repeat units appearing in a single read. CRISPR identification in raw reads has been achieved using purely ''de novo'' identification
or by using direct repeat sequences in partially assembled CRISPR arrays from
contig
s (overlapping DNA segments that together represent a consensus region of DNA)
and direct repeat sequences from published genomes
as a hook for identifying direct repeats in individual reads.
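Using a known direct repeat as a hook can be sketched as follows: any read segment lying between two copies of the repeat is taken as a complete spacer. The function name, the exact-match search, and the length limits are assumptions for this example; practical tools would tolerate sequencing errors and repeat variants.

```python
def spacers_from_reads(reads, repeat: str,
                       min_len: int = 4, max_len: int = 10):
    """Collect read segments that are flanked by the repeat on both sides."""
    spacers = []
    for read in reads:
        pieces = read.split(repeat)
        # The first and last pieces touch a read end, so they may be
        # truncated; only the inner pieces are bounded by two repeats.
        for piece in pieces[1:-1]:
            if min_len <= len(piece) <= max_len:
                spacers.append(piece)
    return spacers


hook = "GTTCCATC"
toy_reads = [
    "AC" + hook + "AAACCC" + hook + "TT",    # one complete spacer
    "GGG" + hook + "TTTG",                   # single repeat, nothing complete
    hook + "TTTGGA" + hook + "CCCAAA" + hook,  # two complete spacers
]
recovered = spacers_from_reads(toy_reads, hook)
```

The second read shows the limitation noted in the text: with short reads, many fragments contain only one repeat copy and therefore contribute no complete spacer.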
Use by phages
Another way for bacteria to defend against phage infection is by having
chromosomal islands. A subtype of chromosomal islands called phage-inducible chromosomal island (PICI) is excised from a bacterial chromosome upon phage infection and can inhibit phage replication.
PICIs are induced, excised, replicated, and finally packaged into small capsids by certain staphylococcal temperate phages. PICIs use several mechanisms to block phage reproduction. In the first mechanism, PICI-encoded Ppi differentially blocks phage maturation by binding specifically to phage TerS, thereby preventing formation of the TerS/TerL complex responsible for phage DNA packaging. In the second mechanism, PICI CpmAB redirects the phage capsid morphogenetic protein so that 95% of the capsids produced are SaPI-sized; phage DNA can package only one-third of its genome in these small capsids, yielding nonviable phage. The third mechanism involves two proteins, PtiA and PtiB, that target LtrC, which is responsible for the production of virion and lysis proteins. This interference is modulated by a protein, PtiM, which binds to one of the interference-mediating proteins, PtiA, and thereby achieves the required level of interference.
One study showed that lytic ICP1 phage, which specifically targets ''
Vibrio cholerae
''
serogroup O1, has acquired a CRISPR-Cas system that targets a ''V. cholerae'' PICI-like element. The system has two CRISPR loci and nine Cas genes. It seems to be
homologous to the I-F system found in ''
Yersinia pestis
''. Moreover, like the bacterial CRISPR-Cas system, ICP1 CRISPR-Cas can acquire new sequences, which allows phage and host to co-evolve.
Certain archaeal viruses were shown to carry mini-CRISPR arrays containing one or two spacers. It has been shown that spacers within the virus-borne CRISPR arrays target other viruses and plasmids, suggesting that mini-CRISPR arrays represent a mechanism of heterotypic superinfection exclusion and participate in interviral conflicts.
Applications
CRISPR gene editing is a revolutionary technology that allows for precise, targeted modifications to the DNA of living organisms. Developed from a natural defense mechanism found in bacteria, CRISPR-Cas9 is the most commonly used system. Gene editing with CRISPR-Cas9 involves a
Cas9 nuclease and an engineered
guide RNA, which come together to allow for the precise "cutting" of one or both strands of DNA at specific locations within the genome.
It makes use of the cell's natural DNA repair systems, including
non-homologous end joining
,
homology-directed repair
, or
mismatch repair
, to modify, insert, or delete genetic material at these specific cut sites.
This technology has transformed fields such as genetics, medicine,
and agriculture,
offering potential treatments for genetic disorders, advancements in crop engineering, and research into the fundamental workings of life. However, its ethical implications and potential unintended consequences have sparked significant debate.
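Choosing where a guide RNA can direct Cas9 amounts to a pattern search. The sketch below scans one strand for candidate sites, assuming the widely used ''Streptococcus pyogenes'' Cas9 conventions of a 20-nt target followed immediately by an NGG PAM; those two parameters and the function name are not taken from the text above and are stated here as assumptions. Off-target scoring and the reverse strand are deliberately left out.

```python
import re


def find_cas9_sites(seq: str, guide_len: int = 20):
    """Return (start, target, pam) for each target followed by an NGG PAM.

    Assumptions: forward strand only, exact NGG PAM, fixed guide length.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


toy_gene = "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
sites = find_cas9_sites(toy_gene)
```

Each reported target sequence is what an engineered guide RNA would be designed to match; in practice candidates are then filtered for uniqueness in the genome before use.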
See also
* CRISPR activation
* Anti-CRISPR
* CRISPR/Cas Tools
* CRISPR gene editing
* The CRISPR Journal
* "Designer baby"
* DRACO
* Gene knockout
* Genome-wide CRISPR-Cas9 knockout screens
* Glossary of genetics
* Human germline engineering
* ''Human Nature'' (2019 documentary film)
* MAGESTIC
* ''New'' eugenics
* Prime editing
* RNAi
* SiRNA
* Surveyor nuclease assay
* Synthetic biology
* Zinc finger