Epitranscriptomic Sequencing

In epitranscriptomic sequencing, most methods focus on either (1) enriching and purifying the modified RNA molecules before running them on the RNA sequencer, or (2) improving or modifying bioinformatics analysis pipelines to call the modification peaks. Most methods have been adapted and optimized for mRNA molecules, except for modified bisulfite sequencing for profiling 5-methylcytidine, which was optimized for tRNAs and rRNAs. There are seven major classes of chemical modifications found in RNA molecules: N6-methyladenosine, 2'-O-methylation, N6,2'-O-dimethyladenosine, 5-methylcytidine, 5-hydroxymethylcytidine, inosine, and pseudouridine (Li, X., Xiong, X., and Yi, C. (2017). Epitranscriptomic sequencing technologies: decoding RNA modifications. Nature Methods. doi:10.1038/nmeth.4110). Various sequencing methods have been developed to profile each type of modification. The scale, resolution, sensitivity, and limitations associated with each method, and the corresponding bioinformatics tools used, are discussed below.


Methods for profiling N6-methyladenosine

Methylation of adenosine does not affect its ability to base-pair with thymidine or uracil, so N6-methyladenosine (m6A) cannot be detected using standard sequencing or hybridization methods. This modification is marked by the methylation of the adenosine base at the nitrogen-6 position. It is abundant in polyA+ mRNA and is also found in tRNA, rRNA, snRNA, and long ncRNA.


m6A-seq and MeRIP-seq

In 2012, the first two methods for m6A sequencing were published, enabling transcriptome-wide profiling of m6A in mammalian cells. These two techniques, called m6A-seq and MeRIP-seq (m6A-specific methylated RNA immunoprecipitation), were also the first methods to allow sequencing of any type of RNA modification. They detected 10,000 m6A peaks in the mammalian transcriptome; the peaks were enriched in 3' UTRs, near stop codons, and within long exons.

The two methods were optimized to detect methylation peaks in poly(A)+ mRNA, but the protocol could be adapted to profile any type of RNA. The collected RNA sample is fragmented into ~100-nucleotide-long oligonucleotides using a fragmentation buffer and immunoprecipitated with a purified anti-m6A antibody, after which the antibody-tagged RNA molecules are eluted and collected. The immunoprecipitation procedure in MeRIP-seq can produce >130-fold enrichment of m6A sequences. A random-primed cDNA library is then generated, followed by adaptor ligation and Illumina sequencing. Since the RNA strands are randomly fragmented, the m6A site should, in principle, lie somewhere near the center of the region to which sequence reads align. At the extremes, the region would be roughly 200 nt wide (100 nt up- and downstream of the m6A site).

When the first nucleotide of a transcript is an adenosine, in addition to the ribose 2'-O-methylation, this base can be further methylated at the N6 position (m6Am). m6A-seq was confirmed to be able to detect m6Am peaks at transcription start sites (Schwartz, S. et al. (2014). Perturbation of m6A writers reveals two distinct classes of mRNA methylation at internal and 5' sites. Cell Reports. 8: 284-296). Adapter ligation at both ends of the RNA fragment results in reads tending to pile up at the 5' terminus of the transcript. Schwartz et al. (2014) leveraged this to detect methylated transcription start sites (mTSS) by picking out sites with a high ratio of pileup size in the IP samples compared with the input sample. As confirmation, >80% of the highly enriched pileup sites contained adenosine.

The resolution of these methods is 100-200 nt, which is the range of the fragment size. The two methods have several drawbacks: (1) they require substantial input material, (2) their low resolution makes it difficult to pinpoint the actual site of the m6A mark, and (3) they cannot directly assess false positives. Especially in MeRIP-seq, the bioinformatics tools currently available can call only one site per ~100-200 nt-wide peak, so a substantial portion of clustered m6As (~64 nt between individual sites within a cluster) are missed. Each cluster can contain up to 15 m6A residues.

In 2013, a modified version of m6A-seq, based on the two earlier methods, was published; it aimed to increase resolution and was demonstrated in the yeast transcriptome. The authors achieved this by decreasing fragment size and employing a ligation-based strand-specific library preparation protocol that captures both ends of the fragmented RNA, ensuring that the methylated position lies within the sequenced fragment. By additionally referencing the m6A consensus motif and eliminating false-positive m6A peaks using negative control samples, m6A profiling in yeast could be performed at single-base resolution.
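The idea of comparing immunoprecipitated (IP) reads against an input control can be illustrated with a minimal sketch. This is not the peak caller used by the m6A-seq or MeRIP-seq authors; the function name, the fixed windows, the fold-change threshold, and the pseudocount are all assumptions chosen for illustration. Read counts per window are normalised by library size, and windows whose IP/input ratio exceeds the threshold are reported.

```python
def call_enriched_windows(ip_counts, input_counts, min_fold=2.0, pseudocount=1.0):
    """Return indices of windows enriched in the IP sample relative to input.

    ip_counts / input_counts: read counts per window along a transcript,
    in the same window order. Thresholds are illustrative assumptions.
    """
    if len(ip_counts) != len(input_counts):
        raise ValueError("IP and input must cover the same windows")
    ip_total = sum(ip_counts)
    input_total = sum(input_counts)
    if ip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")

    enriched = []
    for index, (ip, inp) in enumerate(zip(ip_counts, input_counts)):
        # Normalise by library size; the pseudocount avoids division by zero.
        ip_norm = (ip + pseudocount) / ip_total
        input_norm = (inp + pseudocount) / input_total
        if ip_norm / input_norm >= min_fold:
            enriched.append(index)
    return enriched


if __name__ == "__main__":
    ip = [10, 10, 200, 10]       # hypothetical IP read counts per window
    control = [50, 50, 50, 50]   # hypothetical input read counts per window
    print(call_enriched_windows(ip, control))
```

Because each window here would span roughly one fragment length, a sketch like this reports at most one call per window, which mirrors the resolution limit described above.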


UV-based Methods


PA-m6A-seq

UV-induced RNA-antibody crosslinking was added on top of m6A-seq to produce PA-m6A-seq (photo-crosslinking-assisted m6A-seq), which increases resolution to ~23 nt. First, 4-thiouridine (4SU) is incorporated into the RNA by adding 4SU to the growth medium, with some incorporation sites presumably near the m6A location. Immunoprecipitation is then performed on full-length RNA using an m6A-specific antibody. UV light at 365 nm is then shone onto the RNA to activate crosslinking between the antibody and 4SU. Crosslinked RNA is isolated via competition elution and fragmented further to ~25-30 nt; proteinase K is used to digest the antibody crosslinked to the RNA. Peptide fragments that remain on the RNA after antibody removal cause the base to be read as a C as opposed to a T during reverse transcription, effectively inducing a point mutation at the 4SU crosslinking site. The short fragments are subjected to library construction and Illumina sequencing, followed by identification of the consensus methylation sequence. The presence of the T-to-C mutation helps increase the signal-to-noise ratio of methylation site detection and provides greater resolution of the methylation sequence. One shortcoming of this method is that m6A sites without nearby 4SU incorporation cannot be detected. Another caveat is that the position of 4SU incorporation can vary relative to any single m6A residue, so it remains challenging to locate the m6A site precisely using the T-to-C mutation.
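As a rough illustration of how T-to-C conversions might be tallied, the sketch below scans ungapped read alignments against a reference and reports positions where a reference T is read as C in at least a minimum number of reads. The function name, the `(start, sequence)` read representation, and the `min_reads` threshold are assumptions for illustration; a real analysis would work from aligner output and model sequencing error.

```python
from collections import Counter


def t_to_c_sites(reference, aligned_reads, min_reads=2):
    """Return sorted reference positions with recurrent T-to-C mismatches.

    reference: reference sequence written in DNA letters (A, C, G, T).
    aligned_reads: iterable of (start, read_sequence) pairs, 0-based,
    assumed to align without gaps (a simplifying assumption).
    """
    mismatch_counts = Counter()
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            position = start + offset
            if position >= len(reference):
                break  # read runs past the end of the reference
            if reference[position] == "T" and base == "C":
                mismatch_counts[position] += 1
    return sorted(pos for pos, n in mismatch_counts.items() if n >= min_reads)


if __name__ == "__main__":
    ref = "GGACTTGA"
    reads = [(2, "ACCT"), (3, "CCTG"), (4, "TCGA")]  # hypothetical reads
    print(t_to_c_sites(ref, reads))
```

Requiring the conversion in several independent reads is one simple way to separate crosslink-induced mutations from sporadic sequencing errors.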


m6A-CLIP and miCLIP

m6A-CLIP (crosslinking immunoprecipitation) and miCLIP (m6A individual-nucleotide-resolution crosslinking and immunoprecipitation) are UV-based sequencing techniques. These two methods activate crosslinking at 254 nm, fragment RNA molecules before immunoprecipitation with the antibody, and do not depend on the incorporation of photoactivatable ribonucleosides; the antibody crosslinks directly with a base at a predictable location close to the m6A site. These UV-based strategies use antibodies that induce consistent and predictable mutational and truncation patterns in the cDNA strand during reverse transcription, which can be leveraged to locate the m6A site more precisely. Though both m6A-CLIP and miCLIP rely on UV-induced mutations, m6A-CLIP is distinct in taking advantage of the fact that m6A alone can induce cDNA truncation during reverse transcription, generating single-nucleotide mapping for over ten times as many precise m6A sites (MITS, m6A-induced truncation sites) and permitting comprehensive and unbiased precise m6A mapping. In contrast, the UV-mapped m6A sites identified by miCLIP are only a small subset of all precise m6A sites. The precise location of tens of thousands of m6A sites in human and mouse mRNAs by m6A-CLIP reveals that m6A is enriched in the last exon but not around the stop codon.

In m6A-CLIP and miCLIP, RNA is first fragmented to ~20-80 nt, then the 254 nm UV-induced covalent RNA/m6A-antibody complex is formed in the fragments containing m6A. The antibody is removed with proteinase K before reverse transcription, library construction, and sequencing. Remnants of peptides at the crosslinking site on the RNA after antibody removal lead to insertions, truncations, and C-to-T mutations during reverse transcription to cDNA, especially at the +1 position to the m6A site (5' to the m6A site) in the sequence reads. Positive sites seen using m6A-CLIP and miCLIP had a high percentage of matches with those detected using SCARLET, which has higher local resolution around a specific site (see below), implying that m6A-CLIP and miCLIP have high spatial resolution and a low false discovery rate. miCLIP has been used to detect m6Am by looking at crosslinking-induced truncation sites in the 5' UTR.


Methods for quantifying m6A modification status

Although m6A sites can be profiled at high resolution using UV-based methods, the stoichiometry of m6A sites (the methylation status, or the ratio of m6A+ to m6A- for each individual site within a type of RNA) remains unknown. SCARLET (2013) and m6A-LAIC-seq (2016) allow quantitation of stoichiometry at a specific locus and transcriptome-wide, respectively. The bioinformatics methods used to analyze m6A peaks do not make any prior assumptions about the sequence motifs within which m6A sites are usually found, and take all possible motifs into consideration. They are therefore less likely to miss sites.


SCARLET

SCARLET (site-specific cleavage and radioactive-labeling followed by ligation-assisted extraction and thin-layer chromatography) is used to determine the fraction of RNA in a sample that carries a methylated adenine at a specific site. One can start with total RNA without having to enrich for the target RNA molecule. It is therefore especially suitable for quantifying methylation status in low-abundance RNAs such as tRNAs. However, it is not suitable or practical for large-scale location of m6A sites. The procedure begins with a chimeric DNA oligonucleotide annealing to the target RNA around the candidate modification site. The chimeric ssDNA has 2'-OMe/2'-H modifications and is complementary to the target sequence. The chimeric oligonucleotide serves as a guide that allows RNase H to cleave the RNA strand precisely at the 5' end of the candidate site. The cut site is then radiolabeled with phosphorus-32 and splint-ligated to a 116 nt ssDNA oligonucleotide using DNA ligase. RNase T1/A is introduced to the sample to digest all RNA except for the RNA molecules with the 116-mer DNA attached. This radiolabeled product is then isolated and digested by nuclease to generate a mixture of modified and unmodified adenosines (5'-P-m6A and 5'-P-A), which is separated using thin-layer chromatography. The relative proportions of the two groups can be determined using UV absorption levels.


m6A-LAIC-seq

m6A-LAIC-seq (m6A-level and isoform-characterization sequencing) is a high-throughput approach to quantify methylation status on a whole-transcriptome scale. Full-length RNA samples are used in this method. RNAs are first subjected to immunoprecipitation with an anti-m6A antibody. Excess antibody is added to the mixture to ensure that all m6A-containing RNAs are pulled down. The mixture is separated into eluate (m6A+ RNAs) and supernatant (m6A- RNAs) pools. External RNA Controls Consortium (ERCC) spike-ins are added to the eluate and supernatant, as well as to an independent control arm consisting of just ERCC spike-ins. After antibody cleavage in the eluate pool, each of the three mixtures is sequenced on a next-generation sequencing platform. The m6A level per site or gene can be quantified from the ERCC-normalized RNA abundances in the different pools. Since full-length RNA is used, it is possible to directly compare alternatively spliced isoforms between the m6A+ and m6A- fractions, as well as to compare isoform abundance within the m6A+ portion.

Despite the advances in m6A sequencing, several challenges remain: (1) a method has yet to be developed that characterizes the stoichiometry between different sites in the same transcript; (2) analysis results are heavily dependent on the bioinformatics algorithm used to call the peaks; (3) current methods all use m6A-specific antibodies to tag m6A sites, but it has been reported that the antibodies have an intrinsic bias for certain RNA sequences.
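One way to picture the ERCC-based quantification is the sketch below. It assumes, for illustration only, that each pool is rescaled by a single factor derived from its spike-in reads and that a gene's m6A level is the eluate share of its total normalised abundance, E / (E + S). The function names and the single-factor normalisation are assumptions, not the published pipeline.

```python
def ercc_scale_factor(observed_spike_counts, expected_spike_amounts):
    """Scale factor converting read counts in one pool to spike-in units.

    Assumes a single global factor per pool: total expected spike-in
    amount divided by total observed spike-in reads.
    """
    observed_total = sum(observed_spike_counts)
    if observed_total == 0:
        raise ValueError("no spike-in reads observed")
    return sum(expected_spike_amounts) / observed_total


def m6a_level(eluate_count, supernatant_count, eluate_scale, supernatant_scale):
    """Fraction of a gene's transcripts found in the m6A+ (eluate) pool."""
    eluate = eluate_count * eluate_scale
    supernatant = supernatant_count * supernatant_scale
    if eluate + supernatant == 0:
        return None  # gene not detected in either pool
    return eluate / (eluate + supernatant)


if __name__ == "__main__":
    # Hypothetical spike-in reads and known input amounts for each pool.
    eluate_scale = ercc_scale_factor([100, 200], [10, 20])
    supernatant_scale = ercc_scale_factor([50, 100], [10, 20])
    print(m6a_level(300, 350, eluate_scale, supernatant_scale))
```

The same calculation could be applied per isoform rather than per gene, since full-length RNA is sequenced in both pools.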


Methods for 2'-O-methylation Profiling

The 2'-O-methylation of the ribose moiety is one of the most common RNA modifications and is present in diverse highly abundant non-coding RNAs (ncRNAs) and at the 5' cap of mRNAs. Moreover, many studies have revealed that 2'-O-methylation (Nm) at the 3' end is present in some ncRNAs, such as microRNAs (miRNAs) in plants and PIWI-interacting RNAs (piRNAs) in animals. This modification can perturb the function of ribosomes and disrupt tRNA decoding, regulate alternative splicing fidelity, protect ncRNAs from 3'-5' exonucleolytic degradation, and provide a molecular signature for discriminating self from non-self mRNA.


Nm-REP-seq

A novel method, Nm-REP-seq, was developed for the transcriptome-wide identification of 2'-O-methylation sites at single-base resolution; it uses an RNA exoribonuclease (Mycoplasma genitalium RNase R, MgR) and periodate oxidation reactivity to eliminate 2'-hydroxylated (2'-OH) nucleosides. Nm-REP-seq identified telomerase RNA component (TERC) RNA, scaRNAs, and snoRNAs as new classes of Nm-containing ncRNAs, and identified many 2'-O-methylation sites in various ncRNAs and mRNAs. Furthermore, Nm-REP-seq revealed 2'-O-methylation at the 3' end of snoRNAs, snRNAs, tRNAs and fragments derived from them, as well as piRNAs and miRNAs.


Methods for N6,2'-O-dimethyladenosine (m6Am) Profiling

N6,2'-O-dimethyladenosine (m6Am), abundant in polyA+ mRNAs, occurs at the first nucleotide after the 5' cap, when an additional methyl group is added to a 2'-O-methyladenosine residue at the 'capped' 5' end of mRNA. Since m6Am can be recognized by anti-m6A antibodies at transcription start sites, the methods used for m6A profiling, namely m6A-seq and miCLIP, can be and were adapted for m6Am profiling (see the m6A-seq and miCLIP descriptions above).


Methods for 5-methylcytidine profiling

5-methylcytidine (m5C) is abundant in mRNA and ncRNAs, especially tRNAs and rRNAs. In tRNAs, this modification stabilizes the secondary structure and influences anticodon stem-loop conformation. In rRNAs, m5C affects translational fidelity. Two principles have been used to develop m5C sequencing methods. The first comprises bisulfite sequencing and the antibody-based approach m5C-RIP, the latter similar to m6A sequencing. The second is detecting targets of m5C RNA methyltransferases by covalently linking the enzyme to its target, and then using IP specific to the target enzyme to enrich for RNA molecules containing the mark (Aza-IP and miCLIP).


Modified bisulfite sequencing

Modified bisulfite sequencing was optimized for rRNA, tRNA, and miRNA molecules from Drosophila. Bisulfite treatment has been most widely used to detect dm5C (DNA m5C). The treatment essentially converts cytosine to uracil, while methylated cytosines are unchanged. Previous attempts to develop m5C sequencing protocols using bisulfite treatment could not effectively address the harshness of the treatment, which causes significant degradation of RNA molecules. Specifically, bisulfite deamination treatment (high pH) of RNA is detrimental to the stability of phosphodiester bonds. As a result, it is difficult to pre-enrich RNA molecules or to obtain enough PCR product of the correct size for deep sequencing.

A modified version of bisulfite sequencing was developed by Schaefer et al. (2009), which decreased the temperature of the bisulfite treatment of RNA from 95 °C to 60 °C. The rationale behind the modification was that since RNA, unlike DNA, is not double-stranded but rather consists of single-stranded regions, double-stranded stem structures, and loops, it could be possible to unwind RNA at a much lower temperature. Indeed, RNA could be treated for 180 minutes at 60 °C without significant loss of PCR amplicons of the expected size. Deamination rates were determined to be 99% at 180 min of treatment. After bisulfite treatment of fragmented RNA, reverse transcription is performed, followed by PCR amplification of the cDNA products, and finally deep sequencing on the Roche 454 platform. Since the developers of the method used the Roche platform, they also used GS Amplicon Variant Analyzer (Roche) to analyze the deep sequencing data and quantify sequence-specific cytosine content.

However, recent papers have suggested that the method has several flaws: (1) incomplete conversion of regular cytosines in double-stranded regions of RNA; (2) areas containing other modifications that result in bisulfite-treatment resistance; and (3) sites containing potential false positives due to (1) and (2). In addition, it is possible that the sequencing depth is still not high enough to correctly detect all methylated sites.
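The readout of bisulfite sequencing can be sketched as a per-cytosine tally: at each reference C, reads that still show C are counted as non-converted (candidate m5C) and reads that show T as converted. The code below is a minimal illustration under simplifying assumptions (ungapped alignments, a hypothetical `min_coverage` cutoff, no correction for incomplete conversion); it is not the GS Amplicon Variant Analyzer workflow mentioned above.

```python
def methylation_fractions(reference, aligned_reads, min_coverage=3):
    """Return {position: fraction of reads retaining C} for reference cytosines.

    reference: untreated reference sequence in DNA letters.
    aligned_reads: iterable of (start, read_sequence) pairs, 0-based,
    assumed to align without gaps (a simplifying assumption).
    """
    retained = {}   # reference C still read as C (not deaminated)
    converted = {}  # reference C read as T (deaminated to U, then copied as T)
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            position = start + offset
            if position >= len(reference) or reference[position] != "C":
                continue
            if base == "C":
                retained[position] = retained.get(position, 0) + 1
            elif base == "T":
                converted[position] = converted.get(position, 0) + 1

    fractions = {}
    for position in sorted(set(retained) | set(converted)):
        kept = retained.get(position, 0)
        total = kept + converted.get(position, 0)
        if total >= min_coverage:
            fractions[position] = kept / total
    return fractions


if __name__ == "__main__":
    ref = "ACGC"
    reads = [(0, "ACGT"), (0, "ACGT"), (0, "ACGC"), (0, "ATGT")]  # hypothetical
    print(methylation_fractions(ref, reads))
```

A high retained fraction alone does not prove methylation: as the flaws listed above indicate, structured regions and other modifications can also resist conversion, so a practical pipeline would compare against the background non-conversion rate.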


Aza-IP

Aza-IP (5-azacytidine-mediated RNA immunoprecipitation) has been optimized for and used to detect targets of methyltransferases, particularly NSUN2 and DNMT2 (Khoddami, V. and Cairns, B.R. (2013). Identification of direct targets and modified bases of RNA cytosine methyltransferases. Nature Biotechnology. 31(5): 459-464), the two main enzymes responsible for laying down the m5C mark. First, the cell is made to overexpress an epitope-tagged m5C-RNA methyltransferase derivative so that the antibody used later for immunoprecipitation can recognize the enzyme. Second, 5-aza-C is introduced to the cells so that it can be incorporated into nascent RNA in place of cytosine. Normally, the methyltransferases are released (i.e. the covalent bond between cytosine and methyltransferase is broken) following methylation of the residue. For 5-aza-C, due to a nitrogen substitution at the C5 position of cytosine, the RNA methyltransferase remains covalently bound to the target RNA molecule at the C6 position. Third, the cell is lysed and the m5C-RNA methyltransferase of interest is immunoprecipitated along with the RNA molecules that are covalently linked to the protein. The IP step enabled >200-fold enrichment of RNA targets, which were mainly tRNAs. The enriched molecules are then fragmented and purified, a cDNA library is constructed, and sequencing is performed. An important additional feature is that covalent linkage of the RNA methyltransferase to the C6 of 5-aza-C induces rearrangement and ring opening. This ring opening results in preferential pairing with cytosine, so the site is read as guanosine during sequencing. This C-to-G transversion allows base-resolution detection of m5C sites. One caveat is that m5C sites not replaced by 5-azacytosine will be missed.


miCLIP

miCLIP (methylation-induced crosslinking immunoprecipitation) was used to detect NSUN2 targets, which were found to be mostly non-coding RNAs such as tRNA. An induced C271A mutation in NSUN2 inhibits release of the enzyme from its RNA target. This mutant was overexpressed in the cells of interest, and the mutated NSUN2 was also tagged with the Myc epitope. The covalently linked RNA-protein complexes are isolated via immunoprecipitation with a Myc-specific antibody. These complexes are confirmed and detected by radiolabeling with phosphorus-32. The RNA is then extracted from the complex, reverse-transcribed, amplified by PCR, and sequenced using next-generation platforms. Both miCLIP and Aza-IP, though limited by their targeting of specific enzymes, allow detection of low-abundance methylated RNA without deep sequencing.


Methods for Inosine Profiling

Inosine is created enzymatically when an adenosine residue is deaminated.


Analysis of base-pairing properties

Since inosine is a deaminated adenosine, this is one of the few RNA modifications that has an accompanying alteration in base pairing, which can be capitalised on. The original adenosine pairs with thymine during reverse transcription, whereas inosine pairs with cytosine. cDNA sequences obtained by RT-PCR can therefore be compared to the corresponding genomic sequences; at sites where A residues are repeatedly read as G, an editing event can be assumed. At high enough accuracy, it is feasible to calculate the percentage of mRNA molecules in the population that have been edited. This method potentially has single-nucleotide resolution. In fact, the abundance of RNA-seq data that is now publicly available can be leveraged to investigate G (in cDNA) versus A (in genome). One particular pipeline, called RNA and DNA differences (RDD), claims to exclude false positives, but only 56.8% of its A-to-I sites were found to be valid by ICE-seq (see below).
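The A-versus-G comparison can be illustrated with a minimal sketch. It assumes that the read bases covering one genomic position have already been collected into a string; the function name `editing_level` and the `min_depth` cutoff are hypothetical choices for illustration, not part of any published pipeline.

```python
from collections import Counter


def editing_level(ref_base, read_bases, min_depth=10):
    """Return the fraction of reads showing G at a genomic A site.

    Returns None when the reference base is not A or when too few
    A/G reads cover the site (min_depth is an illustrative cutoff).
    """
    if ref_base != "A":
        return None
    counts = Counter(read_bases)
    depth = counts["A"] + counts["G"]  # other bases are ignored here
    if depth < min_depth:
        return None
    return counts["G"] / depth


# Hypothetical pileup: 30 reads support A, 10 reads support G.
pileup = "A" * 30 + "G" * 10
print(editing_level("A", pileup))
```

A real analysis would additionally have to exclude SNPs and sequencing errors, which this sketch does not attempt.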


Limitations

The background noise caused by single nucleotide polymorphisms (SNPs), somatic mutations, pseudogenes and sequencing errors reduces the reliability of the signal, especially in a single-cell context.


Chemical methods


Inosine-specific cleavage

The first method to detect A-to-I RNA modifications, developed in 1997, was inosine-specific cleavage. RNA samples are treated with glyoxal and borate to specifically modify all G bases, and are subsequently digested by RNase T1, which cleaves after I sites because the modified G residues are protected. Amplification of these fragments then allows analysis of the cleavage sites and inference of A-to-I modification. The method was used to confirm the position of inosine at specific sites rather than to identify novel sites or produce transcriptome-wide profiles.


Limitations

The existence of two A-to-I modifications in relatively close proximity, which is common in Alu elements, means the downstream modification is less likely to be detected, since cDNA synthesis will be truncated at a prior nucleotide. The throughput is low, and the initial method required specific primers; the protocol is complicated and labour-intensive.


ICE and ICE-seq

Inosine chemical erasing (ICE) refers to a process in which acrylonitrile is reacted with inosine to form N1-cyanoethylinosine (ce1I). This adduct stalls reverse transcriptase and leads to truncated cDNA molecules. ICE was combined with deep sequencing in a method called ICE-seq. Computational methods for automated analysis of the data are available; the main premise is the comparison of treated and untreated samples to identify truncated transcripts and thus infer an inosine modification by read count, with a step to reduce false positives by comparison against the online database dbSNP.
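The treated-versus-untreated comparison can be sketched as follows. This is an illustrative simplification, not the published ICE-seq software: it assumes per-site counts of G-supporting reads and total depth are already available for both samples, and the names `erasure_ratio` and `is_candidate_inosine`, the `min_ratio` threshold, and the `known_snps` set (standing in for a dbSNP lookup) are all hypothetical.

```python
def erasure_ratio(g_untreated, depth_untreated, g_treated, depth_treated):
    """Relative drop in G frequency at a site after acrylonitrile treatment.

    A value near 1 means the G signal was almost fully erased, as expected
    when cDNA synthesis is truncated at a cyanoethylated inosine.
    """
    if depth_untreated == 0 or depth_treated == 0:
        return None
    f_untreated = g_untreated / depth_untreated
    f_treated = g_treated / depth_treated
    if f_untreated == 0:
        return None  # no G signal to erase
    return (f_untreated - f_treated) / f_untreated


def is_candidate_inosine(site, counts, known_snps, min_ratio=0.5):
    """Flag a site whose G signal drops on treatment and is not a known SNP.

    site: (chromosome, position); counts: the four arguments of erasure_ratio.
    """
    if site in known_snps:
        return False
    ratio = erasure_ratio(*counts)
    return ratio is not None and ratio >= min_ratio


known_snps = {("chr1", 2000)}
print(is_candidate_inosine(("chr1", 1000), (40, 100, 10, 100), known_snps))
print(is_candidate_inosine(("chr1", 2000), (40, 100, 10, 100), known_snps))
```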


Limitations

The original ICE protocol involved an RT-PCR amplification step and therefore required primers and knowledge of the location or regions to be investigated, alongside a maximum cDNA length of 300–500 bp. The ICE-seq method is complicated, as well as being labour-, reagent- and time-intensive. One protocol from 2015 took 22 days. ICE shares a limitation with inosine-specific cleavage: if there are two A-to-I modifications in relatively close proximity, the downstream modification is less likely to be detected, since cDNA synthesis will be truncated at a prior nucleotide. Both ICE and ICE-seq suffer from a lack of sensitivity to infrequently edited locations: it becomes difficult to distinguish a modification with a frequency of <10% from a false positive. An increase in read depth and quality can increase sensitivity, but it also introduces further amplification bias.


Biological methods


ADAR knockdown

The modification of A to I is effected by adenosine deaminases that act on RNA (ADARs), of which there are three in mice. Knocking these down in the cell, and then comparing the RNA content of ADAR+ and ADAR- cells, would therefore be anticipated to provide a basis for A-to-I modification profiling. However, ADAR enzymes have further functions within the cell (for example, roles in RNA processing and in miRNA biogenesis), which would also be likely to change the landscape of cellular mRNA. Recently, a map of A-to-I editing in mice was generated using editing-deficient ADAR1 and ADAR2 double-knockout mice as a negative control, which allowed A-to-I editing to be detected with high confidence.


Methods for Pseudouridine Profiling

Pseudouridine, or Ψ, the overall most abundant post-transcriptional RNA modification, is created when a uridine base is isomerised; it is sometimes referred to as the ‘fifth RNA nucleotide’. In eukaryotes, this can occur by either of two distinct mechanisms. It is incorporated into stable non-coding RNAs such as tRNA, rRNA, and snRNA, with roles in ribosomal ligand binding and translational fidelity in tRNA, and in fine-tuning branching events and splicing events in snRNAs. Pseudouridine has one more hydrogen bond donor, from an imino group, and a more stable C–C bond, since a C-glycosidic linkage has replaced the N-glycosidic linkage found in its counterpart (regular uridine). As neither of these changes affects its base-pairing properties, uridine and pseudouridine give the same output when directly sequenced; methods for its detection therefore involve prior biochemical modification.


Biochemical methods


CMCT methods

There are multiple pseudouridine detection methods that begin with the addition of N-cyclohexyl-N′-β-(4-methylmorpholinium)ethylcarbodiimide metho-p-toluenesulfonate (CMCT; also known as CMC), since its reaction with pseudouridine produces CMC-Ψ. CMC-Ψ causes reverse transcriptase to stall one nucleotide in the 3’ direction. These methods have single-nucleotide resolution. In an optimisation step, azido-CMC allows the adduct to be biotinylated; subsequent biotin pulldown enriches Ψ-containing transcripts, allowing identification of even low-abundance transcripts.
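The way a reverse-transcription stop is translated into a Ψ position can be sketched as below. The sketch assumes positions are numbered 5’ to 3’ along the RNA, so a stop recorded at position p (one nucleotide 3’ of the adduct) implies Ψ at p - 1. The function names, the input layout, and the `min_ratio` and `min_fold` thresholds are hypothetical and are not taken from any specific published pipeline.

```python
def stop_ratio(stops, coverage):
    """Fraction of reads covering a position whose cDNA ends there."""
    return stops / coverage if coverage else 0.0


def call_psi_sites(treated, control, min_ratio=0.1, min_fold=3.0):
    """Infer candidate pseudouridine positions from RT stops.

    treated, control: dict mapping RNA position -> (stop_count, coverage)
    for CMC-treated and untreated libraries respectively.
    Returns the positions one nucleotide 5' of each enriched stop.
    """
    sites = []
    for pos in sorted(treated):
        r_treated = stop_ratio(*treated[pos])
        r_control = stop_ratio(*control.get(pos, (0, 0)))
        enriched = r_treated >= min_fold * r_control
        if r_treated >= min_ratio and enriched:
            sites.append(pos - 1)  # stall occurs one nt 3' of the CMC-Ψ adduct
    return sites


treated = {50: (1, 100), 101: (30, 100)}
control = {50: (1, 100), 101: (2, 100)}
print(call_psi_sites(treated, control))
```

Requiring enrichment over an untreated control is one simple way to discount stops caused by RNA structure or degradation rather than by the adduct.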


Limitations

As with other procedures predicated on biochemical alteration followed by sequencing, the development of high-throughput sequencing has removed the limitations requiring prior knowledge of sites of interest and primer design. The method causes a lot of RNA degradation, so it is necessary to start with a large amount of sample, or to use effective normalisation techniques to account for amplification biases. One final limitation is that, for CMC labelling of pseudouridine to remain specific, the reaction cannot be driven to completion, and it is therefore not quantitative. A new reactant that could achieve higher sensitivity while retaining specificity would be beneficial.


Methods for 5-hydroxymethylcytidine Profiling

Cytidine residues, modified once to m5C (discussed above), can be further modified: either oxidised once to 5-hydroxymethylcytidine (hm5C), or oxidised twice to 5-formylcytidine (f5C). Arising from the oxidative processing of m5C enacted in mammals by ten-eleven translocation (TET) family enzymes, hm5C is known to occur in all three kingdoms and to have roles in regulation. While 5-hydroxymethyl-2′-deoxycytidine (hm5dC) is known to be found in DNA in a widespread manner, hm5C is also found in organisms for which no hm5dC has been detected, indicating it is a separate process with distinct regulatory stipulations. To observe the ''in vivo'' addition of methyl groups to cytosine RNA residues followed by oxidative processing, mice can be fed a diet incorporating particular isotopes, which can then be traced by LC-MS/MS analysis. Since the metabolic pathway from nutritional intake to nucleotide incorporation is known to progress from dietary methionine → S-adenosylmethionine (SAM) → methyl group on RNA base, labelling dietary methionine with 13C and D means these isotopes will end up in hm5C residues that have been modified since the labelled methionine was added to the diet. In contrast to m5C, a large quantity of hm5C modifications have been recorded within coding sequences.


hMeRIP-seq

hMeRIP-seq is an immunoprecipitation method in which RNA–protein complexes are crosslinked for stability and antibodies specific to hm5C are added. Using this method, over 3,000 hm5C peaks have been called in ''Drosophila melanogaster'' S2 cells.


Limitations

Despite two distinct base-resolution methods being available for hm5dC, there are no base-resolution methods for detection of hm5C.


Biophysical validation of RNA modifications

Apart from mass spectrometry and chromatography, two other validation techniques have been developed, namely:
# Pre- and post-labelling techniques:
* ''Pre-labelling'' involves the use of 32P: cells are grown in 32P-containing medium, thus allowing the incorporation of [α-32P]NTPs during transcription by T7 RNA polymerase. The modified RNA is then extracted, and each RNA species is isolated and subsequently digested by T2 RNase. Next, RNA is hydrolyzed into 5' nucleoside monophosphates, which are analyzed by 2D-TLC (two-dimensional thin-layer chromatography). This method is able to detect and quantify every modification but does not contribute to the characterization of the sequence.
* ''Post-labelling'' involves the selective labelling of a specific position within the sequence: these techniques rely on the principles of the Stanley-Vassilenko approach, which has been adjusted to achieve better validation quality. First, RNA is cleaved into fragments with free 5’-OH ends by sequence-specific hydrolysis, using either RNase H or DNAzymes. Polynucleotide kinase (PNK) then performs the radioactive 5’ post-labelling phosphorylation using [γ-32P]ATP. At this point, the labelled fragments undergo a size fragmentation, which can be performed either by Nuclease P1 or according to the SCARLET method. In both cases, the final product is a group of 5’ nucleoside monophosphates (5’ NMPs) that are analyzed by TLC.
** SCARLET: this recent approach exploits not just one but two sequence selection steps, the second of which occurs during the splinted ligation of the radioactively labelled fragments to the 3’-end of a long DNA oligonucleotide. After degradation, the labelled residue is purified together with the ligated DNA oligonucleotide and finally hydrolyzed, and thereby released, by Nuclease P1. This method has proven to be very useful in the validation of modified residues, such as m6A and Ψ, in mRNAs and lncRNAs.
# Oligonucleotide-based techniques, which include several variants:
* ''Splinted ligation of particular modified DNAs'', which exploits the sensitivity of ligase to the 3’ and 5’ nucleotides (so far used for m6A, 2’-O-Me, Ψ)
* ''Microarray modification identification through a DNA-chip'', which exploits the decrease in duplex stability of cDNA oligonucleotides caused by modifications that impede conventional base-pairing (e.g. m1A, m1G, m22G)
* ''RT primer extension at low dNTP concentration'', for mapping of RT arrest signals.


Single-Molecule Real-Time Sequencing for epitranscriptome sequencing

Single-molecule real-time sequencing (SMRT) is used in the epigenomic and epitranscriptomic fields. In epigenomics, thousands of zero-mode waveguides (ZMWs) are used to capture the DNA polymerase: when a modified base is present, the biophysical dynamics of the polymerase's movement change, creating a unique kinetic signature before, during, and after base incorporation. SMRT sequencing can also be used to detect modified bases in RNA, including m6A sites. In this case, a reverse transcriptase is used as the enzyme within the ZMWs to observe cDNA synthesis in real time. The incorporation of synthetically designed m6A sites leaves a kinetic signature and increases the interpulse duration (IPD). There are some issues concerning the reading of homonucleotide stretches and the base resolution of m6A therein, due to the stuttering of reverse transcriptase. In addition, the throughput is too low for transcriptome-wide approaches. One of the most commonly used platforms is the SMRT sequencing technology by Pacific Biosciences.
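The idea of reading a modification from an increased interpulse duration can be sketched as a per-position ratio of mean IPDs between a sample and an unmodified control. This is a minimal illustration under assumed inputs (dictionaries of per-position IPD measurements); the function names and the `threshold` value are hypothetical, and this is not the vendor's analysis software.

```python
from statistics import mean


def ipd_ratios(sample_ipds, control_ipds):
    """Ratio of mean IPD (sample / control) for each shared position.

    sample_ipds, control_ipds: dict mapping position -> list of IPD values
    (in seconds) observed across reads at that position.
    """
    ratios = {}
    for pos, values in sample_ipds.items():
        ctrl = control_ipds.get(pos)
        if not values or not ctrl:
            continue  # skip positions lacking data in either set
        ratios[pos] = mean(values) / mean(ctrl)
    return ratios


def flag_positions(ratios, threshold=2.0):
    """Positions whose IPD ratio meets an illustrative threshold."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)


sample = {1: [0.5, 1.5], 2: [1.0, 3.0]}
control = {1: [0.5, 1.5], 2: [0.5, 1.5]}
print(flag_positions(ipd_ratios(sample, control)))
```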


Nanopore sequencing in epitranscriptomics

A possible alternative to the detection of epitranscriptomic modifications by SMRT sequencing is direct detection using nanopore sequencing technologies. This technique exploits nanometer-sized protein channels embedded in a membrane or solid material and coupled to sensors that can detect the amplitude and duration of variations in the ionic current passing through the pore. As the RNA passes through the nanopore, the blockage disrupts the current in a way that differs between bases, including modified ones, and can therefore be used to identify possible modifications. By producing single-molecule reads, without prior RNA amplification or conversion to cDNA, these techniques can lead to the production of quantitative transcriptome-wide maps. In particular, nanopore technology proved to be effective in detecting the presence of two nucleotide analogs in RNA: N6-methyladenosine (m6A) and 5-methylcytosine (5-mC). Using hidden Markov models (HMMs) or recurrent neural networks (RNNs) trained with known sequences, it was possible to demonstrate that the modified nucleotides produce a characteristic disruption in the ionic current when passing through the pore, and that these data can be used to identify the nucleotide.
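The principle that a modified nucleotide shifts the ionic current can be illustrated with a much simpler statistic than the HMMs or RNNs mentioned above. The sketch below is a deliberately simplified stand-in: it computes a z-score of the observed mean current at one position against current values from an unmodified reference. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def current_shift_z(observed, reference):
    """z-score of the observed mean current against an unmodified reference.

    observed: current values (pA) from reads at the position of interest.
    reference: current values for the same sequence context without
    modification. Returns None if the reference cannot provide a spread.
    """
    if not observed or len(reference) < 2:
        return None
    sd = stdev(reference)
    if sd == 0:
        return None
    return (mean(observed) - mean(reference)) / sd


reference = [100.0, 102.0, 98.0, 100.0]
observed = [105.0, 105.0]
print(current_shift_z(observed, reference))
```

A large absolute z-score only suggests that the signal differs from the unmodified reference; assigning the identity of the modification requires trained models such as those described above.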

