Transcriptome Fig 3

The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term ''transcriptome'' is a portmanteau of the words ''transcript'' and ''genome''; it is associated with the process of transcript production during the biological process of transcription.

The early stages of transcriptome annotations began with cDNA libraries published in the 1980s. Subsequently, the advent of high-throughput technology led to faster and more efficient ways of obtaining data about the transcriptome. Two biological techniques are used to study the transcriptome, namely DNA microarray, a hybridization-based technique, and RNA-seq, a sequence-based approach. RNA-seq is the preferred method and has been the dominant transcriptomics technique since the 2010s. Single-cell transcriptomics allows tracking of transcript changes over time within individual cells.

Data obtained from the transcriptome is used in research to gain insight into processes such as cellular differentiation, carcinogenesis, transcription regulation and biomarker discovery, among others. Transcriptome-obtained data also finds applications in establishing phylogenetic relationships during the process of evolution and in ''in vitro'' fertilization. The transcriptome is closely related to other -ome based biological fields of study; it is complementary to the proteome and the metabolome and encompasses the translatome, exome, meiome and thanatotranscriptome, which can be seen as -ome fields studying specific types of RNA transcripts. There are quantifiable and conserved relationships between the transcriptome and other -omes, and transcriptomics data can be used effectively to predict other molecular species, such as metabolites. There are numerous publicly available transcriptome databases.


Etymology and history

The word ''transcriptome'' is a portmanteau of the words ''transcript'' and ''genome''. It appeared along with other neologisms formed using the suffixes ''-ome'' and ''-omics'' to denote all studies conducted on a genome-wide scale in the fields of life sciences and technology. As such, transcriptome and transcriptomics were among the first words to emerge along with genome and proteome. The first study to present a cDNA library collection for silk moth mRNA was published in 1979. The first seminal study to mention and investigate the transcriptome of an organism was published in 1997, and it described 60,633 transcripts expressed in ''S. cerevisiae'' using serial analysis of gene expression (SAGE). With the rise of high-throughput technologies and bioinformatics and the subsequent increased computational power, it became increasingly efficient and easy to characterize and analyze enormous amounts of data. Attempts to characterize the transcriptome became more prominent with the advent of automated DNA sequencing during the 1980s. During the 1990s, expressed sequence tag sequencing was used to identify genes and their fragments. This was followed by techniques such as serial analysis of gene expression (SAGE), cap analysis of gene expression (CAGE), and massively parallel signature sequencing (MPSS).


Transcription

The transcriptome encompasses all the ribonucleic acid (RNA) transcripts present in a given organism or experimental sample. RNA is the main carrier of genetic information that is responsible for the process of converting DNA into an organism's phenotype. A gene can give rise to a single-stranded messenger RNA (mRNA) through a molecular process known as transcription; this mRNA is complementary to the strand of DNA it originated from. The enzyme RNA polymerase II attaches to the template DNA strand and catalyzes the addition of ribonucleotides to the 3' end of the growing sequence of the mRNA transcript. In order to initiate its function, RNA polymerase II needs to recognize a promoter sequence, located upstream (5') of the gene. In eukaryotes, this process is mediated by transcription factors, most notably Transcription factor II D (TFIID), which recognizes the TATA box and aids in the positioning of RNA polymerase at the appropriate start site. To finish the production of the RNA transcript, termination usually takes place several hundred nucleotides away from the termination sequence, and the transcript is cleaved. This process occurs in the nucleus of a cell along with RNA processing, by which mRNA molecules are capped, spliced and polyadenylated to increase their stability before being subsequently taken to the cytoplasm. The mRNA gives rise to proteins through the process of translation that takes place in ribosomes.
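
The complementarity between the mRNA and its template strand can be illustrated with a short sketch. It assumes the template strand is written 3' to 5', so that reading it left to right yields the mRNA in the 5' to 3' direction in which ribonucleotides are added. The function name and the example sequence are hypothetical, and the sketch ignores capping, splicing and polyadenylation.

```python
# Minimal sketch: derive the mRNA that is complementary to a DNA template strand.
# Pairing rules: DNA A -> RNA U, T -> A, G -> C, C -> G.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the mRNA (5'->3') for a template strand written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5.upper())


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)  # AUGCCGUAA
```

Any character other than A, T, G or C raises a KeyError, which is a deliberate choice here so that malformed input is not silently transcribed.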


Types of RNA transcripts

Almost all functional transcripts are derived from known genes. The only exceptions are a small number of transcripts that might play a direct role in regulating gene expression near the promoters of known genes. (See Enhancer RNA.) Genes occupy most of prokaryotic genomes, so most of their genomes are transcribed. Many eukaryotic genomes are very large and known genes may take up only a fraction of the genome. In mammals, for example, known genes only account for 40-50% of the genome. Nevertheless, identified transcripts often map to a much larger fraction of the genome, suggesting that the transcriptome contains spurious transcripts that do not come from genes. Some of these transcripts are known to be non-functional because they map to transcribed pseudogenes or degenerate transposons and viruses. Others map to unidentified regions of the genome that may be junk DNA. Spurious transcription is very common in eukaryotes, especially those with large genomes that might contain a lot of junk DNA. Some scientists claim that if a transcript has not been assigned to a known gene then the default assumption must be that it is junk RNA until it has been shown to be functional. This would mean that much of the transcriptome in species with large genomes is probably junk RNA. (See Non-coding RNA.)

The transcriptome includes the transcripts of protein-coding genes (mRNA plus introns) as well as the transcripts of non-coding genes (functional RNAs plus introns).

* Ribosomal RNA/rRNA: Usually the most abundant RNA in the transcriptome.
* Long non-coding RNA/lncRNA: Non-coding RNA transcripts that are more than 200 nucleotides long. Members of this group comprise the largest fraction of the non-coding transcriptome other than introns. It is not known how many of these transcripts are functional and how many are junk RNA.
* transfer RNA/tRNA
* micro RNA/miRNA: 19-24 nucleotides (nt) long. Micro RNAs up- or downregulate expression levels of mRNAs by the process of RNA interference at the post-transcriptional level.
* small interfering RNA/siRNA: 20-24 nt
* small nucleolar RNA/snoRNA
* Piwi-interacting RNA/piRNA: 24-31 nt. They interact with Piwi proteins of the Argonaute family and have a function in targeting and cleaving transposons.
* enhancer RNA/eRNA
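
The length ranges quoted in the list overlap, so length alone cannot assign a transcript to a class. The sketch below only reports which of the listed classes are compatible with a given length; the function name is hypothetical, the ranges are copied from the list, and a real classification would also need sequence, structure and protein-interaction evidence.

```python
from typing import List

# Length ranges in nucleotides, as quoted in the list of RNA types.
LENGTH_RANGES = {
    "miRNA": (19, 24),
    "siRNA": (20, 24),
    "piRNA": (24, 31),
}
LNCRNA_MIN_LENGTH = 201  # lncRNAs are "more than 200 nucleotides long"


def candidate_classes(length_nt: int) -> List[str]:
    """Return the listed RNA classes whose length range includes length_nt."""
    candidates = [
        name for name, (low, high) in LENGTH_RANGES.items()
        if low <= length_nt <= high
    ]
    if length_nt >= LNCRNA_MIN_LENGTH:
        candidates.append("lncRNA")
    return candidates


print(candidate_classes(22))    # ['miRNA', 'siRNA']
print(candidate_classes(24))    # ['miRNA', 'siRNA', 'piRNA']
print(candidate_classes(28))    # ['piRNA']
print(candidate_classes(1500))  # ['lncRNA']
```

A 24 nt transcript is compatible with three classes, which shows why length is at best a first filter.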


Scope of study

In the human genome, all genes get transcribed into RNA because that's how the molecular gene is defined. (See Gene.) The transcriptome consists of coding regions of mRNA plus non-coding UTRs, introns, non-coding RNAs, and spurious non-functional transcripts. Several factors render the content of the transcriptome difficult to establish. These include alternative splicing, RNA editing and alternative transcription, among others. Additionally, transcriptome techniques capture only the transcription occurring in a sample at a specific time point, whereas the content of the transcriptome can change during differentiation. The main aims of transcriptomics are the following: "catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs; to determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions". The term ''transcriptome'' can be applied to the total set of transcripts in a given organism, or to the specific subset of transcripts present in a particular cell type. Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions. Because it includes all mRNA transcripts in the cell, the transcriptome reflects the genes that are being actively expressed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation. The study of transcriptomics (which includes expression profiling, splice variant analysis etc.) examines the expression level of RNAs in a given cell population, often focusing on mRNA, but sometimes including others such as tRNAs and sRNAs.


Methods of construction

Transcriptomics is the quantitative science that encompasses the assignment of a list of strings ("reads") to objects ("transcripts" in the genome). To calculate the expression strength, the density of reads corresponding to each object is counted. Initially, transcriptomes were analyzed and studied using expressed sequence tag libraries and serial and cap analysis of gene expression (SAGE and CAGE). Currently, the two main transcriptomics techniques are DNA microarrays and RNA-Seq. Both techniques require RNA isolation through RNA extraction techniques, followed by its separation from other cellular components and enrichment of mRNA. There are two general methods of inferring transcriptome sequences. One approach maps sequence reads onto a reference genome, either of the organism itself (whose transcriptome is being studied) or of a closely related species. The other approach, ''de novo'' transcriptome assembly, uses software to infer transcripts directly from short sequence reads and is used in organisms with genomes that are not sequenced.
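
The idea of counting the density of reads per transcript can be sketched as follows. This is a toy illustration and not the procedure of any particular tool: it assumes each read has already been assigned to exactly one transcript, and it expresses density as reads per kilobase of transcript. The transcript names, lengths and assignments are hypothetical.

```python
from collections import Counter

# Hypothetical read-to-transcript assignments, one entry per mapped read.
read_assignments = ["tx1", "tx2", "tx1", "tx3", "tx1", "tx2"]

# Hypothetical transcript lengths in bases.
transcript_lengths = {"tx1": 1500, "tx2": 500, "tx3": 2000}


def read_density(assignments, lengths):
    """Return reads per kilobase for every transcript in `lengths`."""
    counts = Counter(assignments)
    return {
        transcript: counts.get(transcript, 0) / (length / 1000)
        for transcript, length in lengths.items()
    }


density = read_density(read_assignments, transcript_lengths)
print(density)  # {'tx1': 2.0, 'tx2': 4.0, 'tx3': 0.5}
```

Dividing by length matters because a long transcript collects more reads than a short one at the same expression level; here tx2 has fewer reads than tx1 but a higher density.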


DNA microarrays

The first transcriptome studies were based on microarray techniques (also known as DNA chips). Microarrays consist of thin glass layers with spots on which oligonucleotides, known as "probes", are arrayed; each spot contains a known DNA sequence. When performing microarray analyses, mRNA is collected from a control and an experimental sample, the latter usually representative of a disease. The RNA of interest is converted to cDNA to increase its stability and marked with fluorophores of two colors, usually green and red, for the two groups. The cDNA is spread onto the surface of the microarray, where it hybridizes with oligonucleotides on the chip, and a laser is used to scan it. The fluorescence intensity on each spot of the microarray corresponds to the level of gene expression and, based on the color of the fluorophores selected, it can be determined which of the samples exhibits higher levels of the mRNA of interest. One microarray usually contains enough oligonucleotides to represent all known genes; however, data obtained using microarrays does not provide information about unknown genes. During the 2010s, microarrays were almost completely replaced by next-generation techniques that are based on DNA sequencing.
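
The comparison of the two fluorescence channels is commonly summarized as a log2 ratio per spot. The sketch below assumes, purely for illustration, that red labels the experimental sample and green the control; the text does not fix this assignment. The gene names, intensity values and pseudocount are hypothetical, and background correction and normalization are omitted.

```python
import math


def log2_ratio(red: float, green: float, pseudocount: float = 1.0) -> float:
    """Return log2((red + pseudocount) / (green + pseudocount)) for one spot.

    Positive values mean the red-labelled sample is brighter, negative values
    the green-labelled one. The pseudocount avoids division by zero.
    """
    return math.log2((red + pseudocount) / (green + pseudocount))


# Hypothetical (red, green) intensities per spot.
spots = {
    "geneA": (4095.0, 1023.0),
    "geneB": (511.0, 511.0),
    "geneC": (255.0, 2047.0),
}

ratios = {gene: log2_ratio(red, green) for gene, (red, green) in spots.items()}
print(ratios)  # {'geneA': 2.0, 'geneB': 0.0, 'geneC': -3.0}
```

Working on the log scale makes up- and down-regulation symmetric: a fourfold increase gives +2 and a fourfold decrease gives -2.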


RNA sequencing

RNA sequencing is a
next-generation sequencing Massive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation ...
technology; as such it requires only a small amount of RNA and no previous knowledge of the genome. It allows for both qualitative and quantitative analysis of RNA transcripts, the former allowing discovery of new transcripts and the latter a measure of relative quantities for transcripts in a sample. The three main steps of sequencing the transcriptome of any biological sample are RNA purification, synthesis of an RNA or cDNA library, and sequencing of the library. The RNA purification process is different for short and long RNAs. This step is usually followed by an assessment of RNA quality, with the purpose of avoiding contaminants such as DNA or technical contaminants related to sample processing. RNA quality is measured using UV spectrometry, with an absorbance peak at 260 nm. RNA integrity can also be analyzed quantitatively by comparing the ratio and intensity of 28S RNA to 18S RNA, reported as the RNA Integrity Number (RIN) score. Since mRNA is the species of interest and represents only 3% of the total RNA content, the RNA sample should be treated to remove rRNA, tRNA and tissue-specific RNA transcripts.

Library preparation, which aims to produce short cDNA fragments, begins with fragmentation of the RNA into transcripts between 50 and 300 base pairs in length. Fragmentation can be enzymatic (RNA endonucleases), chemical (Tris-magnesium salt buffer, chemical hydrolysis) or mechanical (sonication, nebulisation). Reverse transcription is used to convert the RNA templates into cDNA, and three priming methods can be used to achieve it: oligo-dT priming, random primers, or ligation of special adaptor oligos.
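The UV-based quality check described above can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory protocol: it assumes the widely used approximation that an absorbance of 1.0 at 260 nm over a 1 cm path corresponds to roughly 40 µg/mL of RNA, and it adds the A260/A280 ratio as a rough purity indicator, which the text above does not mention. The function names and sample readings are hypothetical.

```python
# Assumed conversion factor: A260 of 1.0 (1 cm path) ~ 40 ug/mL of RNA.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate RNA concentration (ug/mL) from absorbance at 260 nm."""
    if path_length_cm <= 0:
        raise ValueError("path_length_cm must be positive")
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / path_length_cm


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio; values near 2.0 are usually read as clean RNA."""
    if a280 <= 0:
        raise ValueError("a280 must be positive")
    return a260 / a280


# Hypothetical readings for a sample diluted 10-fold before measurement.
concentration = rna_concentration_ug_per_ml(0.5, dilution_factor=10)
ratio = purity_ratio(0.5, 0.25)
print(concentration, ratio)
```

A ratio well below 2.0 would usually prompt a closer look for protein or other contaminants before library preparation.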


Single-cell transcriptomics

Transcription can also be studied at the level of individual cells by single-cell transcriptomics. Single-cell RNA sequencing (scRNA-seq) is a recently developed technique that allows the analysis of the transcriptome of single cells, including bacteria. With single-cell transcriptomics, subpopulations of cell types that constitute the tissue of interest are also taken into consideration. This approach makes it possible to identify whether changes in experimental samples are due to phenotypic cellular changes as opposed to proliferation, in which case a specific cell type might be overrepresented in the sample. Additionally, when assessing cellular progression through differentiation, average expression profiles are only able to order cells by time rather than by their stage of development and are consequently unable to show trends in gene expression levels specific to certain stages. Single-cell transcriptomic techniques have been used to characterize rare cell populations such as circulating tumor cells, cancer stem cells in solid tumors, and embryonic stem cells (ESCs) in mammalian blastocysts. Although there are no standardized techniques for single-cell transcriptomics, several steps need to be undertaken. The first step is cell isolation, which can be performed using low- and high-throughput techniques. This is followed by a qPCR step and then single-cell RNA-seq, in which the RNA of interest is converted into cDNA. Newer developments in single-cell transcriptomics allow tissue and sub-cellular localization to be preserved by cryo-sectioning thin slices of tissue and sequencing the transcriptome in each slice. Another technique allows the visualization of single transcripts under a microscope while preserving the spatial information of each individual cell in which they are expressed.
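The distinction drawn above between proliferation and phenotypic change can be illustrated with a toy calculation. The sketch below uses made-up expression values for one gene in two hypothetical cell types, "A" and "B". It shows two samples whose bulk averages are identical although one differs from the control only in cell-type composition and the other only in per-cell expression.

```python
def bulk_average(cells):
    """Mean expression over all cells, as a bulk measurement would report."""
    return sum(expr for _, expr in cells) / len(cells)


def per_type_average(cells):
    """Mean expression within each cell type, as single-cell data allows."""
    by_type = {}
    for cell_type, expr in cells:
        by_type.setdefault(cell_type, []).append(expr)
    return {t: sum(v) / len(v) for t, v in by_type.items()}


# Hypothetical data: (cell type, expression of one gene) per cell.
control = [("A", 12)] * 5 + [("B", 2)] * 5
proliferation = [("A", 12)] * 8 + [("B", 2)] * 2   # more type-A cells
phenotypic = [("A", 18)] * 5 + [("B", 2)] * 5      # type A expresses more

for name, sample in [("control", control),
                     ("proliferation", proliferation),
                     ("phenotypic", phenotypic)]:
    print(name, bulk_average(sample), per_type_average(sample))
```

In this constructed example both treated samples have the same bulk average, so only the per-type breakdown reveals which mechanism is at work.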


Analysis

A number of organism-specific transcriptome databases have been constructed and annotated to aid in the identification of genes that are differentially expressed in distinct cell populations. RNA-seq was, as of 2013, emerging as the method of choice for measuring transcriptomes of organisms, though the older technique of DNA microarrays is still used. RNA-seq measures the transcription of a specific gene by converting long RNAs into a library of cDNA fragments. The cDNA fragments are then sequenced using high-throughput sequencing technology and aligned to a reference genome or transcriptome, and the alignments are then used to create an expression profile of the genes.
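As a rough illustration of the last step, the sketch below turns per-gene read counts into an expression profile. It assumes the counts have already been obtained by alignment, and it uses transcripts per million (TPM) as the normalization. TPM is one common choice and is not named in the text above. The gene names, counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts per gene into transcripts per million (TPM)."""
    if counts.keys() != lengths_bp.keys():
        raise ValueError("counts and lengths_bp must cover the same genes")
    # Step 1: length-normalize, giving reads per kilobase of transcript.
    per_kb = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(per_kb.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Step 2: rescale so that the values of all genes sum to one million.
    return {g: rate * 1e6 / total for g, rate in per_kb.items()}


# Hypothetical aligned read counts and transcript lengths in base pairs.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
profile = tpm(counts, lengths)
print(profile)
```

Here geneB has three times as many reads as geneA but is also three times as long, so the two end up with the same TPM value. Correcting for that length effect is the purpose of the first step.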


Applications


Mammals

The transcriptomes of stem cells and cancer cells are of particular interest to researchers who seek to understand the processes of cellular differentiation and carcinogenesis. A pipeline using RNA-seq or gene array data can be used to track genetic changes occurring in stem and precursor cells; it requires at least three independent gene expression datasets from the former cell type and from mature cells. Analysis of the transcriptomes of human oocytes and embryos is used to understand the molecular mechanisms and signaling pathways controlling early embryonic development, and could theoretically be a powerful tool for proper embryo selection in in vitro fertilisation. Analyses of the transcriptome content of the placenta in the first trimester of pregnancy following ''in vitro'' fertilization and embryo transfer (IVF-ET) revealed differences in gene expression that are associated with a higher frequency of adverse perinatal outcomes. Such insight can be used to optimize the practice. Transcriptome analyses can also be used to optimize cryopreservation of oocytes, by reducing injuries associated with the process. Transcriptomics is an emerging and continually growing field in biomarker discovery for use in assessing the safety of drugs or in chemical risk assessment. Transcriptomes may also be used to infer phylogenetic relationships among individuals or to detect evolutionary patterns of transcriptome conservation. Transcriptome analyses were used to discover the incidence of antisense transcripts, their role in gene expression through interaction with surrounding genes, and their abundance in different chromosomes. RNA-seq was also used to show how RNA isoforms, transcripts stemming from the same gene but with different structures, can produce complex phenotypes from limited genomes.


Plants

Transcriptome analyses have been used to study the evolution and diversification of plant species. In 2014, the 1000 Plant Genomes Project was completed, in which the transcriptomes of 1,124 plant species from the groups Viridiplantae, Glaucophyta and Rhodophyta were sequenced. The protein-coding sequences were subsequently compared to infer phylogenetic relationships between plants and to characterize the time of their diversification in the process of evolution. Transcriptome studies have been used to characterize and quantify gene expression in mature pollen. Genes involved in cell wall metabolism and the cytoskeleton were found to be overexpressed. Transcriptome approaches also made it possible to track changes in gene expression through different developmental stages of pollen, ranging from microspore to mature pollen grains; additionally, such stages could be compared across different plant species, including ''Arabidopsis'', rice and tobacco.


Relation to other ome fields

Similar to other -ome based technologies, analysis of the transcriptome allows for an unbiased approach when validating hypotheses experimentally. This approach also allows for the discovery of novel mediators in signaling pathways. As with other -omics based technologies, the transcriptome can be analyzed within the scope of a multiomics approach. It is complementary to metabolomics but, contrary to proteomics, a direct association between a transcript and a metabolite cannot be established.

There are several -ome fields that can be seen as subcategories of the transcriptome. The transcriptome differs from the exome in that it includes only those RNA molecules found in a specified cell population, and usually includes the amount or concentration of each RNA molecule in addition to the molecular identities. The transcriptome also differs from the translatome, which is the set of RNAs undergoing translation.

The term meiome is used in functional genomics to describe the meiotic transcriptome, the set of RNA transcripts produced during the process of meiosis. Meiosis is a key feature of sexually reproducing eukaryotes, and involves the pairing of homologous chromosomes, synapsis and recombination. Since meiosis in most organisms occurs in a short time period, meiotic transcript profiling is difficult due to the challenge of isolating (or enriching) meiotic cells (meiocytes). As with transcriptome analyses, the meiome can be studied at a whole-genome level using large-scale transcriptomic techniques. The meiome has been well characterized in mammal and yeast systems and somewhat less extensively characterized in plants.

The thanatotranscriptome consists of all RNA transcripts that continue to be expressed or that start getting re-expressed in internal organs of a dead body 24–48 hours following death. These genes include some that are inhibited after fetal development. If the thanatotranscriptome is related to the process of programmed cell death (apoptosis), it can be referred to as the apoptotic thanatotranscriptome. Analyses of the thanatotranscriptome are used in forensic medicine.

eQTL mapping can be used to complement genomics with transcriptomics, relating genetic variants at the DNA level to gene expression measures at the RNA level.
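The core idea of eQTL mapping, relating a DNA variant to an RNA-level measurement, can be sketched as a regression of expression on genotype. The example below is a deliberately simplified illustration with made-up data: it fits an ordinary least-squares slope for a single variant and a single gene, whereas real analyses test many variant-gene pairs and adjust for covariates and multiple testing. The function name and values are hypothetical.

```python
def eqtl_slope(genotypes, expression):
    """Least-squares slope of expression on genotype dosage.

    genotypes: number of alternate alleles per individual (0, 1 or 2).
    expression: expression level of one gene in the same individuals.
    """
    n = len(genotypes)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally sized samples of at least 2 individuals")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype shows no variation, slope is undefined")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    return sxy / sxx


# Hypothetical data for six individuals.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
print(eqtl_slope(genotypes, expression))  # close to 1.0 expression unit per allele
```

A slope clearly different from zero would suggest that the variant is associated with the expression level of the gene, which is what an eQTL is.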


Relation to proteome

The transcriptome can be seen as a precursor of the proteome, that is, the entire set of proteins expressed by a genome. However, the analysis of relative mRNA expression levels can be complicated by the fact that relatively small changes in mRNA expression can produce large changes in the total amount of the corresponding protein present in the cell. One analysis method, known as gene set enrichment analysis, identifies coregulated gene networks rather than individual genes that are up- or down-regulated in different cell populations. Although microarray studies can reveal the relative amounts of different mRNAs in the cell, levels of mRNA are not directly proportional to the expression level of the proteins they code for. The number of protein molecules synthesized using a given mRNA molecule as a template is highly dependent on translation-initiation features of the mRNA sequence; in particular, the ability of the translation initiation sequence to recruit ribosomes is a key determinant of protein translation.
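The idea of scoring gene sets rather than individual genes, mentioned above, can be illustrated with a simple over-representation test. The sketch below is a simplification: it uses a hypergeometric tail probability, not the rank-based running-sum statistic of the gene set enrichment analysis method listed under Further reading. All numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, hits_size, overlap):
    """Probability of at least `overlap` gene-set members among the hits.

    universe_size: number of genes measured in the experiment.
    set_size: number of those genes that belong to the gene set.
    hits_size: number of differentially expressed genes.
    overlap: number of differentially expressed genes that are in the set.
    """
    total = comb(universe_size, hits_size)
    upper = min(set_size, hits_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical experiment: 20 genes measured, a pathway of 5 genes,
# 5 differentially expressed genes, 3 of which belong to the pathway.
print(enrichment_p_value(20, 5, 5, 3))
```

A small probability suggests that the gene set as a whole is affected, even when no single member stands out on its own.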


Transcriptome databases

*Ensembl

*OmicTools

*Transcriptome Browser

*ArrayExpress



Further reading

* Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. ''Proc Natl Acad Sci USA'' 102(43):15545-50.
* Laule O, Hirsch-Hoffmann M, Hruz T, Gruissem W, Zimmermann P. (2006). Web-based analysis of the mouse transcriptome using Genevestigator. ''BMC Bioinformatics'' 7:311.