TopHat (bioinformatics)

TopHat is an open-source bioinformatics tool for the high-throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies (e.g. RNA-Seq). It aligns reads to a reference genome using Bowtie first and then analyzes the mapping results to discover RNA splice sites ''de novo''. TopHat aligns RNA-Seq reads to mammalian-sized genomes.


History

TopHat was originally developed in 2009 by Cole Trapnell, Lior Pachter and Steven Salzberg at the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park and at the Mathematics Department, UC Berkeley. TopHat2 was a collaborative effort of Daehwan Kim and Steven Salzberg, initially at the University of Maryland, College Park and later at the Center for Computational Biology at Johns Hopkins University. Kim rewrote some of Trapnell's original TopHat code in C++ to make it much faster, and added many heuristics to improve its accuracy, in a collaboration with Cole Trapnell and others. Kim and Salzberg also developed TopHat-fusion, which used transcriptome data to discover gene fusions in cancer tissues.


Uses

TopHat is a read-mapping tool used to align reads from an RNA-Seq experiment to a reference genome. It is useful because it does not need to rely on known splice sites. TopHat can be used as part of the Tuxedo pipeline, and is frequently used with Bowtie.
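
As a toy illustration of what finding a splice site without prior annotation can look like, the Python sketch below takes a read that does not match the genome contiguously, splits it into two parts, and searches for a placement in which the two parts flank a gap (a candidate intron). It relies on exact string matching and made-up sequences; it is not TopHat's algorithm, and the names `find_junction` and `min_anchor` are hypothetical.

```python
def find_junction(read, genome, min_anchor=4):
    """Toy spliced alignment: return (intron_start, intron_end) or None.

    intron_start is the index of the first skipped genome base and
    intron_end is the index where the right part of the read resumes.
    Hypothetical helper for illustration only, not TopHat's method.
    """
    # Try every split point that leaves at least min_anchor bases per side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        while left_pos != -1:
            intron_start = left_pos + split
            # The right part must start at least one base further downstream.
            right_pos = genome.find(right, intron_start + 1)
            if right_pos != -1:
                return intron_start, right_pos
            left_pos = genome.find(left, left_pos + 1)
    return None


# Made-up example: two exons separated by a short intron.
exon1, intron, exon2 = "ACGTACGGA", "GTAAGTTTTTCAG", "CTTGACCAT"
genome = exon1 + intron + exon2
read = "ACGGA" + "CTTGA"  # spans the exon1-exon2 junction

assert genome.find(read) == -1      # no contiguous (unspliced) match
print(find_junction(read, genome))  # (9, 22)
```

In this example the slice `genome[9:22]` recovers the skipped intron sequence. A real spliced aligner must also tolerate mismatches, work from a genome index rather than scanning strings, and resolve ambiguous split points.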


Advantages/Disadvantages


Advantages

When TopHat was first released, it was faster than previous systems, mapping more than 2.2 million reads per CPU hour. That speed allowed the user to process an entire RNA-Seq experiment in less than a day, even on a standard desktop computer. TopHat first uses Bowtie to align the reads, and then performs additional analysis on the reads that span exon-exon junctions. As a result, a larger share of RNA-Seq reads can be aligned against the reference genome. Another advantage of TopHat is that it does not need to rely on known splice sites when aligning reads to a reference genome.
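
To put the quoted throughput in perspective, the short Python sketch below converts a read count into CPU hours at 2.2 million reads per CPU hour. The experiment size of 40 million reads is an assumed figure chosen only for illustration, and the function name `cpu_hours` is hypothetical.

```python
READS_PER_CPU_HOUR = 2.2e6  # throughput figure quoted in the text


def cpu_hours(n_reads, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """Estimate the CPU hours needed to map n_reads at a fixed rate."""
    return n_reads / reads_per_cpu_hour


# Assumed experiment size (not from the source): 40 million reads.
hours = cpu_hours(40_000_000)
print(f"{hours:.1f} CPU hours")  # 18.2 CPU hours
```

At this rate a single CPU would map roughly 52.8 million reads in 24 hours, which is consistent with the claim that an experiment could be processed in under a day.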


Disadvantages

TopHat is in a low-maintenance, low-support stage, and contains software bugs that have prompted the development of third-party post-processing software to correct them. It has been superseded by HISAT2, which is more efficient and accurate and provides the same core functionality (spliced alignment of RNA-Seq reads).


See also

* Bowtie (sequence analysis)
* List of RNA-Seq bioinformatics tools
* Microarray analysis techniques
* Next generation sequencing
* RNA-Seq


External links


TopHat page on Center for Computational Biology at JHU