TopHat is an open-source bioinformatics tool for the high-throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies (e.g. RNA-Seq), using Bowtie first and then mapping to a reference genome to discover RNA splice sites ''de novo''.
TopHat aligns RNA-Seq reads to mammalian-sized genomes.
History
TopHat was originally developed in 2009 by Cole Trapnell, Lior Pachter and Steven Salzberg at the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park and at the Mathematics Department, UC Berkeley.
TopHat2 was a collaborative effort of Daehwan Kim and Steven Salzberg, initially at the University of Maryland, College Park and later at the Center for Computational Biology at Johns Hopkins University. Working with Cole Trapnell and others, Kim rewrote some of Trapnell's original TopHat code in C++ to make it much faster and added many heuristics to improve its accuracy. Kim and Salzberg also developed TopHat-fusion, which used transcriptome data to discover gene fusions in cancer tissues.
Uses
TopHat is a read-mapping tool that aligns the reads from an RNA-Seq experiment to a reference genome. It is useful because it does not need to rely on known splice sites.
TopHat can be used with the Tuxedo pipeline, and is frequently used with Bowtie.
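As a rough illustration of how TopHat sits in such a pipeline, the sketch below only builds the command lines for an index, align, assemble sequence and prints them; nothing is executed. It assumes TopHat2 with a Bowtie 2 index and Cufflinks as the downstream assembler (Cufflinks is not named in the text above), and the file names `genome`, `reads.fastq` and `tophat_out` are placeholders. The helper `tuxedo_commands` is hypothetical.

```python
import shlex
from pathlib import Path


def tuxedo_commands(index_prefix, reads_fastq, out_dir="tophat_out", threads=4):
    """Return argv lists for a Bowtie 2 index -> TopHat -> Cufflinks run.

    Nothing is executed here; every path is a placeholder (assumption).
    """
    out = Path(out_dir)
    return [
        # Build the Bowtie 2 index that TopHat aligns against.
        ["bowtie2-build", f"{index_prefix}.fa", index_prefix],
        # Spliced alignment; TopHat writes accepted_hits.bam to the output directory.
        ["tophat", "-p", str(threads), "-o", str(out), index_prefix, reads_fastq],
        # Transcript assembly from the alignments (assumed downstream step).
        ["cufflinks", "-p", str(threads), "-o", str(out / "cufflinks"),
         str(out / "accepted_hits.bam")],
    ]


for argv in tuxedo_commands("genome", "reads.fastq"):
    print(shlex.join(argv))
```

Keeping each command as a list of arguments rather than a single string makes it straightforward to hand them to `subprocess.run` later without shell quoting problems.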
Advantages/Disadvantages
Advantages
When TopHat was first released, it was faster than previous systems, mapping more than 2.2 million reads per CPU hour. That speed allowed the user to process an entire RNA-Seq experiment in less than a day, even on a standard desktop computer.
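To make the throughput figure concrete, the arithmetic can be sketched as follows. The rate is the one quoted above; the experiment sizes are hypothetical and chosen only for illustration, and the estimate assumes the rate stays constant.

```python
READS_PER_CPU_HOUR = 2_200_000  # rate quoted in the text above


def cpu_hours(n_reads, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """Estimated CPU hours needed to map n_reads at a constant rate."""
    return n_reads / reads_per_cpu_hour


# Hypothetical experiment sizes (assumptions, not taken from the text).
for n_reads in (10_000_000, 50_000_000):
    print(f"{n_reads:>11,} reads: about {cpu_hours(n_reads):.1f} CPU hours")
```

Because the quoted rate is a lower bound ("more than"), these figures are upper estimates of the CPU time on comparable hardware.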
TopHat first uses Bowtie to align the reads, then performs additional analysis of the reads that span exon-exon junctions. When TopHat is used for RNA-Seq data, more reads are therefore aligned against the reference genome.
Another advantage of TopHat is that it does not need to rely on known splice sites when aligning reads to a reference genome.
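The two-step idea described above (contiguous alignment first, then a separate treatment of reads that cross an exon-exon junction, with no list of known splice sites) can be illustrated with a toy example. This is a deliberately simplified sketch and not TopHat's actual algorithm: it uses exact string matching only, and the function names, the `min_anchor` parameter and the sequences are all invented for illustration.

```python
def find_all(text, pattern):
    """Yield every start index at which pattern occurs in text."""
    start = text.find(pattern)
    while start != -1:
        yield start
        start = text.find(pattern, start + 1)


def align_read(genome, read, min_anchor=4):
    """Align a read contiguously if possible, otherwise as two spliced pieces.

    Returns a list of (genome_start, length) blocks, or None if unaligned.
    """
    # Phase 1: contiguous alignment (the role Bowtie plays in the text).
    pos = genome.find(read)
    if pos != -1:
        return [(pos, len(read))]

    # Phase 2: try each split point and look for the second piece further
    # downstream, leaving a gap that would correspond to an intron.
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        for left_pos in find_all(genome, left):
            left_end = left_pos + cut
            right_pos = genome.find(right, left_end + 1)
            if right_pos != -1:
                return [(left_pos, cut), (right_pos, len(right))]
    return None


# Made-up sequences: two exons separated by an intron.
exon1, intron, exon2 = "ACGTTGCAGT", "GTAAGTCCCTTTAG", "CATGGATCCA"
genome = exon1 + intron + exon2
junction_read = exon1[-6:] + exon2[:6]  # read spanning the exon-exon junction

print(align_read(genome, "ACGTTGCA"))     # contiguous read within exon 1
print(align_read(genome, junction_read))  # read split across the intron
```

The junction read cannot be placed in phase 1 because the intron interrupts it in the genome; phase 2 recovers it as two blocks, and the gap between the blocks marks a candidate splice junction found without any prior annotation.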
Disadvantages
TopHat is in a low-maintenance, low-support stage, and contains software bugs that have prompted third-party post-processing software written to correct them.
It has been superseded by HISAT2, which is more efficient and accurate and provides the same core functionality (spliced alignment of RNA-Seq reads).
See also
* Bowtie (sequence analysis)
* List of RNA-Seq bioinformatics tools
* Microarray analysis techniques
* Next-generation sequencing
* RNA-Seq
External links
TopHat page on Center for Computational Biology at JHU