
SOAP (Short Oligonucleotide Analysis Package) is a suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next generation DNA sequencing data. It is particularly suited to short read sequencing data. All programs in the SOAP package may be used free of charge and are distributed under the GPL open source software license.


Functionality

The SOAP suite of tools can be used to perform the following tasks:


Sequence Alignment

''SOAPaligner'' (SOAP2) is specifically designed for fast alignment of short reads and performs favorably with respect to similar alignment tools such as Bowtie and MAQ.
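To make the idea of short-read alignment concrete, here is a minimal Python sketch of a seed-and-extend lookup. It is an illustration only and not ''SOAPaligner'''s actual algorithm or index structure, which this article does not describe. The function names, the toy reference, and the `seed_length` and `max_mismatches` parameters are all hypothetical.

```python
from collections import defaultdict


def build_seed_index(reference, seed_length):
    """Record every position at which each seed-length substring occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - seed_length + 1):
        index[reference[i:i + seed_length]].append(i)
    return index


def align_read(read, reference, index, seed_length, max_mismatches=1):
    """Return (position, mismatches) for each ungapped hit of the read.

    Assumption: the first `seed_length` bases of the read match exactly,
    so hits with an error inside the seed are missed by this toy version.
    """
    hits = []
    for pos in index.get(read[:seed_length], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "TTGACCATGGCATTGACG"  # toy reference sequence
index = build_seed_index(reference, seed_length=4)
hits = align_read("CATGGAAT", reference, index, seed_length=4)
print(hits)  # [(5, 1)]
```

The read is placed at position 5 with one mismatch. Production aligners use far more compact indexes and also handle gaps, reverse strands, and base quality.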


Genome Assembly

''SOAPdenovo'' is a short read ''de novo'' assembler utilizing De Bruijn graph construction. It is optimized for short reads such as those generated by Illumina sequencers and is capable of assembling large genomes such as the human genome. ''SOAPdenovo'' was used to assemble the genome of the giant panda. It was later upgraded to ''SOAPdenovo2'', which was optimized for large genomes and included the widely used GapCloser module.
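As a rough illustration of De Bruijn graph construction from short reads, the Python sketch below splits reads into k-mers, links each (k-1)-mer prefix to its suffix, and walks unambiguous paths to recover a contig. It is a toy under stated assumptions (error-free reads, a single strand, no coverage filtering) and not the ''SOAPdenovo'' implementation. The function names and the example reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the distinct (k-1)-mers that follow it in a read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            if suffix not in graph[prefix]:
                graph[prefix].append(suffix)
    return dict(graph)


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop rather than loop forever on a cycle
            break
        seen.add(node)
        contig += node[-1]
    return contig


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # toy, error-free overlapping reads
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)  # ATGGCGTGCA
```

Real assemblers additionally remove error-induced tips and bubbles, resolve repeats, and use paired-end information for scaffolding; none of that is modelled here.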


Transcriptome Assembly

''SOAPdenovo-Trans'' is a ''de novo'' transcriptome assembler designed specifically for RNA-Seq that was created for the 1000 Plant Genomes project.


Indel Discovery

''SOAPindel'' is a tool to find insertions and deletions from next generation paired-end sequencing data, providing a list of candidate indels with quality scores.


SNP Discovery

''SOAPsnp'' is a consensus sequence builder. This tool uses the output from ''SOAPaligner'' to generate a consensus sequence, which enables SNPs to be called on a newly sequenced individual.
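The general idea of building a consensus from aligned reads and comparing it with a reference can be sketched in Python as follows. This uses a simple majority vote per position as a stand-in; it is not ''SOAPsnp'''s statistical model, which this article does not describe. The alignment format (0-based start plus ungapped read), the `min_depth` threshold, and all names are assumptions made for illustration.

```python
from collections import Counter


def consensus_and_snps(reference, alignments, min_depth=2):
    """Build a majority-vote consensus and list positions differing from the reference.

    alignments: list of (0-based start, read) pairs, ungapped, forward strand.
    """
    columns = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1

    consensus, snps = [], []
    for pos, (ref_base, counts) in enumerate(zip(reference, columns)):
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too little coverage to call a base
            continue
        base, _ = counts.most_common(1)[0]
        consensus.append(base)
        if base != ref_base:
            snps.append((pos, ref_base, base))
    return "".join(consensus), snps


reference = "ACGTACGT"
alignments = [(0, "ACGAAC"), (2, "GAACGT"), (1, "CGAACG")]  # toy alignments
consensus, snps = consensus_and_snps(reference, alignments)
print(consensus, snps)  # NCGAACGN [(3, 'T', 'A')]
```

The two end positions are covered by a single read and are reported as `N`; position 3 is called as a T to A substitution because all three reads covering it agree.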


Structural Variation Discovery

''SOAPsv'' is a tool to find structural variations using whole genome assembly.


Quality control and preprocessing

''SOAPnuke'' is a tool for integrated quality control and preprocessing of datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments.
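A typical preprocessing step of this kind is discarding reads with low base quality or too many ambiguous bases. The Python sketch below shows one such filter, assuming Phred+33 encoded quality strings as used in common FASTQ files. The thresholds and function names are hypothetical and do not correspond to ''SOAPnuke'' options.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(sequence, quality, min_mean_quality=20, max_n_fraction=0.1):
    """Keep a read only if its mean quality is high and few bases are 'N'."""
    if not sequence or len(sequence) != len(quality):
        return False  # malformed record
    scores = phred_scores(quality)
    mean_quality = sum(scores) / len(scores)
    n_fraction = sequence.upper().count("N") / len(sequence)
    return mean_quality >= min_mean_quality and n_fraction <= max_n_fraction


reads = [
    ("ACGTACGT", "IIIIIIII"),  # Phred 40 throughout: kept
    ("ACGTACGT", "########"),  # Phred 2 throughout: dropped
    ("ACNNNCGT", "IIIIIIII"),  # 3 of 8 bases are N: dropped
]
kept = [seq for seq, qual in reads if passes_filter(seq, qual)]
print(kept)  # ['ACGTACGT']
```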


History


SOAP v1

The first release of SOAP consisted only of the sequence alignment tool ''SOAPaligner''.


SOAP v2

SOAP v2 extended SOAP v1 by significantly improving the performance of the ''SOAPaligner'' tool. Alignment time was reduced by a factor of 20 to 30, and memory usage was reduced by a factor of 3. Support was added for compressed file formats. The SOAP suite was then expanded to include new tools: ''SOAPdenovo'' (versions 1 and 2), ''SOAPindel'', ''SOAPsnp'', and ''SOAPsv''.


SOAP v3

SOAP v3 extended the alignment tool, making it the first short-read aligner to utilize GPU processors. As a result of these improvements, ''SOAPaligner'' significantly outperformed the competing aligners Bowtie and BWA in terms of speed.


See also

* genomics
* genome sequencing
* genome assembly
* bioinformatics


External links

* http://soap.genomics.org.cn
* http://soap.genomics.org.cn/soap1
* http://bioinformatics.genomics.org.cn
* http://seqanswers.com/forums/showthread.php?t=43

