Short Oligonucleotide Analysis Package

SOAP (Short Oligonucleotide Analysis Package) is a suite of bioinformatics software tools from the BGI Bioinformatics department for the assembly, alignment, and analysis of next-generation DNA sequencing data. It is particularly suited to short-read sequencing data. All programs in the SOAP package are free of charge and are distributed under the GPL open-source software license.


Functionality

The SOAP suite of tools can be used to perform the following tasks:


Sequence Alignment

''SOAPaligner'' (SOAP2) is designed specifically for fast alignment of short reads and performs favorably compared with similar alignment tools such as Bowtie and MAQ.
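The text above does not describe how ''SOAPaligner'' indexes the reference. As an illustration only, the sketch below shows exact matching by backward search over a Burrows-Wheeler transform (an FM-index), a technique widely used by index-based short-read aligners. The function names build_fm_index and backward_search are hypothetical, the suffix array is built naively, and the mismatch handling, reverse-complement strands, and compressed occurrence tables that a real aligner needs are all omitted.

```python
# Illustrative FM-index exact matching; NOT SOAPaligner's actual implementation.

def build_fm_index(reference):
    """Return (suffix_array, C, occ) for reference + '$' (hypothetical helper)."""
    text = reference + "$"  # '$' sorts before A, C, G, T in ASCII
    # Naive suffix array: fine for a toy example, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[i] is the character preceding the i-th smallest suffix (wraps to '$').
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text lexicographically smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(read, sa, C, occ):
    """Return sorted 0-based start positions where read occurs exactly."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(read):  # extend the match one character to the left
        if ch not in C:
            return []  # character absent from the reference (e.g. 'N')
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # no suffix starts with the current pattern
    return sorted(sa[i] for i in range(lo, hi))


sa, C, occ = build_fm_index("ACGTACGTGACG")
print(backward_search("ACG", sa, C, occ))
```

For the toy reference ACGTACGTGACG, the read ACG is found at positions [0, 4, 9]. Each backward step costs only two table lookups, which is why this style of index suits very large numbers of short reads.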


Genome Assembly

''SOAPdenovo'' is a short-read ''de novo'' assembler based on De Bruijn graph construction. It is optimized for short reads such as those generated by Illumina and is capable of assembling large genomes such as the human genome. ''SOAPdenovo'' was used to assemble the genome of the giant panda. It was later upgraded to ''SOAPdenovo2'', which was optimized for large genomes and included the widely used GapCloser module.
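The De Bruijn graph idea behind ''SOAPdenovo'' can be illustrated with a toy sketch: every k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and unbranched paths through the graph are merged into contigs. This is a generic textbook formulation, not SOAPdenovo's implementation; the function names are hypothetical, and sequencing errors, reverse complements, repeats, and scaffolding are ignored.

```python
from collections import defaultdict

# Toy De Bruijn graph assembler; NOT SOAPdenovo's implementation.

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


def walk_unitigs(graph):
    """Merge maximal unbranched paths into contig strings."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def is_simple(node):
        # Exactly one edge in and one edge out: interior of an unbranched path.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # walks start only from sources or branch points
        for first in sorted(graph.get(start, ())):
            contig, node = start, first
            while True:
                contig += node[-1]  # each step contributes one new base
                if not is_simple(node):
                    break  # stop at a branch point or a dead end
                node = next(iter(graph[node]))
            contigs.append(contig)
    return contigs


print(walk_unitigs(build_de_bruijn(["ACGTC", "CGTCA", "GTCAG"], k=4)))
```

With k = 4, the overlapping reads ACGTC, CGTCA, and GTCAG collapse into the single contig ACGTCAG. Cycles in which every node is unbranched are not reported by this simplified walk.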


Transcriptome Assembly

''SOAPdenovo-Trans'' is a ''de novo'' transcriptome assembler designed specifically for RNA-Seq data; it was created for the 1000 Plant Genomes project.


Indel Discovery

''SOAPindel'' is a tool for finding insertions and deletions in next-generation paired-end sequencing data; it outputs a list of candidate indels with quality scores.


SNP Discovery

''SOAPsnp'' is a consensus sequence builder. It uses the output from ''SOAPaligner'' to generate a consensus sequence, which enables SNPs to be called on a newly sequenced individual.
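The consensus-then-call idea described for ''SOAPsnp'' can be sketched as follows. This is a deliberately simplified majority-vote caller that assumes a haploid sample and ignores base and mapping qualities; it is not SOAPsnp's own model. The pileup input format (one string of observed read bases per reference position) and the thresholds min_depth and min_fraction are hypothetical.

```python
from collections import Counter

# Simplified majority-vote SNP caller; NOT SOAPsnp's statistical model.

def call_snps(reference, pileup, min_depth=3, min_fraction=0.8):
    """Return (position, ref_base, alt_base, alt_count, depth) per called SNP.

    pileup[i] is a (hypothetical) string of read bases aligned to reference[i].
    """
    calls = []
    for pos, (ref_base, bases) in enumerate(zip(reference, pileup)):
        depth = len(bases)
        if depth < min_depth:
            continue  # too little coverage to call a consensus base
        base, count = Counter(bases).most_common(1)[0]  # consensus base
        if base != ref_base and count / depth >= min_fraction:
            calls.append((pos, ref_base, base, count, depth))
    return calls


print(call_snps("ACGT", ["AAAA", "TTTTT", "AAAG", "TT"]))
```

Here only position 1 is called (reference C, consensus T in 5 of 5 reads). Position 2 is not called because its majority base A reaches only 3 of 4 reads, and position 3 is skipped because its depth of 2 is below min_depth.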


Structural Variation Discovery

''SOAPsv'' is a tool for finding structural variations using whole-genome assembly.


Quality control and preprocessing

''SOAPnuke'' is a tool for integrated quality control and preprocessing of datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments.
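A minimal sketch of the kind of read filtering a preprocessing tool such as ''SOAPnuke'' performs is shown below. The specific filters and thresholds (min_q, max_low_frac, max_n_frac) are hypothetical and are not SOAPnuke's actual options; the sketch assumes four-line FASTQ records with Phred+33 quality encoding.

```python
# Hypothetical read filter; NOT SOAPnuke's actual options or defaults.

def passes_filter(seq, qual, min_q=20, max_low_frac=0.5, max_n_frac=0.1):
    """Keep a read unless too many bases are low quality or ambiguous (N)."""
    if not seq or len(seq) != len(qual):
        return False  # malformed or empty record
    scores = [ord(ch) - 33 for ch in qual]  # Phred+33 decoding
    low_frac = sum(q < min_q for q in scores) / len(seq)
    n_frac = seq.upper().count("N") / len(seq)
    return low_frac <= max_low_frac and n_frac <= max_n_frac


def filter_fastq(lines, **thresholds):
    """Yield (header, seq, qual) for each 4-line FASTQ record that passes."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if passes_filter(seq, qual, **thresholds):
            yield header, seq, qual


records = [
    "@r1", "ACGTACGT", "+", "IIIIIIII",  # all bases Q40: kept
    "@r2", "ACGTNNNN", "+", "IIIIIIII",  # 50% N: dropped
    "@r3", "ACGTACGT", "+", "III#####",  # 5 of 8 bases Q2: dropped
]
print([header for header, _, _ in filter_fastq(records)])
```

Only @r1 survives. Writing the filter as a generator keeps memory use constant, which matters when a run produces hundreds of millions of reads.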


History


SOAP v1

The first release of SOAP consisted only of the sequence alignment tool ''SOAPaligner''.


SOAP v2

SOAP v2 substantially improved the performance of the ''SOAPaligner'' tool over SOAP v1: alignment time was reduced by a factor of 20 to 30, and memory usage by a factor of 3. Support for compressed file formats was also added. The SOAP suite was then expanded to include the new tools SOAPdenovo (versions 1 and 2), SOAPindel, SOAPsnp, and SOAPsv.


SOAP v3

SOAP v3 extended the alignment tool, making it the first short-read alignment tool to use GPU processors. As a result of these improvements, ''SOAPaligner'' significantly outperformed the competing aligners Bowtie and BWA in terms of speed.


See also

* genomics
* genome sequencing
* genome assembly
* bioinformatics


External links

* http://soap.genomics.org.cn
* http://soap.genomics.org.cn/soap1
* http://bioinformatics.genomics.org.cn
* http://seqanswers.com/forums/showthread.php?t=43

