DIMPL

DIMPL (Discovery of Intergenic Motifs PipeLine) is a
bioinformatic pipeline
that enables the extraction and selection of
bacterial GC-rich intergenic regions (IGRs) that are enriched for structured non-coding RNAs (ncRNAs). The method of enriching bacterial IGRs for ncRNA motif discovery was first reported in the study "Genome-wide discovery of structured noncoding RNAs in bacteria". The DIMPL pipeline automates whole-genome analysis by extracting IGRs, filtering them by length and nucleic acid composition, and collecting the data necessary to identify candidate motifs and assign their possible functions. It provides reproducible techniques for identifying genomic regions enriched for ncRNAs through support vector machine
(SVM) classifiers. It can be used to look for nucleic acid and
protein motifs, including riboswitch-like elements, upstream open reading frames (uORFs), short open reading frames (sORFs), ribosomal protein leader sequences, selfish genetic elements and other structured RNA motifs of unknown function.

DIMPL uses various sequence analysis resources, including:

* Rfam database, as a reference of known RNA families
* BLASTX search tool, to eliminate unannotated protein-coding regions
* INFERNAL package, to search the IGR sequences
* CMfinder, to look for possible RNA secondary structure features
* R-scape software and R2R drawing algorithm, to generate the consensus model
* RNAcode, to look for the presence of coding regions
* GenomeView, to visualize the genetic context of the RNA motif

RNA motifs discovered using DIMPL include the HMP-PP riboswitch, the icd-II ncRNA motif, the carA ncRNA motif and the ldh2 ncRNA motif, among others.
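
The first stage described above, extracting IGRs and filtering them by length and nucleotide composition, can be illustrated with a minimal Python sketch. This is not DIMPL's own code. The function names, the toy genome, and the `min_len` and `min_gc` thresholds are hypothetical and chosen only for illustration, and the genome is treated as linear for simplicity.

```python
from typing import List, Tuple

Interval = Tuple[int, int]      # 0-based, half-open gene coordinates
IGR = Tuple[int, int, str]      # (start, end, sequence)


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def extract_igrs(genome: str, genes: List[Interval]) -> List[IGR]:
    """Return every stretch of the genome not covered by an annotated gene.

    Simplification: the genome is treated as linear, so a region that
    wraps around the origin of a circular chromosome is not joined.
    """
    igrs: List[IGR] = []
    prev_end = 0
    for start, end in sorted(genes):
        if start > prev_end:
            igrs.append((prev_end, start, genome[prev_end:start]))
        # Track the furthest gene end so overlapping genes are handled.
        prev_end = max(prev_end, end)
    if prev_end < len(genome):
        igrs.append((prev_end, len(genome), genome[prev_end:]))
    return igrs


def filter_igrs(igrs: List[IGR], min_len: int = 10, min_gc: float = 0.5) -> List[IGR]:
    """Keep IGRs that meet the (hypothetical) length and GC thresholds."""
    return [
        igr for igr in igrs
        if len(igr[2]) >= min_len and gc_fraction(igr[2]) >= min_gc
    ]


# Toy genome: three 9-nt "genes" separated by a GC-rich and an AT-rich IGR.
toy_genome = (
    "ATGAAATAA"          # gene 1, positions 0-9
    "GCGCGGCCGCGCGG"     # GC-rich IGR, positions 9-23
    "ATGCCCTAG"          # gene 2, positions 23-32
    "ATATATATATAT"       # AT-rich IGR, positions 32-44
    "ATGTTTTGA"          # gene 3, positions 44-53
)
toy_genes = [(0, 9), (23, 32), (44, 53)]

all_igrs = extract_igrs(toy_genome, toy_genes)
candidates = filter_igrs(all_igrs, min_len=10, min_gc=0.5)
print(candidates)
```

In this toy case only the GC-rich region survives the filter. In a real analysis the surviving regions would then be passed to downstream tools such as those listed above.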

