Multiple Expectation maximizations for Motif Elicitation (MEME) is a tool for discovering motifs in a group of related DNA or protein sequences.
[Bailey T.L., Elkan C. Unsupervised Learning of Multiple Motifs In Biopolymers Using EM. Mach. Learn. 1995;21:51–80.]
A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences and is often associated with some biological function. MEME represents motifs as position-dependent letter-probability matrices, which describe the probability of each possible letter at each position in the pattern. Individual MEME motifs do not contain gaps. Patterns with variable-length gaps are split by MEME into two or more separate motifs.
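A position-dependent letter-probability matrix can be illustrated with a minimal sketch. The aligned sites, the function names, and the pseudocount below are assumptions made for illustration, not MEME's own code or data.

```python
from collections import Counter

def letter_probability_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Return one {letter: probability} dict per motif position."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same width (motifs have no gaps)")
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(width):
        # Count how often each letter appears at this position.
        counts = Counter(site[pos] for site in sites)
        matrix.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return matrix

def site_probability(matrix, candidate):
    """Probability of a candidate string, treating positions as independent."""
    p = 1.0
    for column, letter in zip(matrix, candidate):
        p *= column[letter]
    return p

# Hypothetical aligned occurrences of one ungapped motif.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
pwm = letter_probability_matrix(sites)
```

Each column of `pwm` sums to 1, and a string resembling the sites receives a higher probability than an unrelated string of the same length.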
MEME takes as input a group of DNA or protein sequences (the training set) and outputs as many motifs as requested. It uses statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif.
MEME is the first of a collection of tools for analyzing motifs called the MEME suite.
Definition
The MEME algorithm can be understood from two different perspectives. From a biological point of view, MEME identifies and characterizes shared motifs in a set of unaligned sequences. From a computer science point of view, MEME finds a set of non-overlapping, approximately matching substrings given a starting set of strings.
Use
MEME can be used to find similar biological functions and structures in different sequences. Two difficulties must be kept in mind: sequence variation can be significant, and motifs are sometimes very short. On the other hand, the binding sites for proteins are very specific, which helps reduce the number of wet-lab experiments needed, saving cost and time. To discover the motifs that are biologically relevant, three things must be chosen carefully: the width of the motifs, the number of occurrences in each sequence, and the composition of each motif.
Algorithm components
The algorithm combines several well-known techniques:
* Expectation maximization (EM).
* EM-based heuristic for choosing the EM starting point.
* Maximum likelihood ratio test based (LRT-based) heuristic for determining the best number of free model parameters.
* Multi-start for searching over possible motif widths.
* Greedy search for finding multiple motifs.
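The greedy search for multiple motifs can be sketched as a loop that finds one motif, hides its occurrences, and searches again. In this sketch the motif finder is a deliberately simple stand-in (the exact k-mer contained in the most sequences), and hiding is done by overwriting occurrences with a placeholder character. Both are assumptions for illustration and are much cruder than MEME's EM-based search. All names are hypothetical.

```python
from collections import Counter

MASK = "#"  # placeholder marking positions already claimed by a motif

def most_common_kmer(sequences, width):
    """Stand-in motif finder: the exact k-mer contained in the most sequences."""
    counts = Counter()
    for seq in sequences:
        # Use a set so each sequence counts a given k-mer at most once.
        kmers = {seq[i:i + width] for i in range(len(seq) - width + 1)}
        counts.update(k for k in kmers if MASK not in k)
    if not counts:
        return None
    # Break ties alphabetically so the result is deterministic.
    return min(counts, key=lambda k: (-counts[k], k))

def greedy_motifs(sequences, width, n_motifs):
    """Find up to n_motifs motifs, one at a time, masking each before the next."""
    found = []
    current = list(sequences)
    for _ in range(n_motifs):
        motif = most_common_kmer(current, width)
        if motif is None:
            break
        found.append(motif)
        # Hide the occurrences so the next pass cannot rediscover this motif.
        current = [seq.replace(motif, MASK * width) for seq in current]
    return found

# Hypothetical sequences that share two 6-letter patterns.
seqs = ["CCTTGACAGGTATAATCC", "TATAATGCTTGACAAG", "GTTGACACTATAATG"]
motifs = greedy_motifs(seqs, width=6, n_motifs=2)
print(motifs)  # ['TATAAT', 'TTGACA']
```

The search is greedy because each motif is fixed once found. Later passes work only on what remains and never revisit earlier choices.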
In practice, the positions at which a motif starts in each sequence are usually not known in advance. Several possibilities exist for how occurrences are distributed: exactly one occurrence per sequence, zero or one occurrence per sequence, or any number of occurrences per sequence.
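The simplest of these cases, exactly one occurrence per sequence at an unknown position, can be sketched with a small EM loop. This is a minimal illustration under stated assumptions, not MEME's implementation: the motif width is fixed, the background is uniform, the starting matrix is derived from a seed substring, and all names, sequences, and parameter values are hypothetical.

```python
ALPHABET = "ACGT"

def seed_matrix(seed, weight=0.7):
    """Starting matrix that favours the letters of a seed substring."""
    rest = (1.0 - weight) / (len(ALPHABET) - 1)
    return [{a: (weight if a == ch else rest) for a in ALPHABET} for ch in seed]

def e_step(sequences, matrix, background=0.25):
    """Posterior over start positions in each sequence (one occurrence each)."""
    width = len(matrix)
    posteriors = []
    for seq in sequences:
        scores = []
        for start in range(len(seq) - width + 1):
            # Likelihood ratio of motif versus uniform background for this window.
            ratio = 1.0
            for k in range(width):
                ratio *= matrix[k][seq[start + k]] / background
            scores.append(ratio)
        total = sum(scores)
        posteriors.append([s / total for s in scores])
    return posteriors

def m_step(sequences, posteriors, width, pseudocount=0.1):
    """Re-estimate the matrix from posterior-weighted letter counts."""
    counts = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
    for seq, post in zip(sequences, posteriors):
        for start, z in enumerate(post):
            for k in range(width):
                counts[k][seq[start + k]] += z
    matrix = []
    for col in counts:
        total = sum(col.values())
        matrix.append({a: col[a] / total for a in ALPHABET})
    return matrix

def run_em(sequences, seed, iterations=20):
    """Alternate E and M steps from a seed; return the matrix and posteriors."""
    matrix = seed_matrix(seed)
    for _ in range(iterations):
        posteriors = e_step(sequences, matrix)
        matrix = m_step(sequences, posteriors, len(seed))
    return matrix, e_step(sequences, matrix)

seqs = ["CCTTGACAGG", "ATCTTGACAT", "GGCATTGACA"]
matrix, posteriors = run_em(seqs, seed="TTGACA")
# Most probable start position of the motif in each sequence.
best_starts = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
```

Because EM converges only to a local optimum, the result depends on the starting point, which is why the choice of starting point matters. Handling zero or several occurrences per sequence would need a different E-step and is not covered by this sketch.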
See also
* Sequence motif
* Sequence alignment
External links
* The MEME Suite: motif-based sequence analysis tools
* GPU-accelerated version of MEME
* EXTREME: an online EM implementation of the MEME model for fast motif discovery in large ChIP-Seq and DNase-Seq footprinting data