HOME

TheInfoList



OR:

In the field of
computational biology Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has fo ...
, a planted motif search (PMS) also known as a (''l, d'')-motif search (LDMS) is a method for identifying conserved motifs within a set of
nucleic acid Nucleic acids are biopolymers, macromolecules, essential to all known forms of life. They are composed of nucleotides, which are the monomers made of three components: a 5-carbon sugar, a phosphate group and a nitrogenous base. The two main ...
or
peptide sequence Peptides (, ) are short chains of amino acids linked by peptide bonds. Long chains of amino acids are called proteins. Chains of fewer than twenty amino acids are called oligopeptides, and include dipeptides, tripeptides, and tetrapeptides. A ...
s. PMS is known to be
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryin ...
. The
time complexities In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
of most of the planted motif search algorithms depend exponentially on the alphabet size and ''l''. The PMS problem was first introduced by Keich and Pevzner. The problem of identifying meaningful patterns (e.g., motifs) from
biological data Biological data refers to a compound or information derived from living organisms and their products. A medicinal compound made from living organisms, such as a serum or a vaccine, could be characterized as biological data. Biological data is highly ...
has been studied extensively since they play a vital role in understanding gene function,
human disease A disease is a particular abnormal condition that negatively affects the structure or function of all or part of an organism, and that is not immediately due to any external injury. Diseases are often known to be medical conditions that ...
, and may serve as therapeutic drug targets.


Description

The search problem may be summarized as follows: ''Input are n strings (s1, s2, ... , sn) of length m each from an alphabet Σ and two integers l and d. Find all strings x such that , x, = l and every input string contains at least one variant of x at a
Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chang ...
of at most d. Each such x is referred to as an (l, d) motif.'' For example, if the input strings are GCGCGAT, CACGTGA, and CGGTGCC; ''l'' = 3 and ''d'' = 1, then GGT is a motif of interest. Note that the first input string has GAT as a
substring In formal language theory and computer science, a substring is a contiguous sequence of characters within a string. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence ...
, the second input string has CGT as a substring, and the third input string has GGT as a substring. GAT is a variant of GGT that is within a Hamming distance of 1 from GGT, etc. Call the variants of a motif that occur in the input strings as instances of the motif. For example, GAT is an instance of the motif GGT that occurs in the first input string. Zero or more (''l'', ''d'') motifs are contained in any given set of input strings. Many of the known algorithms for PMS consider DNA strings for which Σ =. There exist
algorithms In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
that deal with protein strings as well. The PMS problem is also known as the (''l'', ''d'')-motif search (LDMS) problem.


Notation

The following
mathematical notation Mathematical notation consists of using symbols for representing operations, unspecified numbers, relations and any other mathematical objects, and assembling them into expressions and formulas. Mathematical notation is widely used in mathe ...
is often used to describe PMS algorithms. Assume that S = is the given set of input strings from an alphabet Σ. An ''l''-mer of any string is nothing but a substring of the string of length ''l''. Let ''dH(a, b)'' stand for the
Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chang ...
between any two ''l''-mers ''a'' and ''b''. Let ''a'' be an ''l''-mer and ''s'' be an input string. Then, let ''dH(a, s)'' stand for the minimum Hamming distance between ''a'' and any ''l''-mer ''b'' of ''s''. If ''a'' is any ''l''-mer and ''S'' is a set of input strings then let ''dH(a, S)'' stand for maxsєS''dH(a, s)''. Let ''u'' be any ''l''-mer. Then, the ''d''-neighborhood of ''u'', (denoted as ''Bd(u)''), is nothing but the set of all the ''l''-mers ''v'' such that ''dH(u, v)'' ≤ ''d''. In other words, ''Bd(u)=''. Refer to any such ''l''-mer ''v'' as a ''d''-neighbor of ''u''. ''Bd(x, y)'' is used to denote the common ''d''-neighborhood of ''x'' and ''y'', where ''x'' and ''y'' are two ''l''-mers. ''Bd(x, y)'' is nothing but the set of all ''l''-mers that are within a distance of ''d'' from both ''x'' and ''y''. Similarly, ''Bd(x, y, z)'', etc. can be defined.


Algorithms

The scientific literature describes numerous algorithms for solving the PMS problem. These algorithms can be classified into two major types. Those algorithms that may not return the optimal answer(s) are referred to as
approximation algorithms In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solut ...
(or
heuristic A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediat ...
algorithms) and those that always return the optimal answer(s) are called exact algorithms.


Approximate

Examples of approximation (or heuristic) algorithms include Random Projection, PatternBranching, MULTIPROFILER, CONSENSUS, and ProfileBranching. These algorithms have been experimentally demonstrated to perform well.


Random projection

The algorithm is based on random projections. Let the motif ''M'' of interest be an ''l''-mer and ''C'' be the collection of all the ''l''-mers from all the ''n'' input strings. The algorithm projects these ''l''-mers along ''k'' randomly chosen positions (for some appropriate value of ''k''). The projection of each ''l''-mer may be thought of as an integer. The projected values (which are ''k''-mers) are grouped according to their integer values. In other words, hash all the ''l''-mers using the ''k''-mer of any ''l''-mer as its hash value. All the ''l''-mers that have the same hash value fall into the same hash bucket. Since the instances of any (''l'', ''d'') motif are similar to each other, many of these instances will fall into the same bucket. Note that the
Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chang ...
between any two instances of an (''l'', ''d'') motif is no more than 2''d''. The key idea of this algorithm is to examine those buckets that have a large number of ''l''-mers in them. For each such bucket, an expectation maximization (EM) algorithm is used to check if an (''l'', ''d'') motif can be found using the ''l''-mers in the bucket.


Pattern branching

This algorithm is a local searching algorithm. If ''u'' is any ''l''-mer, then there are \tbinom ld 3^d ''l''-mers that are ''d''-neighbors of ''u'', for DNA strings. This algorithm starts from each ''l''-mer ''u'' in the input, searches the neighbors of ''u'', scores them appropriately and outputs the best scoring neighbor.


Exact

Many exact algorithms are known for solving the PMS problem as well. Examples include the ones in (Martinez 1983), (Brazma, et al. 1998), (Galas, et al. 1985), (Sinha, et al. 2000), (Staden 1989), (Tompa 1999), (Helden, et al. 1998) (Rajasekaran, et al.), (Davila and Rajasekaran 2006), (Davila, Balla, and Rajasekaran 2006), Voting and RISOTTO.


WINNOWER and SP-STAR

The WINNOWER algorithm is a
heuristic algorithm In mathematical optimization and computer science, heuristic (from Greek εὑρίσκω "I find, discover") is a technique designed for solving a problem more quickly when classic methods are too slow for finding an approximate solution, or whe ...
and it works as follows. If ''A'' and ''B'' are two instances of the same motif in two different input strings, then the Hamming distance between ''A'' and ''B'' is at most 2''d''. It can be shown that the expected Hamming distance between ''A'' and ''B'' is 2d-\tfrac. WINNOWER constructs a collection ''C'' of all possible ''l''-mers in the input. A graph ''G(V,E)'' is constructed in which each ''l''-mer of ''C'' will be a node. Two nodes ''u'' and ''v'' in ''G'' are connected by an edge if and only if the Hamming distance between ''u'' and ''v'' is at most 2''d'' and they come from two different input strings. If ''M'' is an (''l'', ''d'') motif and if ''M1'', ''M2'', ..., and ''Mn'' are instances of ''M'' in the input strings, then, clearly, these instances will form a
clique A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popula ...
in ''G''. The WINNOWER algorithm has two phases. In the first phase, it identifies large cliques in ''G''. In the second phase each such clique is examined to see if a motif can be extracted from this clique. Since the
CLIQUE A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popula ...
problem is
intractable Intractable may refer to: * Intractable conflict, a form of complex, severe, and enduring conflict * Intractable pain, pain which cannot be controlled/cured by any known treatment * Intractable problem In theoretical computer science and mathema ...
, WINNOWER uses a heuristic to solve CLIQUE. It iteratively constructs cliques of larger and larger sizes. If ''N'' = ''mn'', then the run time of the algorithm is O(N^ ). This algorithm runs in a reasonable amount of time in practice especially for small values of ''d''. Another algorithm called SP-STAR, is faster than WINNOWER and uses less memory. WINNOWER algorithm treats all the edges of ''G'' equally without distinguishing between edges based on similarities. SP-STAR scores the ''l''-mers of ''C'' as well as the edges of ''G'' appropriately and hence eliminates more edges than WINNOWER per iteration. (Bailey and Elkan, 1994) employs expectation maximization algorithms while Gibbs sampling is used by (Lawrence et al., 1993). MULTIPROFILER MEME, are also known PMS algorithms.


PMS series

In the last decade a series of algorithms with PMS as a prefix has been developed in the lab o
Rajasekaran
Some of these algorithms are described below.


= PMS0

= PMSo works as follows. Let ''s1'', ''s2'', ..., ''sn'' be a given set of input strings each of length ''m''. Let ''C'' be the collection of ''l''-mers in ''s1''. Let C′ = ∪''u∈C''''B''''d''(''u''). For each element ''v'' of ''C′'' check if it is a valid (''l'', ''d'')-motif or not. Given an ''l''-mer ''v'', a check if it is a valid (''l'', ''d'')-motif or not can be made in O(''mnl'') time. Thus the run time of PMS0, assuming an alphabet of size 4, is O(m^2 nl\tbinom ld 3^d).


= PMS1

= This algorithm is based on
radix sorting In computer science, radix sort is a non- comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process ...
and has the following steps. # Generate the set of all ''l''-mers in each input string. Let ''Ci'' correspond to the ''l''-mers of ''si'', for 1≤''i''≤''n''. # For each ''l''-mer u in ''Ci'' (1 < ''i'' < ''n''), generate ''Bd''(''u''). Let ''Li'' be a collection of all of these neighbors (corresponding to all the ''l''-mers of ''si''). # Sort ''Li'' (using radix sort) and eliminate any duplicates. # Compute :\bigcap_^ L_i.. This can be done by merging the lists ''L1'', ''L2'', ..., ''Ln''. All the ''l''-mers in this intersection are valid (''l'', ''d'') motifs.


= PMS2

= Let the motif ''M'' of interest be of length ''l''. If ''M'' occurs in every input string then any
substring In formal language theory and computer science, a substring is a contiguous sequence of characters within a string. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence ...
of ''M'' also occurs in every input string. Here occurrence means occurrence within a Hamming distance of ''d''. It follows that there are at least ''l''-''k''+1 strings each of length ''k'' (for ''k'' ≤ ''l'') such that each of these occurs in every input string. Let ''Q'' be a collection of ''k''-mers in ''M''. Note that, in every input string ''si'', there will be at least one position ''ij'' such that a ''k''-mer of ''Q'' occurs starting from ''ij''. Another ''k''-mer of ''Q'' occurs starting from ''ij'' +1 and so on, with the last ''k''-mer occurring at ''ij'' + l – ''k''. An ''l''-mer can be obtained by combining these ''k''-mers that occur starting from each such ''ij''. PMS2 works as follows. In the first phase find all the (''k'', ''d'') motifs present in all the input strings (for some appropriate value of ''k''<''l''). In the second phase, look for (''l''-''k''+1) of these (''k'', ''d'') motifs that occur starting from successive positions in each of the input strings. From every such collection of (''l''-''k''+1) (''k'', ''d'')-motifs, ''l''-mer can be generated (if possible). Each such ''l''-mer is a candidate (''l'', ''d'')-motif. For each candidate motif, check if it is an (''l'', ''d'')-motif or not in O(''mnl'') time. This ''l''-mer is returned as output if this is an (''l'', ''d'')-motif.


= PMS3

= This algorithm enables one to handle large values of ''d''. Let ''d′''=''d''/2. Let ''M'' be the motif to be found with , ''M'', =''l''=2''l′'' for some integer ''l′''. Let ''M1'' refer to the first half of ''M'' and ''M2'' be the next half. Let ''s''= ''a1a2...am'' be one of the input strings. ''M'' occurs in every input string. Let the occurrence of ''M'' (within a Hamming distance of ''d'') in ''s'' start at position ''i''. Let ''s′''=''aiai+1...ai+l'-1'' and ''s′′'' =''ai+l'...ai+l-1''. It is clear that either the Hamming distance between ''M1'' and ''s′'' is at most ''d′'' or the Hamming distance between ''M2'' and ''s′′'' is at most ''d′''. Either ''M''1 or ''M''2 occurs in every input string at a Hamming distance of at most ''d′''. As a result, in at least ''n′'' strings (where ''n′'' = ''n''/2) either ''M''1 or ''M''2 occurs with a Hamming distance of at most ''d''. The algorithm first obtains all the (''l′'', ''d′'')-motifs that occur in at least ''n''/2 of the input strings. It then uses these motifs and the above observations to identify all the (''l'', ''d'')-motifs present in the input strings.


= PMSPrune

= This algorithm introduces a
tree structure A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is genera ...
for the motif candidates and uses a
branch-and-bound algorithm Branch and bound (BB, B&B, or BnB) is an algorithm algorithmic paradigm, design paradigm for discrete optimization, discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a ...
to reduce the search space. Let ''S'' = be a given set of input strings. PMSprune follows the same strategy as PMS0: For every ''l''-mer ''y'' in ''s1'', it generates the set of neighbors of ''y'' and, for each of them, checks whether this is a motif or not. Some key steps in the algorithm are: # It generates the ''d''-neighborhood of every ''l''-mer ''y'' in ''s1'' using a tree of height ''d''. The root of this tree will have ''y''. Every ''l''-mer that is at a distance of 1 from ''y'' will be a node in the tree that is at a distance of 1 from the root; every ''l''-mer that is at a distance of 2 from ''y'' will be a node in the tree that is at a distance of 2 from the root; and so on. When a node in this tree is visited, check if the corresponding ''l''-mer is an (''l'', ''d'')-motif. I.e., if the ''l''-mer is ''x'', check if ''dH(x, S)''≤''d''. If so, output this ''l''-mer. In any case move to the next node in the tree. This tree is explored in a depth first manner. # If each node in the tree is visited for each ''l''-mer ''y'' in ''s1'', then the run time of PMSPrune will be at least as much as that of PMS0. PMSPrune uses some pruning conditions to prune subtrees that cannot possibly have any motifs in them. # For an ''l''-mer ''x'', which corresponds to a node in a subtree of height ''h'', the algorithm uses the value of ''dH(x, S)'' and ''h'' to prune the descendants of ''x''. # PMSPrune calculates the value of ''dH(x, S)'' for the nodes (''x'') in the tree in an incremental way, taking into account the way in which the neighborhood is generated.


= PMS4

= PMS4 is a technique that can be used to speedup any algorithm for the PMS problem. In many of the above algorithms there are two phases. In the first phase we come up with a set of candidate motifs and in the second phase check, for each candidate motif, if it is a valid (''l'', ''d'')-motif. For each candidate motif it takes O(''mnl'') time to check if it is a valid motif or not. PMS4 employs a similar two phase strategy. These phases are explained below. Let A be any PMS algorithm. # Run the algorithm A on ''k'' input strings (where ''k'' < ''n''). An optimal value of ''k'' can be determined empirically. The ''k'' strings may be picked in a number of ways. For example, they could be the first ''k'' strings, random ''k'' strings, and so on. Let ''C'' be the collection of (''l'', ''d'')-motifs found in these ''k'' strings. Clearly, ''C'' is a superset of the (''l'', ''d'')-motifs present in the ''n'' given input strings. # for each ''l''-mer ''v'' in ''C'' do :::Check if ''v'' is a valid motif in O(''mnl'') time. If so, output ''v''.


= PMS5 and PMS6

= PMS5 is an extension of PMS0. If ''S'' = is a set of strings (not necessarily of the same length), let M_l^d (S) denote the (''l'', ''d'')-motifs present in ''S''. Let ''S′'' = . PMS5 computes the (''l'', ''d'')-motifs of ''S'' as \bigcup_ M_l^d (L,S^'). Here ''L'' refers to an ''l''-mer. One of the key steps in the algorithm is a
subroutine In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Functions ma ...
to compute the common ''d''-neighborhood of three ''l''-mers. Let ''x'', ''y'', ''z'' be any three ''l''-mers. To compute ''Bd(x, y, z)'', PMS5 represents ''Bd(x)'' as a tree ''Td(x)''. Each node in this tree represents an ''l''-mer in ''Bd(x)''. The root of ''Td(x)'' stands for the ''l''-mer x. ''Td(x)'' has a depth of d. Nodes of ''Td(x)'' are traversed in a
depth-first Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible alo ...
manner. The node and the ''l''-mer it represents may be used interchangeably. While the tree is traversed, any node ''t'' will be output if ''t'' is in B_d (y)\bigcap B_d (z). When any node ''t'' is visited, check if there is a descendent ''t′'' of ''t'' such that ''t′'' is in B_d (y)\bigcap B_d (z). Prune the
subtree In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
rooted at ''t'' if there is no such descendent. In PMS5, the problem of checking if ''t'' has any descendent that is in B_d (y)\bigcap B_d (z) is formulated as an integer linear program (ILP) on ten variables. This ILP is solved in O(1) time. Solving the ILP instances is done as a preprocessing step and the results are stored in a lookup table. Algorithm PMS6 is an extension of PMS5 that improves the preprocessing step and also it uses efficient
hashing Hash, hashes, hash mark, or hashing may refer to: Substances * Hash (food), a coarse mixture of ingredients * Hash, a nickname for hashish, a cannabis product Hash mark * Hash mark (sports), a marking on hockey rinks and gridiron football fiel ...
techniques to store the lookup tables. As a result, it is typically faster than PMS5. Shibdas Bandyopadhyay, Sartaj Sahni, Sanguthevar Rajasekaran, "PMS6: A fast algorithm for motif discovery," iccabs, pp. 1–6, 2012 IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences, 2012


= qPMSPrune and qPMS7

= Given a set ''S''= of strings, and integers ''l'', ''d'', and ''q'', an (''l, d, q'')-motif is defined to be a string ''M'' of length ''l'' that occurs in at least ''q'' of the ''n'' input strings within a Hamming distance of ''d''. The qPMS (''Quorum Planted Motif Search'') problem is to find all the (''l, d, q'')-motifs present in the input strings. The qPMS problem captures the nature of motifs more precisely than the PMS problem does because, in practice, some motifs may not have motif instances in all of the input strings. Any algorithm for solving the qPMS problem (when ''q'' ≠ ''n'') is typically named with a prefix of q. qPMSPrune is one of the first algorithms to address this version of the PMS problem. qPMSPrune exploits the following fact: If ''M'' is any (''l, d, q'')-motif of the input strings ''s1, s2, ..., sn'', then there exists an ''i'' (with 1 ≤ ''i'' ≤ ''n'' – ''q'' + 1) and an ''l''-mer such that ''M'' is in ''Bd(x)'' and ''M'' is an (''l, d, q''-1)-motif of the input strings excluding ''si''. The algorithm processes every ''si'', 1≤ ''i'' ≤ ''n''. While processing ''si'', it considers every ''l''-mer ''x'' of ''si''. When considering ''x'', it constructs ''Bd(x)'' and identifies elements of ''Bd(x)'' that are (''l, d, q''-1) motifs (with respect to input strings other than ''si''). ''Bd(x)'' is represented as a tree with ''x'' as the root. This tree will be traversed in a depth first manner. The algorithm does not traverse the entire tree. Some of the subtrees are pruned using effective pruning conditions. In particular, a subtree is pruned if it can be inferred that none of the nodes in this subtree carries a motif of interest. Algorithm qPMS7 is an extension of qPMSPrune. Specifically, it is based on the following observation: If ''M'' is any (''l, d, q'')-motif of the input strings ''s1, s2, ..., sn'', then there exist 1 ≤ ''i'' ≠ ''j'' ≤ ''n'' and ''l''-mer x\in s_i and ''l''-mer y\in s_j such that ''M'' is in B_d (x)\bigcap B_d (y) and ''M'' is an (''l, d, q''-2)-motif of the input strings excluding ''si'' and ''sj''. The algorithm considers every possible pair (''i, j''), 1≤ ''i, j'' ≤ ''n'' and ''i'' ≠ ''j''. For any pair (''i'', ''j''), every possible pair of ''l''-mers (''x, y'') is considered (where ''x'' is from ''si'' and ''y'' is from ''sj''). While considering any ''x'' and ''y'', the algorithm identifies all the elements of B_d (x)\bigcap B_d (y) that are (''l, d, q''-2) motifs (with respect to input strings other than ''si'' and ''sj''). An acyclic graph is used to represent and explore B_d (x)\bigcap B_d (y). Call this graph ''Gd(x, y)''. ''Gd(x, y)'' is traversed in a depth first manner. Like in qPMSPrune, qPMS7 also employs some pruning conditions to prune subgraphs of ''Gd(x, y)''.


RISOTTO

RISOTTO employs a
suffix tree In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow parti ...
to identify the (''l, d'')-motifs. It is somewhat similar to PMS0. For every ''l''-mer in ''s1'', it generates the ''d''-neighborhood and for every ''l''-mer in this neighborhood it walks through a suffix tree to check if this ''l''-mer is an (''l, d'')-motif. Voting is similar to PMS1. Instead of using radix sorting, it uses hashing to compute ''Li''s and their
intersections In mathematics, the intersection of two or more objects is another object consisting of everything that is contained in all of the objects simultaneously. For example, in Euclidean geometry, when two lines in a plane are not parallel, their ...
.


Relative performance

PMS algorithms are typically tested on random benchmark data generated as follows: Twenty strings each of length 600 are generated randomly from the alphabet of interest. The motif ''M'' is also generated randomly and planted in each of the input strings within a Hamming distance of ''d''. The motif instances are also generated randomly. Certain instances of the (''l, d'')-motif problem have been identified to be ''challenging''. For a given value of ''l'', the instance (''l, d'') is called challenging if ''d'' is the smallest integer for which the expected number of (''l, d'')-motifs that occur by random chance (in addition to the planted one) is one or more. For example, the following instances are challenging: (9, 2), (11, 3), (13, 4), (15, 5), (17, 6), (19, 7), etc. The performance of PMS algorithms is customarily shown only for challenging instances. Following is a table of time comparison of different PMS algorithms on the challenging instances of DNA sequences for the special case. This table is taken from the paper qPMS7. In this table several algorithms have been compared: qPMSPrune, qPMSPruneI, Pampa, Voting, RISOTTO, PMS5, PMS6, qPMS7. In the following table, the alphabet Σ=, ''n''=20, ''m''=600, and ''q''=''n''=20.


References


External links

* * {{cite web , url = http://pms.engr.uconn.edu , title = Panoptic Motif Search , last1 = Rajasekaran , first1 = S. , last2 = Dinh , first2 = H. , publisher = University of Connecticut , url-status = dead , archive-url = https://web.archive.org/web/20110802205417/http://pms.engr.uconn.edu/ , archive-date = 2011-08-02 Bioinformatics Computational biology