HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, the longest repeated substring problem is the problem of finding the longest
substring In formal language theory and computer science, a substring is a contiguous sequence of characters within a string. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence ...
of a
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
that occurs at least twice. This problem can be solved in linear time and space \Theta(n) by building a
suffix tree In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow parti ...
for the string (with a special end-of-string symbol like '$' appended), and finding the deepest internal node in the tree with more than one child. Depth is measured by the number of characters traversed from the root. The string spelled by the edges from the root to such a node is a longest repeated substring. The problem of finding the longest substring with at least k occurrences can be solved by first preprocessing the tree to count the number of leaf descendants for each internal node, and then finding the deepest node with at least k leaf descendants. To avoid overlapping repeats, you can check that the list of suffix lengths has no consecutive elements with less than prefix-length difference. In the figure with the string "ATCGATCGA$", the longest substring that repeats at least twice is "ATCGA".


External links

*
C implementation of Longest Repeated Substring using Suffix Tree

Online Demo: Longest Repeated Substring
Problems on strings Formal languages Combinatorics {{algorithm-stub