
In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, the longest repeated substring problem is the problem of finding the longest
substring
In formal language theory and computer science, a substring is a contiguous sequence of characters within a string. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence ...
of a
string
String or strings may refer to:
*String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects
Arts, entertainment, and media Films
* ''Strings'' (1991 film), a Canadian anim ...
that occurs at least twice.
This problem can be solved in linear time and space
by building a
suffix tree
In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow parti ...
for the string (with a special end-of-string symbol like '$' appended), and finding the deepest internal node in the tree with more than one child. Depth is measured by the number of characters traversed from the root. The string spelled by the edges from the root to such a node is a longest repeated substring. The problem of finding the longest substring with at least
occurrences can be solved by first preprocessing the tree to count the number of leaf descendants for each internal node, and then finding the deepest node with at least
leaf descendants. To avoid overlapping repeats, you can check that the list of suffix lengths has no consecutive elements with less than prefix-length difference.
In the figure with the string "ATCGATCGA$", the longest substring that repeats at least twice is "ATCGA".
External links
*
C implementation of Longest Repeated Substring using Suffix TreeOnline Demo: Longest Repeated Substring
Problems on strings
Formal languages
Combinatorics
{{algorithm-stub