In computer science, the lexicographically minimal string rotation or lexicographically least circular substring is the problem of finding the rotation of a string possessing the lowest lexicographical order of all such rotations. For example, the lexicographically minimal rotation of "bbaaccaadd" would be "aaccaaddbb". It is possible for a string to have multiple lexicographically minimal rotations, but for most applications this does not matter as the rotations must be equivalent. Finding the lexicographically minimal rotation is useful as a way of normalizing strings. If the strings represent potentially isomorphic structures such as graphs, normalizing in this way allows for simple equality checking. A common implementation trick when dealing with circular strings is to concatenate the string to itself instead of having to perform modular arithmetic on the string indices.
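
The concatenation trick mentioned above can be illustrated with a minimal Python sketch. The helper names `rotation_by_modulo` and `rotation_by_doubling` are hypothetical and chosen only for this illustration; both are meant to return the rotation of a string that begins at a given start index.

```python
def rotation_by_modulo(s: str, start: int) -> str:
    """Read the rotation character by character, wrapping indices with %."""
    n = len(s)
    return "".join(s[(start + i) % n] for i in range(n))


def rotation_by_doubling(s: str, start: int) -> str:
    """Read the same rotation as a plain slice of the doubled string."""
    doubled = s + s  # every rotation of s is a length-n substring of s + s
    return doubled[start:start + len(s)]
```

For the string "bbaaccaadd" and start index 2, both helpers return "aaccaaddbb". The doubled form trades extra memory for simpler indexing.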


Algorithms


The Naive Algorithm

The naive algorithm for finding the lexicographically minimal rotation of a string is to iterate through successive rotations while keeping track of the lexicographically smallest rotation encountered so far. If the string is of length n, this algorithm runs in O(n^2) time in the worst case, since each of the n rotations may need up to n character comparisons.
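
A minimal Python sketch of the naive approach follows. The function name `least_rotation_naive` is an assumption made for this example, and building each rotation as a new string is a simplification for readability, not a requirement of the method.

```python
def least_rotation_naive(s: str) -> int:
    """Return the start index of a lexicographically minimal rotation of s."""
    best = 0
    for start in range(1, len(s)):
        # Build the candidate rotation and compare it with the best so far.
        # The strict comparison keeps the earliest index when rotations tie.
        if s[start:] + s[:start] < s[best:] + s[:best]:
            best = start
    return best
```

For "bbaaccaadd" this returns 2, the start index of "aaccaaddbb".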


Booth's Algorithm

An efficient algorithm was proposed by Booth (1980). The algorithm uses a modified preprocessing function from the Knuth-Morris-Pratt string search algorithm. The failure function for the string is computed as normal, but the string is rotated during the computation so some indices must be computed more than once as they wrap around. Once all indices of the failure function have been successfully computed without the string rotating again, the minimal lexicographical rotation is known to be found and its starting index is returned. The correctness of the algorithm is somewhat difficult to understand, but it is easy to implement.

    def least_rotation(S: str) -> int:
        """Booth's lexicographically minimal string rotation algorithm."""
        n = len(S)
        f = [-1] * (2 * n)  # failure function, relative to the candidate start k
        k = 0               # start index of the best rotation found so far
        for j in range(1, 2 * n):
            i = f[j - k - 1]
            while i != -1 and S[j % n] != S[(k + i + 1) % n]:
                if S[j % n] < S[(k + i + 1) % n]:
                    # A smaller rotation starts at j - i - 1.
                    k = j - i - 1
                i = f[i]
            if i == -1 and S[j % n] != S[(k + i + 1) % n]:
                if S[j % n] < S[(k + i + 1) % n]:
                    k = j
                f[j - k] = -1
            else:
                f[j - k] = i + 1
        return k

Of interest is that removing all lines of code which modify the value of k results in the original Knuth-Morris-Pratt preprocessing function, as k (representing the rotation) will remain zero. Booth's algorithm runs in O(n) time, where n is the length of the string. The algorithm performs at most 3n comparisons in the worst case, and requires auxiliary memory of length n to hold the failure function table.
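
The relationship to the Knuth-Morris-Pratt preprocessing function can be illustrated with a sketch of that function alone. It is written here with the rotation offset fixed at zero and without wrapping around the string. The name `kmp_failure` and the convention of storing -1 for "no border" are assumptions made for this example.

```python
from typing import List


def kmp_failure(s: str) -> List[int]:
    """Knuth-Morris-Pratt failure function in a -1-based convention.

    f[j] is the index of the last character of the longest proper prefix
    of s[:j + 1] that is also a suffix of it, or -1 if there is none.
    """
    n = len(s)
    f = [-1] * n
    for j in range(1, n):
        i = f[j - 1]
        # Fall back through shorter borders until one can be extended.
        while i != -1 and s[j] != s[i + 1]:
            i = f[i]
        if i == -1 and s[j] != s[0]:
            f[j] = -1       # no border ends at position j
        else:
            f[j] = i + 1    # the border grows by one character
    return f
```

For "abab" this yields [-1, -1, 0, 1]: the prefix "ab" reappears as a suffix ending at index 3.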


Shiloach's Fast Canonization Algorithm

Shiloach (1981) proposed an algorithm improving on Booth's result in terms of performance. It was observed that if there are q equivalent lexicographically minimal rotations of a string of length n, then the string must consist of q equal substrings of length d = n/q. The algorithm requires only n + d/2 comparisons and constant space in the worst case. The algorithm is divided into two phases. The first phase is a quick sieve which rules out indices that are obviously not starting locations for the lexicographically minimal rotation. The second phase then finds the lexicographically minimal rotation start index from the indices which remain.
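
The observation about q equal substrings can be checked by brute force on small inputs. The sketch below is not Shiloach's algorithm; it simply enumerates all rotations of a non-empty string. The names `minimal_rotation_starts` and `repetition_structure` are hypothetical helpers introduced for this illustration.

```python
from typing import List, Tuple


def minimal_rotation_starts(s: str) -> List[int]:
    """Return every start index whose rotation is lexicographically minimal.

    Assumes s is non-empty.
    """
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    smallest = min(rotations)
    return [i for i, r in enumerate(rotations) if r == smallest]


def repetition_structure(s: str) -> Tuple[int, int, bool]:
    """Return (q, d, holds), where holds tells whether s is q copies of s[:d]."""
    q = len(minimal_rotation_starts(s))  # number of minimal rotations
    d = len(s) // q                      # length of the repeated block
    return q, d, s == s[:d] * q
```

For "abab" there are q = 2 minimal rotations (starting at indices 0 and 2), so d = 2 and the string is two copies of "ab".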


Duval's Lyndon Factorization Algorithm

Duval (1983) proposed an efficient algorithm involving the factorization of the string into its component Lyndon words, which runs in linear time with a constant memory requirement. A Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations.
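
One commonly used way to apply this idea is to run a Lyndon-style factorization scan over the string concatenated with itself and to remember the start of the last factor that begins inside the first copy. The sketch below follows that adaptation and is not necessarily Duval's original presentation. The function name `least_rotation_lyndon` is an assumption, and the explicit doubling uses extra memory that could be avoided with modular indexing.

```python
def least_rotation_lyndon(s: str) -> int:
    """Return the start index of a minimal rotation via a Lyndon-factor scan."""
    n = len(s)
    t = s + s  # doubled string, so factors may run past the end of s
    i = 0      # start of the factor currently being scanned
    ans = 0
    while i < n:
        ans = i
        j, k = i + 1, i
        # Extend the scan while t[i:j] stays a repetition of a Lyndon word
        # (possibly followed by a proper prefix of that word).
        while j < 2 * n and t[k] <= t[j]:
            if t[k] < t[j]:
                k = i       # the whole scanned span forms one longer word
            else:
                k += 1      # still matching the current period
            j += 1
        # Skip over the complete copies of the factor just found.
        while i <= k:
            i += j - k
    return ans
```

For "bbaaccaadd" the scan first skips the two "b" factors and then returns 2.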


Variants

Shiloach (1979) proposed an algorithm to efficiently compare two circular strings for equality without a normalization requirement (Yossi Shiloach, "A fast equivalence-checking algorithm for circular lists", Information Processing Letters, Elsevier, 8(5): 236–238, 1979, doi:10.1016/0020-0190(79)90114-5). An additional application which arises from the algorithm is the fast generation of certain chemical structures without repetitions.
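
For illustration only, circular equality can also be tested without normalizing either string by searching for one string inside the other concatenated with itself. This is a simple substitute technique, not Shiloach's algorithm, and it uses additional memory for the doubled string. The name `circular_equal` is hypothetical.

```python
def circular_equal(a: str, b: str) -> bool:
    """Return True if b is a rotation of a."""
    # Equal length is required; then b must occur somewhere in a + a.
    return len(a) == len(b) and b in a + a
```

For example, "bbaaccaadd" and "aaccaaddbb" compare as equal, while "abc" and "acb" do not.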


See also

* Lyndon word
* Knuth-Morris-Pratt algorithm

