Substring
   HOME

TheInfoList



OR:

In
formal language theory In logic, mathematics, computer science, and linguistics, a formal language is a set of string (computer science), strings whose symbols are taken from a set called "#Definition, alphabet". The alphabet of a formal language consists of symbol ...
and
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a substring is a contiguous sequence of characters within a
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring. Prefixes and suffixes are special cases of substrings. A prefix of a string S is a substring of S that occurs at the beginning of S; likewise, a suffix of a string S is a substring that occurs at the end of S. The substrings of the string "''apple''" would be: "''a''", "''ap''", "''app''", "''appl''", "''apple''", "''p''", "''pp''", "''ppl''", "''pple''", "''pl''", "''ple''", "''l''", "''le''" "''e''", "" (note the
empty string In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero. Formal theory Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
at the end).


Substring

A string u is a substring (or factor) of a string t if there exists two strings p and s such that t = pus. In particular, the empty string is a substring of every string. Example: The string u=ana is equal to substrings (and subsequences) of t=banana at two different offsets: banana , , , , , ana, , , , , ana The first occurrence is obtained with p=b and s=na, while the second occurrence is obtained with p=ban and s being the empty string. A substring of a string is a
prefix A prefix is an affix which is placed before the stem of a word. Particularly in the study of languages, a prefix is also called a preformative, because it alters the form of the word to which it is affixed. Prefixes, like other affixes, can b ...
of a
suffix In linguistics, a suffix is an affix which is placed after the stem of a word. Common examples are case endings, which indicate the grammatical case of nouns and adjectives, and verb endings, which form the conjugation of verbs. Suffixes can ca ...
of the string, and equivalently a suffix of a prefix; for example, nan is a prefix of nana, which is in turn a suffix of banana. If u is a substring of t, it is also a
subsequence In mathematics, a subsequence of a given sequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence \langle A,B,D \rangle is a ...
, which is a more general concept. The occurrences of a given pattern in a given string can be found with a string searching algorithm. Finding the longest string which is equal to a substring of two or more strings is known as the
longest common substring problem Long may refer to: Measurement * Long, characteristic of something of great duration * Long, characteristic of something of great length * Longitude (abbreviation: long.), a geographic coordinate * Longa (music), note value in early music mens ...
. In the mathematical literature, substrings are also called subwords (in America) or factors (in Europe).


Prefix

A string p is a prefix of a string t if there exists a string s such that t = ps. A ''proper prefix'' of a string is not equal to the string itself; some sources in addition restrict a proper prefix to be non-empty. A prefix can be seen as a special case of a substring. Example: The string ban is equal to a prefix (and substring and subsequence) of the string banana: banana , , , ban The square subset symbol is sometimes used to indicate a prefix, so that p \sqsubseteq t denotes that p is a prefix of t. This defines a
binary relation In mathematics, a binary relation associates some elements of one Set (mathematics), set called the ''domain'' with some elements of another set called the ''codomain''. Precisely, a binary relation over sets X and Y is a set of ordered pairs ...
on strings, called the prefix relation, which is a particular kind of prefix order.


Suffix

A string s is a suffix of a string t if there exists a string p such that t = ps. A ''proper suffix'' of a string is not equal to the string itself. A more restricted interpretation is that it is also not empty. A suffix can be seen as a special case of a substring. Example: The string nana is equal to a suffix (and substring and subsequence) of the string banana: banana , , , , nana A suffix tree for a string is a
trie In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store t ...
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
that represents all of its suffixes. Suffix trees have large numbers of applications in
string algorithms In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). ...
. The suffix array is a simplified version of this data structure that lists the start positions of the suffixes in alphabetically sorted order; it has many of the same applications.


Border

A border is suffix and prefix of the same string, e.g. "bab" is a border of "babab" (and also of "baboon eating a kebab").


Superstring

A superstring of a finite set P of strings is a single string that contains every string in P as a substring. For example, \text is a superstring of P = \, and \text is a shorter one. Concatenating all members of P, in arbitrary order, always obtains a trivial superstring of P. Finding superstrings whose length is as small as possible is a more interesting problem. A string that contains every possible permutation of a specified character set is called a superpermutation.


See also

* Brace notation * Substring index * Suffix automaton


References

{{Reflist, refs= {{cite book , last = Gusfield , first = Dan , orig-year = 1997 , year = 1999 , title = Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology , publisher = Cambridge University Press , location = US , isbn = 0-521-58519-8 {{cite book , last = Kelley , first = Dean , year = 1995 , title = Automata and Formal Languages: An Introduction , publisher = Prentice-Hall International , location = London , isbn = 0-13-497777-7 {{cite book , last = Lothaire , first = M. , year = 1997 , title = Combinatorics on words , publisher = Cambridge University Press , location = Cambridge , isbn = 0-521-59924-5 String (computer science) Formal languages