HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, the complexity function of a ''word'' or ''
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
'' (a finite or infinite sequence of symbols from some
alphabet An alphabet is a standard set of letter (alphabet), letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from a ...
) is the function that counts the number of distinct ''factors'' (substrings of consecutive symbols) of that string. More generally, the complexity function of a
formal language In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet". The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
(a set of finite strings) counts the number of distinct words of given length.


Complexity function of a word

Let ''u'' be a (possibly infinite) sequence of symbols from an alphabet. Define the function ''p''''u''(''n'') of a positive integer ''n'' to be the number of different factors (consecutive substrings) of length ''n'' from the string ''u''.Lothaire (2011) p.7Pytheas Fogg (2002) p.3Berstel et al (2009) p.82Allouche & Shallit (2003) p.298 For a string ''u'' of length at least ''n'' over an alphabet of size ''k'' we clearly have : 1 \le p_u(n) \le k^n \ , the bounds being achieved by the constant word and a disjunctive word,Bugeaud (2012) p.91 for example, the Champernowne word respectively.Cassaigne & Nicolas (2010) p.165 For infinite words ''u'', we have ''p''''u''(''n'') bounded if ''u'' is ultimately periodic (a finite, possibly empty, sequence followed by a finite cycle). Conversely, if ''p''''u''(''n'') ≤ ''n'' for some ''n'', then ''u'' is ultimately periodic.Allouche & Shallit (2003) p.302 An aperiodic sequence is one which is not ultimately periodic. An aperiodic sequence has strictly increasing complexity function (this is the Morse–Hedlund theorem),Cassaigne & Nicolas (2010) p.166 so ''p''(''n'') is at least ''n''+1.Lothaire (2011) p.22 A set ''S'' of finite binary words is ''balanced'' if for each ''n'' the subset ''S''''n'' of words of length ''n'' has the property that the
Hamming weight The Hamming weight of a string (computer science), string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the mo ...
of the words in ''S''''n'' takes at most two distinct values. A balanced sequence is one for which the set of factors is balanced.Allouche & Shallit (2003) p.313 A balanced sequence has complexity function at most ''n''+1.Lothaire (2011) p.48 A Sturmian word over a binary alphabet is one with complexity function ''n'' + 1.Pytheas Fogg (2002) p.6 A sequence is Sturmian
if and only if In logic and related fields such as mathematics and philosophy, "if and only if" (often shortened as "iff") is paraphrased by the biconditional, a logical connective between statements. The biconditional is true in two cases, where either bo ...
it is balanced and aperiodic.Lothaire (2011) p.46Allouche & Shallit (2003) p.318 An example is the
Fibonacci word A Fibonacci word is a specific sequence of Binary numeral system, binary digits (or symbols from any two-letter Alphabet (formal languages), alphabet). The Fibonacci word is formed by repeated concatenation in the same way that the Fibonacci num ...
. More generally, a Sturmian word over an alphabet of size ''k'' is one with complexity ''n''+''k''−1. An Arnoux-Rauzy word over a ternary alphabet has complexity 2''n'' + 1: an example is the Tribonacci word.Pytheas Fogg (2002) p.368 For recurrent words, those in which each factor appears infinitely often, the complexity function almost characterises the set of factors: if ''s'' is a recurrent word with the same complexity function as ''t'' are then ''s'' has the same set of factors as ''t'' or δ''t'' where δ denotes the letter doubling morphism ''a'' → ''aa''.Berstel et al (2009) p.84


Complexity function of a language

Let ''L'' be a language over an alphabet and define the function ''p''''L''(''n'') of a positive integer ''n'' to be the number of different words of length ''n'' in ''L''Berthé & Rigo (2010) p.166 The complexity function of a word is thus the complexity function of the language consisting of the factors of that word. The complexity function of a language is less constrained than that of a word. For example, it may be bounded but not eventually constant: the complexity function of the
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
a(bb)^*a takes values 3 and 4 on odd and even ''n''≥2 respectively. There is an analogue of the Morse–Hedlund theorem: if the complexity of ''L'' satisfies ''p''''L''(''n'') ≤ ''n'' for some ''n'', then ''p''''L'' is bounded and there is a finite language ''F'' such that :L \subseteq \ \ . A polynomial or
sparse language In computational complexity theory, a sparse language is a formal language (a set of strings) such that the complexity function, counting the number of strings of length ''n'' in the language, is bounded by a polynomial function of ''n''. They are ...
is one for which the complexity function ''p''(''n'') is bounded by a fixed power of ''n''. A
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
which is not polynomial is ''exponential'': there are infinitely many ''n'' for which ''p''(''n'') is greater than ''k''''n'' for some fixed ''k'' > 1.Berthé & Rigo (2010) p.136


Related concepts

The ''
topological entropy In mathematics, the topological entropy of a topological dynamical system is a nonnegative extended real number that is a measure of the complexity of the system. Topological entropy was first introduced in 1965 by Adler, Konheim and McAndrew. Th ...
'' of an infinite sequence ''u'' is defined by :H_(u) = \lim_ \frac \ . The limit exists as the logarithm of the complexity function is subadditive.Pytheas Fogg (2002) p.4Allouche & Shallit (2003) p.303 Every
real number In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...
between 0 and 1 occurs as the topological entropy of some sequence is applicable,Cassaigne & Nicolas (2010) p.169 which may be taken to be uniformly recurrentBerthé & Rigo (2010) p.391 or even uniquely ergodic.Berthé & Rigo (2010) p.169 For ''x'' a real number and ''b'' an
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
≥ 2 then the complexity function of ''x'' in base ''b'' is the complexity function ''p''(''x'',''b'',''n'') of the sequence of digits of ''x'' written in base ''b''. If ''x'' is an
irrational number In mathematics, the irrational numbers are all the real numbers that are not rational numbers. That is, irrational numbers cannot be expressed as the ratio of two integers. When the ratio of lengths of two line segments is an irrational number, ...
then ''p''(''x'',''b'',''n'') ≥ ''n''+1; if ''x'' is
rational Rationality is the quality of being guided by or based on reason. In this regard, a person acts rationally if they have a good reason for what they do, or a belief is rational if it is based on strong evidence. This quality can apply to an ...
then ''p''(''x'',''b'',''n'') ≤ ''C'' for some constant ''C'' depending on ''x'' and ''b''.Bugeaud (2012) p.91 It is conjectured that for algebraic irrational ''x'' the complexity is ''b''''n'' (which would follow if all such numbers were normal) but all that is known in this case is that ''p'' grows faster than any linear function of ''n''.Berthé & Rigo (2010) p.414 The abelian complexity function ''p''ab(''n'') similarly counts the number of occurrences of distinct factors of given length ''n'', where now we identify factors that differ only by a permutation of the positions. Clearly ''p''ab(''n'') ≤ ''p''(''n''). The abelian complexity of a Sturmian sequence satisfies ''p''ab(''n'') = 2.


References

* * * * * * * {{cite book , last=Pytheas Fogg , first=N. , editor1=Berthé, Valérie, editor1-link=Valérie Berthé, editor2=Ferenczi, Sébastien, editor3=Mauduit, Christian, editor4=Siegel, A. , title=Substitutions in dynamics, arithmetics and combinatorics , series=Lecture Notes in Mathematics , volume=1794 , location=Berlin , publisher=
Springer-Verlag Springer Science+Business Media, commonly known as Springer, is a German multinational publishing company of books, e-books and peer-reviewed journals in science, humanities, technical and medical (STM) publishing. Originally founded in 1842 in ...
, year=2002 , isbn=3-540-44141-7 , zbl=1014.11015 Theoretical computer science