HOME

TheInfoList



OR:

Combinatorics on words is a fairly new field of
mathematics Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...
, branching from
combinatorics Combinatorics is an area of mathematics primarily concerned with counting, both as a means and an end in obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many a ...
, which focuses on the study of
words A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no conse ...
and
formal language In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of sym ...
s. The subject looks at letters or
symbol A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
s, and the
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is called ...
s they form. Combinatorics on words affects various areas of mathematical study, including
algebra Algebra () is one of the broad areas of mathematics. Roughly speaking, algebra is the study of mathematical symbols and the rules for manipulating these symbols in formulas; it is a unifying thread of almost all of mathematics. Elementary ...
and
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
. There have been a wide range of contributions to the field. Some of the first work was on
square-free word In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form , where is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern . Finite ...
s by Axel Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
s and coding. It led to developments in
abstract algebra In mathematics, more specifically algebra, abstract algebra or modern algebra is the study of algebraic structures. Algebraic structures include groups, rings, fields, modules, vector spaces, lattices, and algebras over a field. The ter ...
and answering open questions.


Definition

Combinatorics is an area of
discrete mathematics Discrete mathematics is the study of mathematical structures that can be considered "discrete" (in a way analogous to discrete variables, having a bijection with the set of natural numbers) rather than "continuous" (analogously to continuou ...
. Discrete mathematics is the study of countable structures. These objects have a definite beginning and end. The study of enumerable objects is the opposite of disciplines such as
analysis Analysis ( : analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle (3 ...
, where
calculus Calculus, originally called infinitesimal calculus or "the calculus of infinitesimals", is the mathematics, mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizati ...
and infinite structures are studied. Combinatorics studies how to count these objects using various representations. Combinatorics on words is a recent development in this field that focuses on the study of words and formal languages. A formal language is any set of symbols and combinations of symbols that people use to communicate information. Some terminology relevant to the study of words should first be explained. First and foremost, a word is basically a sequence of symbols, or letters, in a
finite set In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example, :\ is a finite set with five elements. T ...
. One of these sets is known by the general public as the
alphabet An alphabet is a standardized set of basic written graphemes (called letters) that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character represents a syllab ...
. For example, the word "encyclopedia" is a sequence of symbols in the
English alphabet The alphabet for Modern English is a Latin-script alphabet consisting of 26 letters, each having an upper- and lower-case form. The word ''alphabet'' is a compound of the first two letters of the Greek alphabet, ''alpha'' and '' beta''. ...
, a finite set of twenty-six letters. Since a word can be described as a sequence, other basic mathematical descriptions can be applied. The alphabet is a
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
, so as one would expect, the
empty set In mathematics, the empty set is the unique set having no elements; its size or cardinality (count of elements in a set) is zero. Some axiomatic set theories ensure that the empty set exists by including an axiom of empty set, while in othe ...
is a
subset In mathematics, set ''A'' is a subset of a set ''B'' if all elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are unequal, then ''A'' is a proper subset of ...
. In other words, there exists a unique word of length zero. The length of the word is defined by the number of symbols that make up the sequence, and is denoted by , ''w'', . Again looking at the example "encyclopedia", , ''w'', = 12, since encyclopedia has twelve letters. The idea of factoring of large numbers can be applied to words, where a factor of a word is a block of consecutive symbols. Thus, "cyclop" is a factor of "encyclopedia". In addition to examining sequences in themselves, another area to consider of combinatorics on words is how they can be represented visually. In mathematics various structures are used to encode data. A common structure used in combinatorics is the
tree structure A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is genera ...
. A tree structure is a graph where the vertices are connected by one line, called a path or
edge Edge or EDGE may refer to: Technology Computing * Edge computing, a network load-balancing system * Edge device, an entry point to a computer network * Adobe Edge, a graphical development application * Microsoft Edge, a web browser developed ...
. Trees may not contain cycles, and may or may not be complete. It is possible to encode a word, since a word is constructed by symbols, and encode the data by using a tree. This gives a visual representation of the object.


Major contributions

The first books on combinatorics on words that summarize the origins of the subject were written by a group of mathematicians that collectively went by the name of
M. Lothaire M. Lothaire is the pseudonym of a group of mathematicians, many of whom were students of Marcel-Paul Schützenberger. The name is used as the author of several of their joint books about combinatorics on words. The group is named for Lothair I.. ...
. Their first book was published in 1983, when combinatorics on words became more widespread.


Patterns


Patterns within words

A main contributor to the development of combinatorics on words was Axel Thue (1863–1922); he researched repetition. Thue's main contribution was the proof of the existence of infinite
square-free word In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form , where is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern . Finite ...
s. Square-free words do not have adjacent repeated factors. To clarify, "summer" is not square-free since m is repeated consecutively, while "encyclopedia" is square-free. Thue proves his conjecture on the existence of infinite square-free words by using
substitution Substitution may refer to: Arts and media *Chord substitution, in music, swapping one chord for a related one within a chord progression *Substitution (poetry), a variation in poetic scansion * "Substitution" (song), a 2009 song by Silversun Pic ...
s. A substitution is a way to take a symbol and replace it with a word. He uses this technique to describe his other contribution, the
Thue–Morse sequence In mathematics, the Thue–Morse sequence, or Prouhet–Thue–Morse sequence, is the binary sequence (an infinite sequence of 0s and 1s) obtained by starting with 0 and successively appending the Boolean complement of the sequence obtained thus ...
, or Thue–Morse word. Thue wrote two papers on square-free words, the second of which was on the Thue–Morse word. Marston Morse is included in the name because he discovered the same result as Thue did, yet they worked independently. Thue also proved the existence of an overlap-free word. An overlap-free word is when, for two symbols x and y, the pattern does not exist within the word. He continues in his second paper to prove a relationship between infinite overlap-free words and square-free words. He takes overlap-free words that are created using two different letters, and demonstrates how they can be transformed into square-free words of three letters using substitution. As was previously described, words are studied by examining the sequences made by the symbols. Patterns are found, and they are able to be described mathematically. Patterns can be either avoidable patterns, or unavoidable. A significant contributor to the work of
unavoidable pattern In mathematics and theoretical computer science, a pattern is an unavoidable pattern if it is unavoidable on any finite alphabet. Definitions Pattern Like a word, a pattern (also called ''term'') is a sequence of symbols over some alphabet. ...
s, or regularities, was Frank Ramsey in 1930. His important theorem states that for integers k, m≥2, there exists a least positive integer such that despite how a complete graph is colored with two colors, there will always exist a solid color subgraph of each color. Other contributors to the study of unavoidable patterns include van der Waerden. His theorem states that if the positive integers are partitioned into k classes, then there exists a class c such that c contains an arithmetic progression of some unknown length. An
arithmetic progression An arithmetic progression or arithmetic sequence () is a sequence of numbers such that the difference between the consecutive terms is constant. For instance, the sequence 5, 7, 9, 11, 13, 15, . . . is an arithmetic progression with a common differ ...
is a sequence of numbers in which the difference between adjacent numbers remains constant. When examining unavoidable patterns
sesquipower In mathematics, a sesquipower or Zimin word is a string over an alphabet with identical prefix and suffix. Sesquipowers are unavoidable patterns, in the sense that all sufficiently long strings contain one. Formal definition Formally, let ''A'' b ...
s are also studied. For some patterns x,y,z, a sesquipower is of the form x, , , .... This is another pattern such as square-free, or unavoidable patterns. Coudrain and Schützenberger mainly studied these sesquipowers for
group theory In abstract algebra, group theory studies the algebraic structures known as groups. The concept of a group is central to abstract algebra: other well-known algebraic structures, such as rings, fields, and vector spaces, can all be seen ...
applications. In addition, Zimin proved that sesquipowers are all unavoidable. Whether the entire pattern shows up, or only some piece of the sesquipower shows up repetitively, it is not possible to avoid it.


Patterns within alphabets

Necklaces are constructed from words of circular sequences. They are most frequently used in
music Music is generally defined as the art of arranging sound to create some combination of form, harmony, melody, rhythm or otherwise expressive content. Exact definitions of music vary considerably around the world, though it is an aspe ...
and
astronomy Astronomy () is a natural science that studies celestial objects and phenomena. It uses mathematics, physics, and chemistry in order to explain their origin and evolution. Objects of interest include planets, moons, stars, nebulae, g ...
. Flye Sainte-Marie in 1894 proved there are 22''n''−1−''n'' binary
de Bruijn De Bruijn is a Dutch surname meaning "the brown". Notable people with the surname include: * (1887–1968), Dutch politician * Brian de Bruijn (b. 1954), Dutch-Canadian ice hockey player * Chantal de Bruijn (b. 1976), Dutch field hockey defender ...
necklaces of length 2n. A de Bruijn necklace contains factors made of words of length n over a certain number of letters. The words appear only once in the necklace. In 1874, Baudot developed the code that would eventually take the place of
Morse code Morse code is a method used in telecommunication to encode text characters as standardized sequences of two different signal durations, called ''dots'' and ''dashes'', or ''dits'' and ''dahs''. Morse code is named after Samuel Morse, one ...
by applying the theory of binary de Bruijn necklaces. The problem continued from Sainte-Marie to
Martin Martin may refer to: Places * Martin City (disambiguation) * Martin County (disambiguation) * Martin Township (disambiguation) Antarctica * Martin Peninsula, Marie Byrd Land * Port Martin, Adelie Land * Point Martin, South Orkney Islands Austr ...
in 1934, who began looking at algorithms to make words of the de Bruijn structure. It was then worked on by
Posthumus Posthumus is a surname mostly stemming from the Dutch province of Friesland. Among variants are '' Posthuma'' and ''Postmus''. The surname may have originated in the same way Romans called boys and girls born after the death of their father ''Post ...
in 1943.


Language hierarchy

Possibly the most applied result in combinatorics on words is the Chomsky hierarchy, developed by
Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American public intellectual: a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called "the father of modern linguistics", Chomsky i ...
. He studied formal language in the 1950s. His way of looking at language simplified the subject. He disregards the actual meaning of the word, does not consider certain factors such as frequency and context, and applies patterns of short terms to all length terms. The basic idea of Chomsky's work is to divide language into four levels, or the language
hierarchy A hierarchy (from Greek: , from , 'president of sacred rites') is an arrangement of items (objects, names, values, categories, etc.) that are represented as being "above", "below", or "at the same level as" one another. Hierarchy is an important ...
. The four levels are: regular, context-free, context-sensitive, and
computably enumerable In computability theory, a set ''S'' of natural numbers is called computably enumerable (c.e.), recursively enumerable (r.e.), semidecidable, partially decidable, listable, provable or Turing-recognizable if: *There is an algorithm such that the ...
or unrestricted. Regular is the least complex while computably enumerable is the most complex. While his work grew out of combinatorics on words, it drastically affected other disciplines, especially
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
.


Word types


Sturmian words

Sturmian word In mathematics, a Sturmian word (Sturmian sequence or billiard sequence), named after Jacques Charles François Sturm, is a certain kind of infinitely long sequence of characters. Such a sequence can be generated by considering a game of Englis ...
s, created by François Sturm, have roots in combinatorics on words. There exist several equivalent definitions of Sturmian words. For example, an infinite word is Sturmian if and only if it has n+1 distinct factors of length n, for every non-negative integer n.


Lyndon word

A Lyndon word is a word over a given alphabet that is written in its simplest and most ordered form out of its respective conjugacy class. Lyndon words are important because for any given Lyndon word x, there exists Lyndon words y and z, with ytheorem by Chen, Fox, and Lyndon, that states any word has a unique factorization of Lyndon words, where the factorization words are
non-increasing In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is called ...
. Due to this property, Lyndon words are used to study
algebra Algebra () is one of the broad areas of mathematics. Roughly speaking, algebra is the study of mathematical symbols and the rules for manipulating these symbols in formulas; it is a unifying thread of almost all of mathematics. Elementary ...
, specifically
group theory In abstract algebra, group theory studies the algebraic structures known as groups. The concept of a group is central to abstract algebra: other well-known algebraic structures, such as rings, fields, and vector spaces, can all be seen ...
. They form the basis for the idea of
commutators In mathematics, the commutator gives an indication of the extent to which a certain binary operation fails to be commutative. There are different definitions used in group theory and ring theory. Group theory The commutator of two elements, ...
.


Visual representation

Cobham contributed work relating Eugène Prouhet's work with
finite automata A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
. A mathematical graph is made of edges and
node In general, a node is a localized swelling (a " knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph * Vertex (geometry), a point where two or more curves, line ...
s. With finite automata, the edges are labeled with a letter in an alphabet. To use the graph, one starts at a node and travels along the edges to reach a final node. The path taken along the graph forms the word. It is a finite graph because there are a countable number of nodes and edges, and only one path connects two distinct nodes.
Gauss code Gauss notation (also known as a Gauss code or Gauss word) is a notation for mathematical knots. It is created by enumerating and classifying the crossings of an embedding of the knot in a plane. It is named for the mathematician Carl Friedrich Gau ...
s, created by
Carl Friedrich Gauss Johann Carl Friedrich Gauss (; german: Gauß ; la, Carolus Fridericus Gauss; 30 April 177723 February 1855) was a German mathematician and physicist who made significant contributions to many fields in mathematics and science. Sometimes refer ...
in 1838, are developed from graphs. Specifically, a
closed curve In mathematics, a curve (also called a curved line in older texts) is an object similar to a line, but that does not have to be straight. Intuitively, a curve may be thought of as the trace left by a moving point. This is the definition that ...
on a plane is needed. If the curve only crosses over itself a finite number of times, then one labels the intersections with a letter from the alphabet used. Traveling along the curve, the word is determined by recording each letter as an intersection is passed. Gauss noticed that the distance between when the same symbol shows up in a word is an even integer.


Group theory

Walther Franz Anton von Dyck Walther Franz Anton von Dyck (6 December 1856 – 5 November 1934), born Dyck () and later ennobled, was a German mathematician. He is credited with being the first to define a mathematical group, in the modern sense in . He laid the foundation ...
began the work of combinatorics on words in group theory by his published work in 1882 and 1883. He began by using words as group elements.
Lagrange Joseph-Louis Lagrange (born Giuseppe Luigi Lagrangiapermutation groups. One aspect of combinatorics on words studied in group theory is reduced words. A group is constructed with words on some alphabet including generators and
inverse element In mathematics, the concept of an inverse element generalises the concepts of opposite () and reciprocal () of numbers. Given an operation denoted here , and an identity element denoted , if , one says that is a left inverse of , and that is ...
s, excluding factors that appear of the form aā or āa, for some a in the alphabet.
Reduced words In group theory, a word is any written product of group elements and their inverses. For example, if ''x'', ''y'' and ''z'' are elements of a group ''G'', then ''xy'', ''z''−1''xzz'' and ''y''−1''zxx''−1''yz''−1 are words in the set . Two ...
are formed when the factors aā, āa are used to cancel out elements until a unique word is reached.
Nielsen transformation In mathematics, especially in the area of abstract algebra known as combinatorial group theory, Nielsen transformations, named after Jakob Nielsen, are certain automorphisms of a free group which are a non-commutative analogue of row reduction a ...
s were also developed. For a set of elements of a
free group In mathematics, the free group ''F'S'' over a given set ''S'' consists of all words that can be built from members of ''S'', considering two words to be different unless their equality follows from the group axioms (e.g. ''st'' = ''suu''−1' ...
, a Nielsen transformation is achieved by three transformations; replacing an element with its inverse, replacing an element with the product of itself and another element, and eliminating any element equal to 1. By applying these transformations Nielsen reduced sets are formed. A reduced set means no element can be multiplied by other elements to cancel out completely. There are also connections with Nielsen transformations with Sturmian words.


Considered problems

One problem considered in the study of combinatorics on words in group theory is the following: for two elements x,y of a
semigroup In mathematics, a semigroup is an algebraic structure consisting of a set together with an associative internal binary operation on it. The binary operation of a semigroup is most often denoted multiplicatively: ''x''·''y'', or simply ''xy'', ...
, does x=y
modulo In computing, the modulo operation returns the remainder or signed remainder of a division, after one number is divided by another (called the '' modulus'' of the operation). Given two positive numbers and , modulo (often abbreviated as ) is ...
the defining relations of x and y. Post and Markov studied this problem and determined it undecidable. Undecidable means the theory cannot be proved. The Burnside question was proved using the existence of an infinite
cube-free word In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form , where is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern . Finit ...
. This question asks if a group is finite if the group has a definite number of generators and meets the criteria xn=1, for x in the group. Many word problems are undecidable based on the Post correspondence problem. Any two
homomorphisms In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "same ...
g,h with a common domain and a common codomain form an instance of the Post correspondence problem, which asks whether there exists a word w in the domain such that g(w)=h(w). Post proved that this problem is undecidable; consequently, any word problem that can be reduced to this basic problem is likewise undecidable.


Other applications

Combinatorics on words have applications on
equations In mathematics, an equation is a formula that expresses the equality of two expressions, by connecting them with the equals sign . The word ''equation'' and its cognates in other languages may have subtly different meanings; for example, in ...
. Makanin proved that it is possible to find a solution for a finite system of equations, when the equations are constructed from words.


See also

* Fibonacci word * Kolakoski sequence * Levi's lemma *
Partial word In computer science and the study of combinatorics on words, a partial word is a string that may contain a number of "do not know" or "do not care" symbols i.e. placeholders in the string where the symbol value is not known or not specified. Mor ...
* Shift space * Word metric *
Word problem (computability) In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm wheth ...
* Word problem (mathematics) *
Word problem for groups In mathematics, especially in the area of abstract algebra known as combinatorial group theory, the word problem for a finitely generated group ''G'' is the algorithmic problem of deciding whether two words in the generators represent the same el ...
* Young–Fibonacci lattice


References


Further reading


The origins of combinatorics on words
Jean Berstel,
Dominique Perrin Dominique Pierre Perrin (b. 1946) is a French mathematician and theoretical computer scientist known for his contributions to coding theory and to combinatorics on words. He is a professor of the University of Marne-la-Vallée and currently serve ...
, European Journal of Combinatorics 28 (2007) 996–1022
Combinatorics on words – a tutorial
Jean Berstel and Juhani Karhumäki. Bull. Eur. Assoc. Theor. Comput. Sci. EATCS, 79:178–228, 2003.
Combinatorics on Words: A New Challenging Topic
Juhani Karhumäki. * * * *"Infinite words: automata, semigroups, logic and games", Dominique Perrin, Jean Éric Pin, Academic Press, 2004, . * *"
Algorithmic Combinatorics on Partial Words ''Algorithmic Combinatorics on Partial Words'' is a book in the area of combinatorics on words, and more specifically on partial words. It was written by Francine Blanchet-Sadri, and published in 2008 by Chapman & Hall/CRC in their Discrete Mathema ...
", Francine Blanchet-Sadri, CRC Press, 2008, . * *"Combinatorics of Compositions and Words",
Silvia Heubach Silvia Heubach is a German-American mathematician specializing in enumerative combinatorics, combinatorial game theory, and bioinformatics. She is a professor of mathematics at California State University, Los Angeles. Education and career Heubac ...
,
Toufik Mansour Toufik Mansour is an Israeli mathematician working in algebraic combinatorics. He is a member of the Druze community and is the first Israeli Druze to become a professional mathematician. Mansour obtained his Ph.D. in mathematics from the Uni ...
, CRC Press, 2009, . *"Word equations and related topics: 1st international workshop, IWWERT '90, Tübingen, Germany, October 1–3, 1990 : proceedings", Editor: Klaus Ulrich Schulz, Springer, 1992, *" Jewels of stringology: text algorithms", Maxime Crochemore, Wojciech Rytter, World Scientific, 2003, * * *"Patterns in Permutations and Words", Sergey Kitaev, Springer, 2011, *"Distribution Modulo One and Diophantine Approximation", Yann Bugeaud, Cambridge University Press, 2012,


External links

{{Commonscat
Jean Berstel's page