HOME

TheInfoList



OR:

In
theoretical computer science computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumscribe the ...
and mathematical logic a string rewriting system (SRS), historically called a semi- Thue system, is a
rewriting In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewrite engines, or redu ...
system over strings from a (usually
finite Finite is the opposite of infinite. It may refer to: * Finite number (disambiguation) * Finite set, a set whose cardinality (number of elements) is some natural number * Finite verb Traditionally, a finite verb (from la, fīnītus, past particip ...
)
alphabet An alphabet is a standardized set of basic written graphemes (called letter (alphabet), letters) that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character ...
. Given a
binary relation In mathematics, a binary relation associates elements of one set, called the ''domain'', with elements of another set, called the ''codomain''. A binary relation over sets and is a new set of ordered pairs consisting of elements in and i ...
R between fixed strings over the alphabet, called rewrite rules, denoted by s\rightarrow t, an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as
substring In formal language theory and computer science, a substring is a contiguous sequence of characters within a string. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence ...
s, that is usv\rightarrow utv, where s, t, u, and v are strings. The notion of a semi-Thue system essentially coincides with the presentation of a monoid. Thus they constitute a natural framework for solving the word problem for monoids and groups. An SRS can be defined directly as an
abstract rewriting system In mathematical logic and theoretical computer science, an abstract rewriting system (also (abstract) reduction system or abstract rewrite system; abbreviated ARS) is a formalism that captures the quintessential notion and properties of rewriting s ...
. It can also be seen as a restricted kind of a
term rewriting In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewrite engines, or reduc ...
system. As a formalism, string rewriting systems are
Turing complete Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical com ...
. The semi-Thue name comes from the Norwegian mathematician
Axel Thue Axel Thue (; 19 February 1863 – 7 March 1922) was a Norwegian mathematician, known for his original work in diophantine approximation and combinatorics. Work Thue published his first important paper in 1909. He stated in 1914 the so-called w ...
, who introduced systematic treatment of string rewriting systems in a 1914 paper. Thue introduced this notion hoping to solve the word problem for finitely presented semigroups. Only in 1947 was the problem shown to be undecidable— this result was obtained independently by Emil Post and A. A. Markov Jr.


Definition

A string rewriting system or semi-Thue system is a tuple (\Sigma, R) where * is an alphabet, usually assumed finite. The elements of the set \Sigma^* (* is the
Kleene star In mathematical logic and computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid ...
here) are finite (possibly empty) strings on , sometimes called ''words'' in
formal languages In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of sym ...
; we will simply call them strings here. * is a
binary relation In mathematics, a binary relation associates elements of one set, called the ''domain'', with elements of another set, called the ''codomain''. A binary relation over sets and is a new set of ordered pairs consisting of elements in and i ...
on strings from , i.e., R \subseteq \Sigma^* \times \Sigma^*. Each element (u,v) \in R is called a ''(rewriting) rule'' and is usually written u \rightarrow v. If the relation is
symmetric Symmetry (from grc, συμμετρία "agreement in dimensions, due proportion, arrangement") in everyday language refers to a sense of harmonious and beautiful proportion and balance. In mathematics, "symmetry" has a more precise definit ...
, then the system is called a Thue system. The rewriting rules in can be naturally extended to other strings in \Sigma^* by allowing substrings to be rewritten according to . More formally, the one-step rewriting relation relation \xrightarrow /math> induced by on \Sigma^* for any strings s, t \in \Sigma^*: : s \xrightarrow t if and only if there exist x, y, u, v \in \Sigma^* such that s = xuy, t = xvy, and u \rightarrow v. Since \xrightarrow /math> is a relation on \Sigma^*, the pair (\Sigma^*, \xrightarrow fits the definition of an
abstract rewriting system In mathematical logic and theoretical computer science, an abstract rewriting system (also (abstract) reduction system or abstract rewrite system; abbreviated ARS) is a formalism that captures the quintessential notion and properties of rewriting s ...
. Obviously is a subset of \xrightarrow /math>. Some authors use a different notation for the arrow in \xrightarrow /math> (e.g. \xrightarrow /math>) in order to distinguish it from itself (\rightarrow) because they later want to be able to drop the subscript and still avoid confusion between and the one-step rewrite induced by . Clearly in a semi-Thue system we can form a (finite or infinite) sequence of strings produced by starting with an initial string s_0 \in \Sigma^* and repeatedly rewriting it by making one substring-replacement at a time: :s_0 \ \xrightarrow \ s_1 \ \xrightarrow \ s_2 \ \xrightarrow \ \ldots A zero-or-more-steps rewriting like this is captured by the
reflexive transitive closure In mathematics, a subset of a given set is closed under an operation of the larger set if performing that operation on members of the subset always produces a member of that subset. For example, the natural numbers are closed under addition, but ...
of \xrightarrow /math>, denoted by \xrightarrow /math> (see abstract rewriting system#Basic notions). This is called the rewriting relation or reduction relation on \Sigma^* induced by .


Thue congruence

In general, the set \Sigma^* of strings on an alphabet forms a
free monoid In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences (or strings) of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elem ...
together with the
binary operation In mathematics, a binary operation or dyadic operation is a rule for combining two elements (called operands) to produce another element. More formally, a binary operation is an operation of arity two. More specifically, an internal binary op ...
of string concatenation (denoted as \cdot and written multiplicatively by dropping the symbol). In a SRS, the reduction relation \xrightarrow /math> is compatible with the monoid operation, meaning that x\xrightarrow y implies uxv\xrightarrow uyv for all strings x, y, u, v \in \Sigma^*. Since \xrightarrow /math> is by definition a
preorder In mathematics, especially in order theory, a preorder or quasiorder is a binary relation that is reflexive and transitive. Preorders are more general than equivalence relations and (non-strict) partial orders, both of which are special ca ...
, \left(\Sigma^*, \cdot, \xrightarrow right) forms a monoidal preorder. Similarly, the reflexive transitive symmetric closure of \xrightarrow /math>, denoted \overset (see abstract rewriting system#Basic notions), is a congruence, meaning it is an equivalence relation (by definition) and it is also compatible with string concatenation. The relation \overset is called the Thue congruence generated by . In a Thue system, i.e. if is symmetric, the rewrite relation \xrightarrow /math> coincides with the Thue congruence \overset.


Factor monoid and monoid presentations

Since \overset is a congruence, we can define the factor monoid \mathcal_R = \Sigma^*/\overset of the
free monoid In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences (or strings) of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elem ...
\Sigma^* by the Thue congruence in the usual manner. If a monoid \mathcal is isomorphic with \mathcal_R, then the semi-Thue system (\Sigma, R) is called a monoid presentation of \mathcal. We immediately get some very useful connections with other areas of algebra. For example, the alphabet with the rules , where ε is the empty string, is a presentation of the
free group In mathematics, the free group ''F'S'' over a given set ''S'' consists of all words that can be built from members of ''S'', considering two words to be different unless their equality follows from the group axioms (e.g. ''st'' = ''suu''−1 ...
on one generator. If instead the rules are just , then we obtain a presentation of the
bicyclic monoid In mathematics, the bicyclic semigroup is an algebraic object important for the structure theory of semigroups. Although it is in fact a monoid, it is usually referred to as simply a semigroup. It is perhaps most easily understood as the syntactic ...
. The importance of semi-Thue systems as presentation of monoids is made stronger by the following: Theorem: Every monoid has a presentation of the form (\Sigma, R), thus it may be always be presented by a semi-Thue system, possibly over an infinite alphabet. In this context, the set \Sigma is called the set of generators of \mathcal, and R is called the set of defining relations \mathcal. We can immediately classify monoids based on their presentation. \mathcal is called * finitely generated if \Sigma is finite. * finitely presented if both \Sigma and R are finite.


Undecidability of the word problem

Post proved the word problem (for semigroups) to be undecidable in general, essentially by reducing the
halting problem In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a g ...
for Turing machines to an instance of the word problem. Concretely, Post devised an encoding as a finite string of the state of a Turing machine plus tape, such that the actions of this machine can be carried out by a string rewrite system acting on this string encoding. The alphabet of the encoding has one set of letters S_0, S_1, \dotsc, S_m for symbols on the tape (where S_0 means blank), another set of letters q_1, \dotsc, q_r for states of the Turing machine, and finally three letters q_, q_, h that have special roles in the encoding. q_ and q_ are intuitively extra internal states of the Turing machine which it transitions to when halting, whereas h marks the end of the non-blank part of the tape; a machine reaching an h should behave the same as if there was a blank there, and the h was in the next cell. The strings that are valid encodings of Turing machine states start with an h, followed by zero or more symbol letters, followed by exactly one internal state letter q_i (which encodes the state of the machine), followed by one or more symbol letters, followed by an ending h. The symbol letters are straight off the contents of the tape, and the internal state letter marks the position of the head; the symbol after the internal state letter is that in the cell currently under the head of the Turing machine. A transition where the machine upon being in state q_i and seeing the symbol S_k writes back symbol S_l, moves right, and transitions to state q_j is implemented by the rewrite : q_i S_k \to S_l q_j whereas that transition instead moving to the left is implemented by the rewrite : S_p q_i S_k \to q_j S_p S_l with one instance for each symbol S_p in that cell to the left. In the case that we reach the end of the visited portion of the tape, we use instead : h q_i S_k \to h q_j S_0 S_l , lengthening the string by one letter. Because all rewrites involve one internal state letter q_i, the valid encodings only contain one such letter, and each rewrite produces exactly one such letter, the rewrite process exactly follows the run of the Turing machine encoded. This proves that string rewrite systems are Turing complete. The reason for having two halted symbols q_ and q_ is that we want all halting Turing machines to terminate at the same ''total'' state, not just a particular ''internal'' state. This requires clearing the tape after halting, so q_ eats the symbol on it left until reaching the h, where it transitions into q_ which instead eats the symbol on its right. (In this phase the string rewrite system no longer simulates a Turing machine, since that cannot remove cells from the tape.) After all symbols are gone, we have reached the terminal string h q_ h . A decision procedure for the word problem would then also yield a procedure for deciding whether the given Turing machine terminates when started in a particular total state t , by testing whether t and h q_ h belong to the same congruence class with respect to this string rewrite system. Technically, we have the following: Lemma. Let M be a deterministic Turing machine and R be the string rewrite system implementing M, as described above. Then M will halt when started from the total state encoded as t if and only if t \mathrel h q_ h (that is to say, if and only if t and h q_ h are Thue congruent for R). That t \mathrel h q_ h if M halts when started from t is immediate from the construction of R (simply running M until it halts constructs a proof of t \mathrel h q_ h ), but \overset also allows the Turing machine M to take steps backwards. Here it becomes relevant that M is deterministic, because then the forward steps are all unique; in a \overset walk from t to h q_ h the last backward step must be followed by its counterpart as a forward step, so these two cancel, and by induction all backward steps can be eliminated from such a walk. Hence if M does ''not'' halt when started from t, i.e., if we do ''not'' have t \mathrel h q_ h , then we also do ''not'' have t \mathrel h q_ h . Therefore deciding \overset tells us the answer to the halting problem for M. An apparent limitation of this argument is that in order to produce a semigroup \Sigma^*\big/\overset with undecidable word problem, one must first have a concrete example of a Turing machine M for which the halting problem is undecidable, but the various Turing machines figuring in the proof of the undecidability of the general halting problem all have as component a hypothetical Turing machine solving the halting problem, so none of those machines can actually exist; all that proves is that there is ''some'' Turing machine for which the decision problem is undecidable. However, that there is some Turing machine with undecidable halting problem means that the halting problem for a ''universal'' Turing machine is undecidable (since that can simulate any Turing machine), and concrete examples of universal Turing machines have been constructed.


Connections with other notions

A semi-Thue system is also a term-rewriting system—one that has monadic words (functions) ending in the same variable as the left- and right-hand side terms, e.g. a term rule f_2(f_1(x)) \rightarrow g(x) is equivalent to the string rule f_1f_2 \rightarrow g. A semi-Thue system is also a special type of
Post canonical system A Post canonical system, also known as a Post production system, as created by Emil Post, is a string-manipulation system that starts with finitely-many strings and repeatedly transforms them by applying a finite set j of specified rules of a cert ...
, but every Post canonical system can also be reduced to an SRS. Both formalisms are
Turing complete Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical com ...
, and thus equivalent to Noam Chomsky's
unrestricted grammar In automata theory, the class of unrestricted grammars (also called semi-Thue, type-0 or phrase structure grammars) is the most general class of grammars in the Chomsky hierarchy. No restrictions are made on the productions of an unrestricted gramma ...
s, which are sometimes called ''semi-Thue grammars''. A
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
only differs from a semi-Thue system by the separation of the alphabet into
terminal Terminal may refer to: Computing Hardware * Terminal (electronics), a device for joining electrical circuits together * Terminal (telecommunication), a device communicating over a line * Computer terminal, a set of primary input and output devi ...
s and non-terminals, and the fixation of a starting symbol amongst non-terminals. A minority of authors actually define a semi-Thue system as a triple (\Sigma, A, R), where A\subseteq\Sigma^* is called the ''set of axioms''. Under this "generative" definition of semi-Thue system, an unrestricted grammar is just a semi-Thue system with a single axiom in which one partitions the alphabet into terminals and non-terminals, and makes the axiom a nonterminal. The simple artifice of partitioning the alphabet into terminals and non-terminals is a powerful one; it allows the definition of the
Chomsky hierarchy In formal language theory, computer science and linguistics, the Chomsky hierarchy (also referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars. This hierarchy of grammars was described ...
based on what combination of terminals and non-terminals the rules contain. This was a crucial development in the theory of
formal languages In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of sym ...
. In quantum computing, the notion of a quantum Thue system can be developed. Since quantum computation is intrinsically reversible, the rewriting rules over the alphabet \Sigma are required to be bidirectional (i.e. the underlying system is a Thue system, not a semi-Thue system). On a subset of alphabet characters Q\subseteq \Sigma one can attach a Hilbert space \mathbb C^d, and a rewriting rule taking a substring to another one can carry out a unitary operation on the tensor product of the Hilbert space attached to the strings; this implies that they preserve the number of characters from the set Q. Similar to the classical case one can show that a quantum Thue system is a universal computational model for quantum computation, in the sense that the executed quantum operations correspond to uniform circuit classes (such as those in
BQP In computational complexity theory, bounded-error quantum polynomial time (BQP) is the class of decision problems solvable by a quantum computer in polynomial time, with an error probability of at most 1/3 for all instances.Michael Nielsen and Isaa ...
when e.g. guaranteeing termination of the string rewriting rules within polynomially many steps in the input size), or equivalently a
Quantum Turing machine A quantum Turing machine (QTM) or universal quantum computer is an abstract machine used to model the effects of a quantum computer. It provides a simple model that captures all of the power of quantum computation—that is, any quantum algori ...
.


History and importance

Semi-Thue systems were developed as part of a program to add additional constructs to logic, so as to create systems such as propositional logic, that would allow general mathematical theorems to be expressed in a formal language, and then proven and verified in an automatic, mechanical fashion. The hope was that the act of theorem proving could then be reduced to a set of defined manipulations on a set of strings. It was subsequently realized that semi-Thue systems are isomorphic to
unrestricted grammar In automata theory, the class of unrestricted grammars (also called semi-Thue, type-0 or phrase structure grammars) is the most general class of grammars in the Chomsky hierarchy. No restrictions are made on the productions of an unrestricted gramma ...
s, which in turn are known to be isomorphic to Turing machines. This method of research succeeded and now computers can be used to verify the proofs of mathematic and logical theorems. At the suggestion of
Alonzo Church Alonzo Church (June 14, 1903 – August 11, 1995) was an American mathematician, computer scientist, logician, philosopher, professor and editor who made major contributions to mathematical logic and the foundations of theoretical computer scien ...
, Emil Post in a paper published in 1947, first proved "a certain Problem of Thue" to be unsolvable, what Martin Davis states as "...the first unsolvability proof for a problem from classical mathematics -- in this case the word problem for semigroups." Davis also asserts that the proof was offered independently by A. A. Markov. A. A. Markov (1947) Doklady Akademii Nauk SSSR (N.S.) 55: 583–586


See also

*
L-system An L-system or Lindenmayer system is a parallel rewriting system and a type of formal grammar. An L-system consists of an alphabet of symbols that can be used to make strings, a collection of production rules that expand each symbol into som ...
*
MU puzzle The MU puzzle is a puzzle stated by Douglas Hofstadter and found in '' Gödel, Escher, Bach'' involving a simple formal system called "MIU". Hofstadter's motivation is to contrast reasoning within a formal system (ie., deriving theorems) against rea ...


Notes


References


Monographs

* Ronald V. Book and Friedrich Otto, ''String-rewriting Systems'', Springer, 1993, . * Matthias Jantzen, ''Confluent string rewriting'', Birkhäuser, 1988, .


Textbooks

* Martin Davis, Ron Sigal, Elaine J. Weyuker, ''Computability, complexity, and languages: fundamentals of theoretical computer science'', 2nd ed., Academic Press, 1994, , chapter 7 * Elaine Rich, ''Automata, computability and complexity: theory and applications'', Prentice Hall, 2007, , chapter 23.5.


Surveys

* Samson Abramsky, Dov M. Gabbay, Thomas S. E. Maibaum (ed.), ''Handbook of Logic in Computer Science: Semantic modelling'', Oxford University Press, 1995, . * Grzegorz Rozenberg, Arto Salomaa (ed.), ''Handbook of Formal Languages: Word, language, grammar'', Springer, 1997, .


Landmark papers

* {{Formal languages and grammars, state=collapsed Formal languages Theory of computation Rewriting systems ja:文字列書き換え系