In the theory of
formal languages
In logic, mathematics, computer science, and linguistics, a formal language is a set of string (computer science), strings whose symbols are taken from a set called "#Definition, alphabet".
The alphabet of a formal language consists of symbol ...
of
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
,
mathematics
Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
, and
linguistics
Linguistics is the scientific study of language. The areas of linguistic analysis are syntax (rules governing the structure of sentences), semantics (meaning), Morphology (linguistics), morphology (structure of words), phonetics (speech sounds ...
, a Dyck word is a balanced
string of brackets. The set of Dyck words forms a Dyck language. The simplest, Dyck-1, uses just two matching brackets, e.g. ( and ).
Dyck words and language are named after the mathematician
Walther von Dyck. They have applications in the
parsing
Parsing, syntax analysis, or syntactic analysis is a process of analyzing a String (computer science), string of Symbol (formal), symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal gramm ...
of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions.
Formal definition
Let
be the alphabet consisting of the symbols
and Let
denote its
Kleene closure.
The Dyck language is defined as:
:
Context-free grammar
It may be helpful to define the Dyck language via a
context-free grammar in some situations.
The Dyck language is generated by the context-free grammar with a single non-terminal , and the production:
:
That is, ''S'' is either the
empty string
In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero.
Formal theory
Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
() or is "
, an element of the Dyck language, the matching ", and an element of the Dyck language.
An alternative context-free grammar for the Dyck language is given by the production:
:
That is, ''S'' is
zero or more occurrences of the combination of "
, an element of the Dyck language, and a matching ", where multiple elements of the Dyck language on the right side of the production are free to differ from each other.
Alternative definition
In yet other contexts it may instead be helpful to define the Dyck language by splitting
into equivalence classes, as follows.
For any element
of length
, we define
partial function
In mathematics, a partial function from a set to a set is a function from a subset of (possibly the whole itself) to . The subset , that is, the '' domain'' of viewed as a function, is called the domain of definition or natural domain ...
s
and
by
:
is
with "
" inserted into the
th position
:
is
with "
" deleted from the
th position
with the understanding that
is undefined for
and
is undefined if
. We define an
equivalence relation on
as follows: for elements
we have
if and only if there exists a sequence of zero or more applications of the
and
functions starting with
and ending with
. That the sequence of zero operations is allowed accounts for the
reflexivity of
.
Symmetry
Symmetry () in everyday life refers to a sense of harmonious and beautiful proportion and balance. In mathematics, the term has a more precise definition and is usually used to refer to an object that is Invariant (mathematics), invariant und ...
follows from the observation that any finite sequence of applications of
to a string can be undone with a finite sequence of applications of
.
Transitivity is clear from the definition.
The equivalence relation partitions the language
into equivalence classes. If we take
to denote the empty string, then the language corresponding to the equivalence class
is called the Dyck language.
Generalizations
Typed Dyck language
There exist variants of the Dyck language with multiple delimiters, e.g., Dyck-2 on the alphabet "(", ")", "
, and ". The words of such a language are the ones which are well-parenthesized for all delimiters, i.e., one can read the word from left to right, push every opening delimiter on the stack, and whenever we reach a closing delimiter then we must be able to pop the matching opening delimiter from the top of the stack. (The counting algorithm above does not generalise).
For example, the following is a valid sentence in Dyck-3 (with
matching delimiters colored the same):
:
Finite depth
A Dyck language sentence can be pictured as a descent and ascent through the levels of nested brackets. As one reads along a Dyck sentence, each opening bracket increases the nesting depth by 1, and each closing bracket decreases by 1. The depth of a sentence is the maximal depth reached within the sentence.
For example, we can annotate the following sentence with the depth at each step:
0 ( 1 2 [ 3 2 2 ">3_.html" ;"title="2 [ 3 ">2 [ 3 2 2 1 ( 2 ) 1 1 ) 0 [ 1 ">3_">2_[_3_<_a>2__2_.html" ;"title="3_.html" ;"title="2 [ 3 ">2 [ 3 2 2 ">3_.html" ;"title="2 [ 3 ">2 [ 3 2 2 1 ( 2 ) 1 1 ) 0 [ 1 0
and the entire sentence has depth 3.
We define Dyck-(k, m) as the language with k types of brackets and maximal depth m. This has applications in the formal theory of
recurrent neural networks.
Properties
* The Dyck language is closed under the operation of concatenation">Recurrent neural network">recurrent neural networks.
Properties
* The Dyck language is closed under the operation of concatenation.
* By treating
as an algebraic monoid under concatenation we see that the monoid structure transfers onto the quotient monoid, quotient
, resulting in the syntactic monoid of the Dyck language. The class
will be denoted
.
* The syntactic monoid of the Dyck language is not
commutative
In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Perhaps most familiar as a pr ...
: if
then
uv = \operatorname([]) = 1 \ne \operatorname(][) = vu.
* With the notation above,
uv = 1 but neither
u nor
v are invertible in
\Sigma^ / R.
* The syntactic monoid of the Dyck language is isomorphic to the
bicyclic semigroup by virtue of the properties of
\operatorname( and \operatorname("> and \operatorname( described above.
* By the
Chomsky–Schützenberger representation theorem, any
context-free language is a homomorphic image of the intersection of some
regular language
In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
with a Dyck language on one or more kinds of bracket pairs.
* The Dyck language with two distinct types of brackets can be recognized in the
complexity class
In computational complexity theory, a complexity class is a set (mathematics), set of computational problems "of related resource-based computational complexity, complexity". The two most commonly analyzed resources are time complexity, time and s ...
TC^.
[Barrington and Corbett, Information Processing Letters 32 (1989) 251-256]
* The number of distinct Dyck words with exactly pairs of parentheses and innermost pairs (viz. the substring
/math>) is the Narayana number
In combinatorics, the Narayana numbers \operatorname(n, k), n \in \mathbb^+, 1 \le k \le n form a triangular array of natural numbers, called the Narayana triangle, that occur in various Combinatorial enumeration, counting problems. They are named ...
\operatorname(n, k).
* The number of distinct Dyck words with exactly pairs of parentheses is the -th Catalan number C_n. Notice that the Dyck language of words with parentheses pairs is equal to the union, over all possible , of the Dyck languages of words of parentheses pairs ''with innermost pairs'', as defined in the previous point. Since can range from 0 to , we obtain the following equality, which indeed holds:
::C_n = \sum_^n \operatorname(n, k)
Examples
We can define an equivalence relation L on the Dyck language \mathcal. For u,v\in\mathcal we have (u,v)\in L if and only if , u, = , v, , i.e. u and v have the same length. This relation partitions the Dyck language: \mathcal / L = \. We have \mathcal = \mathcal_ \cup \mathcal_ \cup \mathcal_ \cup \ldots = \bigcup_^ \mathcal_ where \mathcal_ = \. Note that \mathcal_ is empty for odd n.
Having introduced the Dyck words of length n, we can introduce a relationship on them. For every n \in \mathbb we define a relation S_ on \mathcal_; for u,v\in\mathcal_ we have (u,v)\in S_ if and only if v can be reached from u by a series of proper swaps. A proper swap in a word u\in\mathcal_ swaps an occurrence of '] with '[.
For each n\in\mathbb the relation S_ makes \mathcal_ into a partially ordered set">html" ;"title=" with '["> with '[.
For each n\in\mathbb the relation S_ makes \mathcal_ into a . The relation S_ is reflexive relation">reflexive because an empty sequence of proper swaps takes u to u. Transitivity follows because we can extend a sequence of proper swaps that takes u to v by concatenating it with a sequence of proper swaps that takes v to w forming a sequence that takes u into w. To see that S_ is also antisymmetric we introduce an auxiliary function \sigma_:\mathcal_\rightarrow\mathbb defined as a sum over all prefixes v of u:
: \sigma_n(u) = \sum_ \Big( (\text v) - (\text v) \Big)
The following table illustrates that \sigma_ is Monotonic function">strictly monotonic with respect to proper swaps.
Hence \sigma_(u') - \sigma_(u) = 2 > 0 so \sigma_(u) < \sigma_(u') when there is a proper swap that takes u into u'. Now if we assume that both (u,v), (v,u)\in S_ and u\ne v, then there are non-empty sequences of proper swaps such u is taken into v and vice versa. But then \sigma_(u) < \sigma_(v) < \sigma_(u) which is nonsensical. Therefore, whenever both (u,v) and (v,u) are in S_, we have u = v, hence S_ is antisymmetric.
The partial ordered set D_ is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.
See also
* Dyck congruence
* Lattice word
Notes
References
* {{planetmath, dycklanguage
A proof of the Chomsky Schützenberger theorem
An AMS blog entry on Dyck words
Formal languages