History
The history of DCGs is closely tied to the history of Prolog, and the history of Prolog revolves around several researchers in both Marseille, France, and Edinburgh, Scotland. According toExample
A basic example of DCGs helps to illustrate what they are and what they look like.sentence(X,[])
. Similarly, one can test whether a sentence is valid in the language by typing something like sentence([the,bat,eats,the,bat],[])
.
Translation into definite clauses
DCG notation is just syntactic sugar for normal definite clauses in Prolog. For example, the previous example could be translated into the following:Difference lists
The arguments to each functor, such as(A,B)
and (B,Z)
are difference lists; difference lists are a way of representing a prefix of a list as the difference between its two suffixes (the bigger including the smaller one). Using Prolog's notation for lists, a singleton list prefix P = /code> can be seen as the difference between X/code> and X
, and thus represented with the pair ( XX)
, for instance.
Saying that P
is the difference between A
and B
is the same as saying that append(P,B,A)
holds. Or in the case of the previous example, append( X, X
.
Difference lists are used to represent lists with DCGs for reasons of efficiency. It is much more efficient to concatenate list differences (prefixes), in the circumstances that they can be used, because the concatenation of (A,B)
and (B,Z)
is just (A,Z)
.
Indeed, append(P,B,A), append(Q,Z,B)
entails append(P,Q,S), append(S,Z,A)
. This is the same as saying that list concatenation is ''associative'':
A = P + B = P + (Q + Z) = (P + Q) + Z = S + Z = A
Non-context-free grammars
In pure Prolog, normal DCG rules with no extra arguments on the functors, such as the previous example, can only express context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be ...
s; there is only one argument on the left side of the production. However, context-sensitive grammar
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than ...
s can also be expressed with DCGs, by providing extra arguments, such as in the following example:
s --> a(N), b(N), c(N).
a(0) --> [].
a(M) --> [a], a(N), .
b(0) --> [].
b(M) --> [b], b(N), .
c(0) --> [].
c(M) --> [c], c(N), .
This set of DCG rules describes the grammar which generates the language that consists of strings of the form .
s --> symbols(Sem,a), symbols(Sem,b), symbols(Sem,c).
symbols(end,_) --> [].
symbols(s(Sem),S) --> [S], symbols(Sem,S).
This set of DCG rules describes the grammar which generates the language that consists of strings of the form , by structurally representing
Representing features
Various linguistic features can also be represented fairly concisely with DCGs by providing extra arguments to the functors. For example, consider the following set of DCG rules:
sentence --> pronoun(subject), verb_phrase.
verb_phrase --> verb, pronoun(object).
pronoun(subject) --> e
pronoun(subject) --> he
pronoun(object) --> im
pronoun(object) --> er
verb --> ikes
This grammar allows sentences like "he likes her" and "he likes him", but ''not'' "her likes he" and "him likes him".
Parsing with DCGs
The main practical use of a DCG is to parse sentences of the given grammar, i.e. to construct a parse tree. This can be done by providing "extra arguments" to the functors in the DCG, like in the following rules:
sentence(s(NP,VP)) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(np(D,N)) --> det(D), noun(N).
verb_phrase(vp(V,NP)) --> verb(V), noun_phrase(NP).
det(d(the)) --> he
det(d(a)) -->
noun(n(bat)) --> at
noun(n(cat)) --> at
verb(v(eats)) --> ats
One can now query the interpreter to yield a parse tree of any given sentence:
, ?- sentence(Parse_tree, he,bat,eats,a,cat []).
Parse_tree = s(np(d(the),n(bat)),vp(v(eats),np(d(a),n(cat)))) ? ;
Other uses
DCGs can serve as a convenient syntactic sugar to hide certain parameters in code in other places besides parsing applications.
In the declaratively pure programming language Mercury I/O must be represented by a pair of io.state
arguments.
DCG notation can be used to make using I/O more convenient, although state variable notation is usually preferred.
DCG notation is also used for parsing and similar things in Mercury as it is in Prolog.
Extensions
Since DCGs were introduced by Pereira and Warren, several extensions have been proposed. Pereira himself proposed an extension called extraposition grammars (XGs). This formalism was intended in part to make it easier to express certain grammatical phenomena, such as left-extraposition. Pereira states, "The difference between XG rules and DCG rules is then that the left-hand side of an XG rule may contain several symbols." This makes it easier to express rules for context-sensitive grammars.
Peter Van Roy extended DCGs to allow multiple accumulators.
Another, more recent, extension was made by researchers at NEC Corporation called Multi-Modal Definite Clause Grammars (MM-DCGs) in 1995. Their extensions were intended to allow the recognizing and parsing expressions that include non-textual parts such as pictures.
Another extension, called definite clause translation grammars (DCTGs) was described by Harvey Abramson in 1984. DCTG notation looks very similar to DCG notation; the major difference is that one uses ::=
instead of -->
in the rules. It was devised to handle grammatical attributes conveniently. The translation of DCTGs into normal Prolog clauses is like that of DCGs, but 3 arguments are added instead of 2.
See also
* Chomsky hierarchy
In formal language theory, computer science and linguistics, the Chomsky hierarchy (also referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars.
This hierarchy of grammars was described ...
* Context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be ...
* Natural language processing
Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to proc ...
* Phrase structure grammar
The term phrase structure grammar was originally introduced by Noam Chomsky as the term for grammar studied previously by Emil Post and Axel Thue ( Post canonical systems). Some authors, however, reserve the term for more restricted grammars in ...
Notes
External links
NLP with Prolog
Context-free grammars and DCGs
Definite Clause Grammars for Language Analysis
{{Parsing algorithms
Formal languages
Logic programming
Parsing