HOME

TheInfoList



OR:

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
s, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see
tree (graph theory) In graph theory, a tree is an undirected graph in which any two vertices are connected by path, or equivalently a connected acyclic undirected graph. A forest is an undirected graph in which any two vertices are connected by path, or equi ...
and
tree (data structure) In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be co ...
).


History

TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG), the "string grammar" of Zellig Harris. AGs handle exocentric properties of language in a natural and effective way, but do not have a good characterization of endocentric constructions; the converse is true of rewrite grammars, or
phrase-structure grammar The term phrase structure grammar was originally introduced by Noam Chomsky as the term for grammar studied previously by Emil Post and Axel Thue ( Post canonical systems). Some authors, however, reserve the term for more restricted grammars in th ...
(PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from the Chomsky-Schützenberger hierarchy but intersects it in interesting and linguistically relevant ways. The center strings and adjunct strings can also be generated by a
dependency grammar Dependency grammar (DG) is a class of modern Grammar, grammatical theories that are all based on the dependency relation (as opposed to the ''constituency relation'' of Phrase structure grammar, phrase structure) and that can be traced back prima ...
, avoiding the limitations of rewrite systems entirely.


Description

The rules in a TAG are trees with a special leaf node known as the ''foot node'', which is anchored to a word. There are two types of basic trees in TAG: ''initial'' trees (often represented as '\alpha') and ''auxiliary'' trees ('\beta'). Initial trees represent basic valency relations, while auxiliary trees allow for recursion. Auxiliary trees have the root (top) node and foot node labeled with the same symbol. A derivation starts with an initial tree, combining via either ''substitution'' or ''adjunction''. Substitution replaces a frontier node with another tree whose top node has the same label. The root/foot label of the auxiliary tree must match the label of the node at which it adjoins. Adjunction can thus have the effect of inserting an auxiliary tree into the center of another tree. Other variants of TAG allow multi-component trees, trees with multiple foot nodes, and other extensions.


Complexity and application

Tree-adjoining grammars are more powerful (in terms of weak generative capacity) than
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
s, but less powerful than linear context-free rewriting systems,Kallmeyer, Laura (2010). Parsing Beyond Context-Free Grammars. Springer. Here: p.215-216 indexedsince for each tree-adjoining grammar, a linear indexed grammar can be found producing the same language, see below, and for the latter, a weakly equivalent (proper) indexed grammar can be found, in turn, see Indexed grammar#Computational Power or context-sensitive grammars. A TAG can describe the language of squares (in which some arbitrary string is repeated), and the language \. This type of processing can be represented by an embedded pushdown automaton. Languages with cubes (i.e. triplicated strings) or with more than four distinct character strings of equal length cannot be generated by tree-adjoining grammars. For these reasons, tree-adjoining grammars are often described as mildly context-sensitive. These grammar classes are conjectured to be powerful enough to model
natural language A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
s while remaining efficiently parsable in the general case.


Equivalences

Vijay-Shanker and Weir (1994)Vijay-Shanker, K. and Weir, David J. 1994. ''The Equivalence of Four Extensions of Context-Free Grammars''. Mathematical Systems Theory 27(6): 511–546. demonstrate that linear indexed grammars, combinatory categorial grammar, tree-adjoining grammars, and head grammars are weakly equivalent formalisms, in that they all define the same string languages.


Lexicalized

Lexicalized tree-adjoining grammars (LTAG) are a variant of TAG in which each elementary tree (initial or auxiliary) is associated with a lexical item. A lexicalized grammar for English has been developed by the XTAG Research Group of the Institute for Research in Cognitive Science at the University of Pennsylvania.


See also

* String grammar


Notes


References


External links


The XTAG project
which uses a TAG for natural language processing.
SemConst Documentation
A quick survey on Syntax and Semantic Interface problematic within the TAG framework.
The TuLiPa project
The Tübingen Linguistic Parsing Architecture (TuLiPA) is a multi-formalism syntactic (and semantic) parsing environment, designed mainly for multi-component tree adjoining grammars with tree tuples
The Metagrammar Toolkit
which provides several tools to edit and compile MetaGrammars into TAGs. It also include a wide coverage French Metagrammars.
LLP2
A lexicalized tree adjoining grammar parser which provides an easy to use graphical environment (page in French) {{DEFAULTSORT:Tree-Adjoining Grammar Generative linguistics Grammar frameworks