An embedded pushdown automaton or EPDA is a
computational model for parsing languages generated by
tree-adjoining grammar
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free gr ...
s (TAGs). It is similar to the
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be ...
-parsing
pushdown automaton
In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is
a type of automaton that employs a stack.
Pushdown automata are used in theories about what can be computed by machines. They are more capab ...
, but instead of using a plain
stack
Stack may refer to:
Places
* Stack Island, an island game reserve in Bass Strait, south-eastern Australia, in Tasmania’s Hunter Island Group
* Blue Stack Mountains, in Co. Donegal, Ireland
People
* Stack (surname) (including a list of people ...
to store symbols, it has a stack of iterated stacks that store symbols, giving TAGs a generative capacity between context-free and
context-sensitive grammar
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than ...
s, or a subset of
mildly context-sensitive grammars.
Embedded pushdown automata should not be confused with
nested stack automata
In automata theory, a nested stack automaton is a finite automaton that can make use of a stack containing data which can be additional stacks.
Like a stack automaton, a nested stack automaton may step up or down in the stack, and read the cur ...
which have more computational power.
History and applications
EPDAs were first described by K. Vijay-Shanker in his 1988 doctoral thesis. They have since been applied to more complete descriptions of classes of mildly context-sensitive grammars and have had important roles in refining the
Chomsky hierarchy
In formal language theory, computer science and linguistics, the Chomsky hierarchy (also referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars.
This hierarchy of grammars was described ...
. Various subgrammars, such as the
linear indexed grammar, can thus be defined.
While natural languages have traditionally been analyzed using context-free grammars (see
transformational-generative grammar
In linguistics, transformational grammar (TG) or transformational-generative grammar (TGG) is part of the theory of generative grammar, especially of natural languages. It considers grammar to be a system of rules that generate exactly those combin ...
and
computational linguistics
Computational linguistics is an Interdisciplinarity, interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, comput ...
), this model does not work well for languages with crossed dependencies, such as Dutch, situations for which an EPDA is well suited. A detailed linguistic analysis is available in Joshi, Schabes (1997).
Theory
An EPDA is a finite state machine with a set of stacks that can be themselves accessed through the ''embedded stack''. Each stack contains elements of the ''stack alphabet''
, and so we define an element of a stack by
, where the star is the
Kleene closure of the alphabet.
Each stack can then be defined in terms of its elements, so we denote the
th stack in the automaton using a double-dagger symbol:
, where
would be the next accessible symbol in the stack. The ''embedded stack'' of
stacks can thus be denoted by
.
We define an EPDA by the septuple (7-tuple)
:
where
*
is a finite set of ''states'';
*
is the finite set of the ''input alphabet'';
*
is the finite ''stack alphabet'';
*
is the ''start state'';
*
is the set of ''final states'';
*
is the ''initial stack symbol''
*
is the ''transition function'', where
are finite subsets of
.
Thus the transition function takes a state, the next symbol of the input string, and the top symbol of the current stack and generates the next state, the stacks to be pushed and popped onto the ''embedded stack'', the pushing and popping of the current stack, and the stacks to be considered the current stacks in the next transition. More conceptually, the ''embedded stack'' is pushed and popped, the current stack is optionally pushed back onto the ''embedded stack'', and any other stacks one would like are pushed on top of that, with the last stack being the one read from in the next iteration. Therefore, stacks can be pushed both above and below the current stack.
A given configuration is defined by
:
where
is the current state, the
s are the stacks in the ''embedded stack'', with
the current stack, and for an input string
,
is the portion of the string already processed by the machine and
is the portion to be processed, with its head being the current symbol read. Note that the empty string
is implicitly defined as a terminating symbol, where if the machine is at a final state when the empty string is read, the entire input string is ''accepted'', and if not it is ''rejected''. Such ''accepted'' strings are elements of the language
:
where
and
defines the transition function applied over as many times as necessary to parse the string.
An informal description of EPDA can also be found in Joshi, Schabes (1997),
Sect.7, p. 23-25.
''k''-order EPDA and the Weir hierarchy
A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir.
Based on the work of Nabil A. Khabbaz,
Weir's Control Language Hierarchy is a containment where the ''Level-1'' is defined as context-free, and ''Level-2'' is the class of tree-adjoining and the other three grammars.
Following are some of the properties of Level-''k'' languages in the hierarchy:
*Level-''k'' languages are properly contained in the Level-(''k'' + 1) language class
*Level-''k'' languages can be parsed in
time
*Level-''k'' contains the language
, but not
*Level-''k'' contains the language
, but not
Those properties correspond well (at least for small ''k'' > 1) to the conditions of mildly context-sensitive languages imposed by Joshi, and as ''k'' gets bigger, the language class becomes, in a sense, less mildly context-sensitive.
See also
*
combinatory categorial grammar
Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structure ...
References
Further reading
*
{{Formal languages and grammars
Models of computation
Automata (computation)