Operator-precedence grammar
   HOME

TheInfoList



OR:

An operator precedence grammar is a kind of
grammar In linguistics, the grammar of a natural language is its set of structural constraints on speakers' or writers' composition of clauses, phrases, and words. The term can also refer to the study of such constraints, a field that includes domain ...
for
formal language In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of sy ...
s. Technically, an operator precedence grammar is a context-free grammar that has the property (among others) that no production has either an empty right-hand side or two adjacent nonterminals in its right-hand side. These properties allow precedence relations to be defined between the terminals of the grammar. A parser that exploits these relations is considerably simpler than more general-purpose parsers such as
LALR parser In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-rig ...
s. Operator-precedence parsers can be constructed for a large class of context-free grammars.


Precedence relations

Operator precedence grammars rely on the following three precedence relations between the terminals: These operator precedence relations allow to delimit the
handles A handle is a part of, or attachment to, an object that allows it to be grasped and manipulated by hand. The design of each type of handle involves substantial ergonomic issues, even where these are dealt with intuitively or by following t ...
in the right sentential forms: \lessdot marks the left end, \doteq appears in the interior of the handle, and \gtrdot marks the right end. Contrary to other shift-reduce parsers, all nonterminals are considered equal for the purpose of identifying handles. The relations do not have the same properties as their un-dotted counterparts; e. g. a \doteq b does not generally imply b \doteq a, and b \gtrdot a does not follow from a \lessdot b. Furthermore, a \doteq a does not generally hold, and a \gtrdot a is possible. Let us assume that between the terminals and there is always exactly one precedence relation. Suppose that $ is the end of the string. Then for all terminals we define: \$ \lessdot b and b \gtrdot \$. If we remove all nonterminals and place the correct precedence relation: \lessdot, \doteq, \gtrdot between the remaining terminals, there remain strings that can be analyzed by an easily developed
bottom-up parser Bottom-up may refer to: * Bottom-up analysis, a fundamental analysis technique in accounting and finance * Bottom-up parsing, a computer science strategy * Bottom-up processing, in Pattern recognition (psychology) * Bottom-up theories of galaxy fo ...
.


Example

For example, the following operator precedence relations can be introduced for simple expressions: :\begin & \mathrm & + & * & \$ \\ \hline \mathrm & & \gtrdot & \gtrdot & \gtrdot \\ + & \lessdot & \gtrdot & \lessdot & \gtrdot \\ * & \lessdot & \gtrdot & \gtrdot & \gtrdot \\ \$ & \lessdot & \lessdot & \lessdot & \end They follow from the following facts: * + has lower precedence than * (hence + \lessdot * and * \gtrdot +). * Both + and * are
left-associative In programming language theory, the associativity of an operator is a property that determines how operators of the same precedence are grouped in the absence of parentheses. If an operand is both preceded and followed by operators (for exampl ...
(hence + \gtrdot + and * \gtrdot *). The input string : \mathrm_1 + \mathrm_2 * \mathrm_3 after adding end markers and inserting precedence relations becomes : \$ \lessdot \mathrm_1 \gtrdot + \lessdot \mathrm_2 \gtrdot * \lessdot \mathrm_3 \gtrdot \$


Operator precedence parsing

Having precedence relations allows to identify handles as follows: * scan the string from left until seeing \gtrdot * scan backwards (from right to left) over any \doteq until seeing \lessdot * everything between the two relations \lessdot and \gtrdot, including any intervening or surrounding nonterminals, forms the handle It is generally not necessary to scan the entire
sentential form In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
to find the handle.


Operator precedence parsing algorithm

The algorithm below is from Aho et al.: If $ is on the top of the stack and ip points to $ then return else Let a be the top terminal on the stack, and b the symbol pointed to by ip if ''a'' \lessdot ''b'' or ''a'' \doteq ''b'' then push ''b'' onto the stack advance ip to the next input symbol else if ''a'' \gtrdot ''b'' then repeat pop the stack until the top stack terminal is related by \lessdot to the terminal most recently popped else error() end


Precedence functions

An operator precedence parser usually does not store the precedence table with the relations, which can get rather large. Instead, precedence functions ''f'' and ''g'' are defined. They map terminal symbols to integers, and so the precedence relations between the symbols are implemented by numerical comparison: must hold if a \lessdot b holds, etc. Not every table of precedence relations has precedence functions, but in practice for most grammars such functions can be designed.


Algorithm for constructing precedence functions

The below algorithm is from Aho et al.: # Create symbols and for each grammar terminal and for the end of string symbol; # Partition the created symbols in groups so that and are in the same group if a \doteq b (there can be symbols in the same group even if their terminals are not connected by this relation); # Create a
directed graph In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pa ...
whose nodes are the groups. For each pair of terminals do: place an edge from the group of to the group of if otherwise if a \gtrdot b place an edge from the group of to that of ; # If the constructed graph has a cycle then no precedence functions exist. When there are no cycles, let be the length of the
longest path In graph theory and theoretical computer science, the longest path problem is the problem of finding a simple path of maximum length in a given graph. A path is called ''simple'' if it does not have any repeated vertices; the length of a path ma ...
from the group of and let be the length of the longest path from the group of .


Example

Consider the following table (repeated from above): :\begin & \mathrm & + & * & \$ \\ \hline \mathrm & & \gtrdot & \gtrdot & \gtrdot \\ + & \lessdot & \gtrdot & \lessdot & \gtrdot \\ * & \lessdot & \gtrdot & \gtrdot & \gtrdot \\ \$ & \lessdot & \lessdot & \lessdot & \end Using the algorithm leads to the following graph: gid \ fid f* \ / g* / f+ , \ , g+ , , g$ f$ from which we extract the following precedence functions from the maximum heights in the
directed acyclic graph In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one v ...
: :


Operator-precedence languages

The class of languages described by operator-precedence grammars, i.e., operator-precedence languages, is strictly contained in the class of
deterministic context-free language In formal language theory, deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are the context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, mea ...
s, and strictly contains visibly pushdown languages. Operator-precedence languages enjoy many closure properties: union, intersection, complementation, concatenation, and they are the largest known class closed under all these operations and for which the emptiness problem is decidable. Another peculiar feature of operator-precedence languages is their local parsability, that enables efficient parallel parsing. There are also characterizations based on an equivalent form of automata and monadic second-order logic.


Notes


References

* * * * *


Further reading

*


External links

* Nikolay Nikolaev
IS53011A Language Design and Implementation
Course notes for CIS 324, 2010. {{DEFAULTSORT:Operator-Precedence Grammar Formal languages