HOME

TheInfoList



OR:

The dangling else is a problem in programming of parser generators in which an optional else clause in an if–then(–else) statement can make nested conditional statements ambiguous. Formally, the reference
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
of the language is ambiguous, meaning there is more than one correct
parse tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...
.


Description

In many
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s, one may write conditionally executed code in two forms: the if-then form, or the if-then-else form. (The else clause is optional.): if a then s if b then s1 else s2 Ambiguous interpretation becomes possible when there are nested statements; specifically when an if-then-else form replaces the statement s inside the above if-then construct: if a then if b then s1 else s2 In this example, s1 gets executed if and only if a is true ''and'' b is true. But what about s2? One person might be sure that s2 gets executed whenever a is false (by attaching the ''else'' to the first ''if''), while another person might be sure that s2 gets executed only when a is true ''and'' b is false (by attaching the ''else'' to the second ''if''). In other words, someone could interpret the previous statement as being equivalent to either of the following unambiguous statements: if a then else s2 if a then The dangling-else problem dates back to
ALGOL 60 ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a ...
, and subsequent languages have resolved it in various ways. In LR parsers, the dangling else is the archetypal example of a shift-reduce conflict.


Examples

Concrete examples follow.


C

In C, the grammar reads, in part:
 statement = ...
    ,  selection-statement

 selection-statement = ...
    ,  IF ( expression ) statement
    ,  IF ( expression ) statement ELSE statement
Thus, without further rules, the statement if (a) if (b) s; else s2; could ambiguously be parsed as if it were either: if (a) or: if (a) else s2; The C standard clarifies that an else block is associated with the nearest if. Therefore, the first tree is chosen.


Approaches to avoid the issue


Avoiding ambiguity while keeping the syntax

This problem often comes up in
compiler construction In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
, especially scannerless parsing. The convention when dealing with the dangling else is to attach the else to the nearby if statement, allowing for unambiguous context-free grammars, in particular. Programming languages like Pascal, C,ISO 9899
1999 (C): 6.8.4.1(3): "An else is associated with the lexically nearest preceding if that is allowed by the syntax.", available a
WG14 N1256
p. 134
and Java follow this convention, so there is no ambiguity in the semantics of the ''language'', though the use of a parser generator may lead to ambiguous ''grammars''. In these cases alternative grouping is accomplished by explicit blocks, such as begin...end in Pascal and in C. Depending on the compiler construction approach, one may take different corrective actions to avoid ambiguity: *If the parser is produced by an SLR, LR(1), or LALR LR parser generator, the programmer will often rely on the generated parser feature of preferring shift over reduce whenever there is a conflict. Alternatively, the grammar can be rewritten to remove the conflict, at the expense of an increase in grammar size (see below). *If the parser is hand-written, the programmer may use a ''non-ambiguous'' context-free grammar. Alternatively, one may rely on a non-context-free grammar or a parsing expression grammar.


Avoiding ambiguity by changing the syntax

The problem can also be solved by making explicit the link between an else and its if, within the syntax. This usually helps avoid human errors. Possible solutions are: * Having an "end if" symbol delimiting the end of the if construct. Examples of such languages are
ALGOL 68 ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language member of the ALGOL family that was conceived as a successor to the ALGOL 60 language, designed with the goal of a much wider scope of application and ...
, Ada, Eiffel,
PL/SQL PL/SQL (Procedural Language for SQL) is Oracle Corporation's procedural extension for SQL and the Oracle relational database. PL/SQL is available in Oracle Database (since version 6 - stored PL/SQL procedures/functions/packages/triggers sinc ...
,
Visual Basic Visual Basic is a name for a family of programming languages from Microsoft. It may refer to: * Visual Basic (.NET), the current version of Visual Basic launched in 2002 which runs on .NET * Visual Basic (classic), the original Visual Basic suppo ...
,
Modula-2 Modula-2 is a structured, procedural programming language developed between 1977 and 1985/8 by Niklaus Wirth at ETH Zurich. It was created as the language for the operating system and application software of the Lilith personal workstation. It w ...
, and
AppleScript AppleScript is a scripting language created by Apple Inc. that facilitates automated control of Mac applications. First introduced in System 7, it is currently included in macOS in a package of automation tools. The term ''AppleScript'' may ...
. * Disallowing the statement following a "then" to be an "if" itself (it may however be a pair of statement brackets containing only an if-then-clause). Some implementations of
ALGOL 60 ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a ...
follow this approach. * Requiring braces (parentheses) when an "else" follows an "if". * Requiring every "if" to be paired with an "else". To avoid a similar problem concerning semantics rather than syntax, Racket deviates from Scheme by considering an if without a fallback clause to be an error, effectively distinguishing conditional ''expressions'' (i.e if) from conditional ''statements'' (i.e when and unless, which do not have fallback clauses). * Using different keywords for the one-alternative and two-alternative "if" statements. S-algol uses if e do s for the one-alternative case and if e1 then e2 else e3 for the general case. * Requiring braces unconditionally, like
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIF ...
and
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
. This is effectively true in Python as its indentation rules delimit every block, not just those in "if" statements.


Avoiding the conflict in LR parsers

The above example could be rewritten in the following way to remove the ambiguity :
statement: open_statement
         ,  closed_statement
         ;

open_statement: IF '(' expression ')' statement
              ,  IF '(' expression ')' closed_statement ELSE open_statement
              ;

closed_statement: non_if_statement
                ,  IF '(' expression ')' closed_statement ELSE closed_statement
                ;

non_if_statement: ...
                ;
Any other statement-related grammar rules may also have to be duplicated in this way if they may directly or indirectly end with a statement or selection-statement non-terminal. However, we give grammar that includes both of if and while statements.
statement: open_statement
         ,  closed_statement
         ;

open_statement: IF '(' expression ')' statement
              ,  IF '(' expression ')' closed_statement ELSE open_statement
              ,  WHILE '(' expression ')' open_statement
              ;

closed_statement: simple_statement
                ,  IF '(' expression ')' closed_statement ELSE closed_statement
                ,  WHILE '(' expression ')' closed_statement
                ;

simple_statement: ...
                ;
Finally, we give the grammar that forbids ambiguous IF statements.
statement: open_statement
         ,  closed_statement
         ;

open_statement: IF '(' expression ')' statement
              ,  IF '(' expression ')' closed_statement ELSE open_statement
              ,  WHILE '(' expression ')' open_statement
              ;

closed_statement: simple_statement
                ,  IF '(' expression ')' closed_statement ELSE closed_statement
                ,  WHILE '(' expression ')' closed_statement
                ;

simple_statement: ...
                ;
With this grammar the statement if (a) if (b) c else d can only be parsed one way, because the other interpretation (if (a) else d) is produced as
statement
open_statement
IF '(' expression ')' closed_statement ELSE open_statement
'if' '(' 'a' ')' closed_statement 'else' 'd'
and then the parsing fails trying to match closed_statement to "if (b) c". An attempt with closed_statement fails in the same way. The other parse, if (a) ) succeeds:
statement
open_statement
IF '(' expression ')' statement
IF '(' expression ')' closed_statement
IF '(' a ')' (IF '(' expression ')' closed_statement ELSE closed_statement)
IF '(' a ')' (IF '(' b ')' c ELSE 'd')


See also

* Lexer hack *
Most vexing parse The most vexing parse is a counterintuitive form of syntactic ambiguous grammar, ambiguity resolution in the C++ programming language. In certain situations, the C++ grammar cannot distinguish between the Initialization (programming), creation of ...


References

{{Reflist Ambiguity Computer programming Conditional constructs Parsing