Hygienic macros are macros whose expansion is guaranteed not to cause the accidental capture of identifiers. They are a feature of programming languages such as Scheme, Dylan, Rust, Nim, and Julia. The general problem of accidental capture was well known within the Lisp community before hygienic macros were introduced. Macro writers would use language features that generate unique identifiers (e.g., gensym) or use obfuscated identifiers to avoid the problem. Hygienic macros are a programmatic solution to the capture problem that is integrated into the macro expander itself. The term "hygiene" was coined in Kohlbecker et al.'s 1986 paper that introduced hygienic macro expansion, inspired by the terminology used in mathematics.
The hygiene problem
Variable shadowing
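In programming languages that have non-hygienic macro systems, it is possible for existing variable bindings to be hidden from a macro by variable bindings that are created during its expansion. In C, this problem can be illustrated by the following fragment:
#include <stdio.h>
/* The macro body declares its own local variable named a. */
#define INCI(i) do { int a = 0; ++i; } while (0)
int main(void)
{
    int a = 4, b = 8;
    INCI(a);
    INCI(b);
    printf("a is now %d, b is now %d\n", a, b);
    return 0;
}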
Running the above through the C preprocessor produces the following definition of main:
int main(void)
{
    int a = 4, b = 8;
    do { int a = 0; ++a; } while (0);
    do { int a = 0; ++b; } while (0);
    printf("a is now %d, b is now %d\n", a, b);
    return 0;
}
The variable a declared in the top scope is shadowed by the a variable in the macro, which introduces a new scope. As a result, a is never altered by the execution of the program, as the output of the compiled program shows:
a is now 4, b is now 9
Standard library function redefinition
The hygiene problem can extend beyond variable bindings. Consider this Common Lisp macro:
(defmacro my-unless (condition &body body)
  `(if (not ,condition)
       (progn
         ,@body)))
While there are no references to variables in this macro, it assumes the symbols "if", "not", and "progn" are all bound to their usual definitions in the standard library. If, however, the above macro is used in the following code:
(flet ((not (x) x))
  (my-unless t
    (format t "This should not be printed!")))
The definition of "not" has been locally altered, so the code produced by my-unless no longer behaves as intended.
Program-defined function redefinition
Of course, the problem can occur for program-defined functions in a similar way:
(defun user-defined-operator (cond)
  (not cond))
(defmacro my-unless (condition &body body)
  `(if (user-defined-operator ,condition)
       (progn
         ,@body)))
; ... later ...
(flet ((user-defined-operator (x) x))
  (my-unless t
    (format t "This should not be printed!")))
The use site redefines user-defined-operator and hence changes the behavior of the macro.
Strategies used in languages that lack hygienic macros
The hygiene problem can be resolved with conventional macros using several alternative solutions.
Obfuscation
The simplest solution, if temporary storage is needed during the expansion of a macro, is to use unusual variable names in the macro in the hope that the same names will never be used by the rest of the program.
#define INCI(i) do { int INCIa = 0; ++i; } while (0)
int main(void)
{
    int a = 4, b = 8;
    INCI(a);
    INCI(b);
    printf("a is now %d, b is now %d\n", a, b);
    return 0;
}
Until a variable named INCIa is created, this solution produces the correct output:
a is now 5, b is now 9
The problem is solved for the current program, but this solution is not robust. The programmer must manually ensure that the names used inside the macro never coincide with names used in the rest of the program. Specifically, using the macro INCI on a variable INCIa is going to fail in the same way that the original macro failed on a variable a.
Temporary symbol creation
In some programming languages, it is possible for a new variable name, or symbol, to be generated and bound to a temporary location. The language processing system ensures that this never clashes with another name or location in the execution environment. The responsibility for choosing to use this feature within the body of a macro definition is left to the programmer. This method was used in MacLisp, where a function named gensym could be used to generate a new symbol name. Similar functions (usually named gensym as well) exist in many Lisp-like languages, including the widely implemented Common Lisp standard and Elisp.
Although symbol creation solves the variable shadowing issue, it does not directly solve the issue of function redefinition. However, gensym, macro facilities, and standard library functions are sufficient to embed hygienic macros in an unhygienic language.
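The idea of generating a fresh temporary name can be sketched with a toy text-based expander in Rust. This is only a model under stated assumptions: the names gensym, NEXT_ID and expand_swap are hypothetical, and the counter-based strings merely approximate a real Lisp gensym, which returns an uninterned symbol that cannot clash even with an identically spelled name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global counter backing the hypothetical `gensym` helper.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Returns a name that this generator has not handed out before.
/// Assumption: user code never spells a name of the form `<prefix>__g<N>`.
pub fn gensym(prefix: &str) -> String {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    format!("{}__g{}", prefix, id)
}

/// Toy expander: returns the source text of a swap of `a` and `b`
/// that stores the intermediate value in a freshly generated temporary.
pub fn expand_swap(a: &str, b: &str) -> String {
    let tmp = gensym("tmp");
    format!(
        "{{ let {t} = {a}; {a} = {b}; {b} = {t}; }}",
        t = tmp,
        a = a,
        b = b
    )
}
```

Each call to expand_swap uses a different temporary, so expanding a swap of a user variable that happens to be called tmp does not reuse that name for the macro's own storage.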
Read-time uninterned symbol
This is similar to obfuscation in that a single name is shared by multiple expansions of the same macro. Unlike an unusual name, however, a read-time uninterned symbol is used (denoted by the #: notation), which cannot occur outside of the macro, similar to gensym.
Packages
With a package system such as that of Common Lisp, the macro simply uses a private symbol from the package in which the macro is defined. The symbol will not accidentally occur in user code. User code would have to reach inside the package using the double colon (::) notation to give itself permission to use the private symbol, for instance cool-macros::secret-sym. At that point, the issue of accidental lack of hygiene is moot. Furthermore, the ANSI Common Lisp standard categorizes redefining standard functions and operators, globally or locally, as invoking undefined behavior. Such usage can thus be diagnosed by the implementation as erroneous. The Lisp package system therefore provides a viable, complete solution to the macro hygiene problem, which can be regarded as an instance of name clashing.
For example, in the program-defined function redefinition example, the my-unless macro can reside in its own package, where user-defined-operator is a private symbol in that package. The symbol user-defined-operator occurring in the user code will then be a different symbol, unrelated to the one used in the definition of the my-unless macro.
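A loosely analogous technique exists in Rust, whose macro_rules! macros resolve function names at the place where the macro is used unless the macro names them through the defining crate with a $crate:: path. The sketch below is an illustration only, not a model of Common Lisp packages; it assumes the code sits at the crate root, and the names negate, my_unless_unqualified and my_unless_qualified are hypothetical.

```rust
/// Helper that both macros intend to call; assumed to live at the crate root.
pub fn negate(x: bool) -> bool {
    !x
}

// Unqualified call: `negate` is looked up where the macro is *used*,
// so a function of the same name at the use site takes over.
macro_rules! my_unless_unqualified {
    ($cond:expr, $body:expr) => {
        if negate($cond) {
            $body
        }
    };
}

// Qualified call: `$crate::negate` names the helper in the defining crate,
// regardless of what is in scope at the use site.
macro_rules! my_unless_qualified {
    ($cond:expr, $body:expr) => {
        if $crate::negate($cond) {
            $body
        }
    };
}

/// Returns true if the body ran even though the condition was true.
pub fn body_runs_unqualified() -> bool {
    // Use-site redefinition, mirroring the flet in the Lisp example.
    #[allow(dead_code)]
    fn negate(x: bool) -> bool {
        x
    }
    let mut ran = false;
    my_unless_unqualified!(true, ran = true);
    ran
}

/// Same use site, but the macro refers to the helper by its crate path.
pub fn body_runs_qualified() -> bool {
    #[allow(dead_code)]
    fn negate(x: bool) -> bool {
        x
    }
    let mut ran = false;
    my_unless_qualified!(true, ran = true);
    ran
}
```

In this sketch the unqualified variant picks up the local negate and runs its body, while the qualified variant keeps the intended meaning, much as a package-private symbol stays distinct from a user symbol with the same name.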
Literal objects
In some languages the expansion of a macro does not need to correspond to textual code; rather than expanding to an expression containing the symbol f, a macro may produce an expansion containing the actual object referred to by f. Similarly, if the macro needs to use local variables or objects defined in the macro's package, it can expand to an invocation of a closure object whose enclosing lexical environment is that of the macro definition.
Hygienic transformation
Hygienic macro systems in languages such as Scheme use a macro expansion process that preserves the lexical scoping of all identifiers and prevents accidental capture. This property is called referential transparency. In cases where capture is desired, some systems allow the programmer to explicitly violate the hygiene mechanisms of the macro system.
For example, Scheme's let-syntax and define-syntax macro creation systems are hygienic, so the following Scheme implementation of my-unless will have the desired behavior:
(define-syntax my-unless
  (syntax-rules ()
    ((_ condition body ...)
     (if (not condition)
         (begin body ...)))))
(let ((not (lambda (x) x)))
  (my-unless #t
    (display "This should not be printed!")
    (newline)))
The hygienic macro processor responsible for transforming the patterns of the input form into an output form detects symbol clashes and resolves them by temporarily changing the names of symbols. The basic strategy is to identify ''bindings'' in the macro definition and replace those names with gensyms, and to identify ''free variables'' in the macro definition and make sure those names are looked up in the scope of the macro definition instead of the scope where the macro was used.
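The first half of that strategy, keeping bindings introduced by the macro apart from the caller's names, can be observed in Rust, whose macro_rules! macros are hygienic for local variables. The following is a minimal sketch rather than a model of any Scheme expander; the names inci and increment_both are chosen for illustration and follow the INCI macro from the C example.

```rust
// Hypothetical Rust counterpart of the C INCI macro.
macro_rules! inci {
    ($i:ident) => {{
        // This `a` belongs to the macro; hygiene keeps it distinct from
        // any variable named `a` at the call site.
        let a = 0;
        let _ = a; // silence the unused-variable warning
        $i += 1;
    }};
}

/// Applies the macro to caller variables named `a` and `b`.
pub fn increment_both() -> (i32, i32) {
    let mut a = 4;
    let mut b = 8;
    inci!(a);
    inci!(b);
    (a, b)
}
```

Calling increment_both() returns (5, 9): the caller's a is incremented even though the macro body declares its own a.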
Implementations
Macro systems that automatically enforce hygiene originated with Scheme. The original KFFD algorithm for a hygienic macro system was presented by Kohlbecker et al. in 1986. At the time, no standard macro system was adopted by Scheme implementations. Shortly thereafter, in 1987, Kohlbecker and Wand proposed a declarative pattern-based language for writing macros, which was the predecessor to the syntax-rules macro facility adopted by the R5RS standard.
Syntactic closures, a different hygiene mechanism, were proposed by Bawden and Rees in 1988 as an alternative to Kohlbecker et al.'s system. Unlike the KFFD algorithm, syntactic closures require the programmer to explicitly specify the resolution of the scope of an identifier. In 1993, Dybvig et al. introduced the syntax-case macro system, which uses an alternative representation of syntax and maintains hygiene automatically. The syntax-case system can express the syntax-rules pattern language as a derived macro.
The term ''macro system'' can be ambiguous because, in the context of Scheme, it can refer to both a pattern-matching construct (e.g., syntax-rules) and a framework for representing and manipulating syntax (e.g., syntax-case, syntactic closures).
Syntax-rules
Syntax-rules is a high-level pattern matching facility that attempts to make macros easier to write. However, syntax-rules is not able to succinctly describe certain classes of macros and is insufficient to express other macro systems. Syntax-rules was described in the R4RS document in an appendix but not mandated. Later, R5RS adopted it as a standard macro facility. Here is an example syntax-rules macro that swaps the values of two variables:
(define-syntax swap!
  (syntax-rules ()
    ((_ a b)
     (let ((temp a))
       (set! a b)
       (set! b temp)))))
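The temp binding is where hygiene matters in swap!: the macro must keep working when one of the swapped variables is itself named temp. As a hedged illustration in a different language, the Rust sketch below writes the same macro with macro_rules!, which is hygienic for local variables; swap and swap_with_clashing_name are illustrative names, not part of any standard.

```rust
// Rust analogue of the Scheme swap! macro.
macro_rules! swap {
    ($a:ident, $b:ident) => {{
        // The macro's own `temp`; distinct from a caller variable `temp`.
        let temp = $a;
        $a = $b;
        $b = temp;
    }};
}

/// Swaps two variables, one of which shares its name with the
/// temporary used inside the macro.
pub fn swap_with_clashing_name() -> (i32, i32) {
    let mut temp = 1;
    let mut other = 2;
    swap!(temp, other);
    (temp, other)
}
```

Calling swap_with_clashing_name() returns (2, 1), so the values are exchanged despite the name clash.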
Syntax-case
Due to the deficiencies of a purely syntax-rules based macro system, the R6RS Scheme standard adopted the syntax-case macro system. Unlike syntax-rules, syntax-case contains both a pattern matching language and a low-level facility for writing macros. The former allows macros to be written declaratively, while the latter allows the implementation of alternative frontends for writing macros. The swap example from before is nearly identical in syntax-case because the pattern matching language is similar:
(define-syntax swap!
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       (syntax
        (let ((temp a))
          (set! a b)
          (set! b temp)))))))
However, syntax-case is more powerful than syntax-rules. For example, syntax-case macros can specify side-conditions on their pattern matching rules via arbitrary Scheme functions. Alternatively, a macro writer can choose not to use the pattern matching frontend and manipulate the syntax directly. Using the datum->syntax function, syntax-case macros can also intentionally capture identifiers, thus breaking hygiene.
Other systems
Other macro systems have also been proposed and implemented for Scheme. Syntactic closures and explicit renaming are two alternative macro systems. Both systems are lower-level than syntax-rules and leave the enforcement of hygiene to the macro writer. This differs from both syntax-rules and syntax-case, which automatically enforce hygiene by default. The swap examples from above are shown here using a syntactic closure and explicit renaming implementation respectively:
;; syntactic closures
(define-syntax swap!
  (sc-macro-transformer
   (lambda (form environment)
     (let ((a (close-syntax (cadr form) environment))
           (b (close-syntax (caddr form) environment)))
       `(let ((temp ,a))
          (set! ,a ,b)
          (set! ,b temp))))))
;; explicit renaming
(define-syntax swap!
  (er-macro-transformer
   (lambda (form rename compare)
     (let ((a (cadr form))
           (b (caddr form))
           (temp (rename 'temp)))
       `(,(rename 'let) ((,temp ,a))
         (,(rename 'set!) ,a ,b)
         (,(rename 'set!) ,b ,temp))))))
Languages with hygienic macro systems
* Scheme – syntax-rules, syntax-case, syntactic closures, and others.
* Racket – an offshoot of Scheme. Its macro system was originally based on syntax-case, but now has more features.
* Nemerle
* Dylan
* Elixir
* Nim
* Rust
* Haxe
* Mary2 – scoped macro bodies in an Algol68-derivative language circa 1978
* Julia
* Raku – supports both hygienic and unhygienic macros
Criticism
Hygienic macros offer safety and referential transparency at the expense of making intentional variable capture less straightforward. Doug Hoyte, author of ''Let Over Lambda'', is a critic of this trade-off.
Many hygienic macro systems do offer escape hatches without compromising on the guarantees that hygiene provides; for instance, Racket allows the definition of syntax parameters, which selectively introduce bound variables. Greg Hendershott gives an example in ''Fear of Macros'' of implementing an anaphoric if operator in this way.
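A different way to obtain a similar effect without leaving hygiene is to let the caller supply the name to be bound. The Rust sketch below uses that technique, not Racket's syntax parameters, for an anaphoric-if-like macro; the macro name aif, its argument syntax and the function first_even_doubled are assumptions made for this example.

```rust
// Anaphoric-if sketch: the caller chooses the name (`$it`) that the macro
// binds to the tested value, so the binding is visible in `$then`.
macro_rules! aif {
    ($it:ident = $test:expr => $then:expr, $else:expr) => {
        match $test {
            Some($it) => $then,
            None => $else,
        }
    };
}

/// Doubles the first even element of `xs`, or returns -1 if there is none.
pub fn first_even_doubled(xs: &[i32]) -> i32 {
    aif!(it = xs.iter().find(|x| **x % 2 == 0) => *it * 2, -1)
}
```

Because the identifier it is written at the call site, the binding and its use share the same origin and hygiene is not violated; the cost is that every caller has to spell out the name.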
See also
* Anaphoric macro
* Partial evaluation
* Preprocessor
* Syntactic closure