In
compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a property of an
intermediate representation (IR) that requires each variable to be
assigned exactly once and defined before it is used. Existing variables in the original IR are split into ''versions'', new variables typically indicated by the original name with a subscript in textbooks, so that every definition gets its own version. In SSA form,
use-def chains
Within computer science, a Use-Definition Chain (''UD Chain'') is a data structure that consists of a use, U, of a variable, and all the definitions, D, of that variable that can reach that use without any other intervening definitions. A UD Chain ...
are explicit and each contains a single element.
SSA was proposed by
Barry K. Rosen Barry may refer to:
People and fictional characters
* Barry (name), including lists of people with the given name, nickname or surname, as well as fictional characters with the given name
* Dancing Barry, stage name of Barry Richards (born c. 1950 ...
,
Mark N. Wegman
Mark N. Wegman is an American computer scientist known for his contributions to algorithms and compiler optimization. Wegman received his B.A. from New York University and his Ph.D. from the University of California, Berkeley. He joined IBM Resea ...
, and
F. Kenneth Zadeck
F is the sixth letter of the Latin alphabet.
F may also refer to:
Science and technology Mathematics
* F or f, the number 15 in hexadecimal and higher positional systems
* ''p'F'q'', the hypergeometric function
* F-distribution, a con ...
in 1988.
Ron Cytron Ron is a shortening of the name Ronald.
Ron or RON may also refer to:
Arts and media
* Big Ron (''EastEnders''), a TV character
* Ron (''King of Fighters''), a video game character
*Ron Douglas, the protagonist in ''Lucky Stiff'' played by Joe A ...
,
Jeanne Ferrante
Jeanne Ferrante is a computer scientist active in the field of compiler technology, where she has made important contributions regarding optimization and parallelization. Jeanne Ferrante is Professor of Computer Science and Engineering at Uni ...
and the previous three researchers at
IBM developed an algorithm that can compute the SSA form efficiently.
One can expect to find SSA in a compiler for
Fortran,
C or
C++, whereas in
functional language compilers, such as those for
Scheme A scheme is a systematic plan for the implementation of a certain idea.
Scheme or schemer may refer to:
Arts and entertainment
* ''The Scheme'' (TV series), a BBC Scotland documentary series
* The Scheme (band), an English pop band
* ''The Schem ...
and
ML,
continuation-passing style (CPS) is generally used. SSA is formally equivalent to a well-behaved subset of CPS excluding non-local control flow, which does not occur when CPS is used as intermediate representation. So optimizations and transformations formulated in terms of one immediately apply to the other.
Benefits
The primary usefulness of SSA comes from how it simultaneously simplifies and improves the results of a variety of
compiler optimization
In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
s, by simplifying the properties of variables. For example, consider this piece of code:
y := 1
y := 2
x := y
Humans can see that the first assignment is not necessary, and that the value of
y
being used in the third line comes from the second assignment of
y
. A program would have to perform
reaching definition analysis to determine this. But if the program is in SSA form, both of these are immediate:
y
1 := 1
y
2 := 2
x
1 := y
2
Compiler optimization
In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
algorithms that are either enabled or strongly enhanced by the use of SSA include:
*
Constant propagation – conversion of computations from runtime to compile time, e.g. treat the instruction
a=3*4+5;
as if it were
a=17;
*
Value range propagation – precompute the potential ranges a calculation could be, allowing for the creation of branch predictions in advance
*
Sparse conditional constant propagation – range-check some values, allowing tests to predict the most likely branch
*
Dead-code elimination
In compiler theory, dead-code elimination (also known as DCE, dead-code removal, dead-code stripping, or dead-code strip) is a compiler optimization to remove code which does not affect the program results. Removing such code has several benefits: ...
– remove code that will have no effect on the results
*
Global value numbering – replace duplicate calculations producing the same result
*
Partial-redundancy elimination
In compiler theory, partial redundancy elimination (PRE) is a compiler optimization that eliminates expressions that are redundant on some but not necessarily all paths through a program. PRE is a form of common subexpression elimination.
An exp ...
– removing duplicate calculations previously performed in some branches of the program
*
Strength reduction – replacing expensive operations by less expensive but equivalent ones, e.g. replace integer multiply or divide by powers of 2 with the potentially less expensive shift left (for multiply) or shift right (for divide).
*
Register allocation – optimize how the limited number of machine registers may be used for calculations
Converting to SSA
Converting ordinary code into SSA form is primarily a matter of replacing the target of each assignment with a new variable, and replacing each use of a variable with the "version" of the variable
reaching that point. For example, consider the following
control-flow graph:
Changing the name on the left hand side of "x
x - 3" and changing the following uses of
x to that new name would leave the program unaltered. This can be exploited in SSA by creating two new variables:
x1 and
x2, each of which is assigned only once. Likewise, giving distinguishing subscripts to all the other variables yields:
It is clear which definition each use is referring to, except for one case: both uses of
y in the bottom block could be referring to either
y1 or
y2, depending on which path the control flow took.
To resolve this, a special statement is inserted in the last block, called a ''Φ (Phi) function''. This statement will generate a new definition of ''y'' called
y3 by "choosing" either
y1 or
y2, depending on the control flow in the past.
Now, the last block can simply use
y3, and the correct value will be obtained either way. A Φ function for ''x'' is not needed: only one version of
x, namely
x2 is reaching this place, so there is no problem (in other words, Φ(
x2,
x2)=
x2).
Given an arbitrary control-flow graph, it can be difficult to tell where to insert Φ functions, and for which variables. This general question has an efficient solution that can be computed using a concept called ''dominance frontiers'' (see below).
Φ functions are not implemented as machine operations on most machines. A compiler can implement a Φ function by inserting "move" operations at the end of every predecessor block. In the example above, the compiler might insert a move from
y1 to
y3 at the end of the middle-left block and a move from
y2 to
y3 at the end of the middle-right block. These move operations might not end up in the final code based on the compiler's
register allocation procedure. However, this approach may not work when simultaneous operations are speculatively producing inputs to a Φ function, as can happen on
wide-issue machines. Typically, a wide-issue machine has a selection instruction used in such situations by the compiler to implement the Φ function.
According to Kenny Zadeck, Φ functions were originally known as ''phony'' functions while SSA was being developed at IBM Research in the 1980s. The formal name of a Φ function was only adopted when the work was first published in an academic paper.
Computing minimal SSA using dominance frontiers
In a control-flow graph, a node A is said to ' a different node B if it is impossible to reach B without passing through A first. In other words, if node B is reached, then it can be assumed that A has run. A is said to ''dominate'' B (or B to ''be dominated by'' A) if either A strictly dominates B or A = B.
A node which transfers control to a node A is called an ' of A.
The ' of node A is the set of nodes B where A strictly dominate B, but does dominate some immediate predecessor of B. These are the points at which multiple control paths merge back together into a single path.
For example, in the following code:
x = random()
if x < 0.5
result = "heads"
else
result = "tails"
end
print(result)
node 1 strictly dominates 2, 3, and 4, and the immediate predecessor of node 4 is nodes 2 and 3.
Dominance frontiers define the points at which Φ functions are needed. In the above example, when control is passed to node 4, the definition of
result
used depends on whether control was passed from node 2 or 3. Φ functions are not needed for variables defined in a dominator, as there is only one possible definition that can apply.
There is an efficient algorithm for finding dominance frontiers of each node. This algorithm was originally described in “Efficiently Computing Static Single Assignment Form and the Control Graph” by Ron Cytron, Jeanne Ferrante, et al in 1991.
Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy of
Rice University describe an algorithm in their paper titled ''A Simple, Fast Dominance Algorithm'':
for each node b
dominance_frontier(b) :=
for each node b
if the number of immediate predecessors of b ≥ 2
for each p in immediate predecessors of b
runner := p
while runner ≠ idom(b)
dominance_frontier(runner) := dominance_frontier(runner) ∪
runner := idom(runner)
In the code above,
idom(b)
is the ' of b, the unique node that strictly dominates b but does not strictly dominate any other node that strictly dominates b.
Variations that reduce the number of Φ functions
"Minimal" SSA inserts the minimal number of Φ functions required to ensure that each name is assigned a value exactly once and that each reference (use) of a name in the original program can still refer to a unique name. (The latter requirement is needed to ensure that the compiler can write down a name for each operand in each operation.)
However, some of these Φ functions could be ''
dead''. For this reason, minimal SSA does not necessarily produce the fewest Φ functions that are needed by a specific procedure. For some types of analysis, these Φ functions are superfluous and can cause the analysis to run less efficiently.
Pruned SSA
Pruned SSA form is based on a simple observation: Φ functions are only needed for variables that are "live" after the Φ function. (Here, "live" means that the value is used along some path that begins at the Φ function in question.) If a variable is not live, the result of the Φ function cannot be used and the assignment by the Φ function is dead.
Construction of pruned SSA form uses
live-variable information in the Φ function insertion phase to decide whether a given Φ function is needed. If the original variable name isn't live at the Φ function insertion point, the Φ function isn't inserted.
Another possibility is to treat pruning as a
dead-code elimination
In compiler theory, dead-code elimination (also known as DCE, dead-code removal, dead-code stripping, or dead-code strip) is a compiler optimization to remove code which does not affect the program results. Removing such code has several benefits: ...
problem. Then, a Φ function is live only if any use in the input program will be rewritten to it, or if it will be used as an argument in another Φ function. When entering SSA form, each use is rewritten to the nearest definition that dominates it. A Φ function will then be considered live as long as it is the nearest definition that dominates at least one use, or at least one argument of a live Φ.
Semi-pruned SSA
Semi-pruned SSA form is an attempt to reduce the number of Φ functions without incurring the relatively high cost of computing live-variable information. It is based on the following observation: if a variable is never live upon entry into a basic block, it never needs a Φ function. During SSA construction, Φ functions for any "block-local" variables are omitted.
Computing the set of block-local variables is a simpler and faster procedure than full live-variable analysis, making semi-pruned SSA form more efficient to compute than pruned SSA form. On the other hand, semi-pruned SSA form will contain more Φ functions.
Block arguments
Block arguments are an alternative to Φ functions that is representationally identical but in practice can be more convenient during optimization. Blocks are named and take a list of block arguments, notated as function parameters. When calling a block the block arguments are bound to specified values.
Swift and LLVM's multi-level intermediate representation use block arguments.
Converting out of SSA form
SSA form is not normally used for direct execution (although it is possible to interpret SSA), and it is frequently used "on top of" another IR with which it remains in direct correspondence. This can be accomplished by "constructing" SSA as a set of functions that map between parts of the existing IR (basic blocks, instructions, operands, ''etc.'') and its SSA counterpart. When the SSA form is no longer needed, these mapping functions may be discarded, leaving only the now-optimized IR.
Performing optimizations on SSA form usually leads to entangled SSA-Webs, meaning there are Φ instructions whose operands do not all have the same root operand. In such cases
color-out algorithms are used to come out of SSA. Naive algorithms introduce a copy along each predecessor path that caused a source of different root symbol to be put in Φ than the destination of Φ. There are multiple algorithms for coming out of SSA with fewer copies, most use interference graphs or some approximation of it to do copy coalescing.
Extensions
Extensions to SSA form can be divided into two categories.
''Renaming scheme'' extensions alter the renaming criterion. Recall that SSA form renames each variable when it is assigned a value. Alternative schemes include static single use form (which renames each variable at each statement when it is used) and static single information form (which renames each variable when it is assigned a value, and at the post-dominance frontier).
''Feature-specific'' extensions retain the single assignment property for variables, but incorporate new semantics to model additional features. Some feature-specific extensions model high-level programming language features like arrays, objects and aliased pointers. Other feature-specific extensions model low-level architectural features like speculation and predication.
Compilers using SSA form
SSA form is a relatively recent development in the compiler community. As such, many older compilers only use SSA form for some part of the compilation or optimization process, but most do not rely on it. Examples of compilers that rely heavily on SSA form include:
* The ETH
Oberon-2
Oberon-2 is an extension of the original Oberon programming language that adds limited reflection and object-oriented programming facilities, open arrays as pointer base types, read-only field export, and reintroduces the FOR loop from Modula-2.
...
compiler was one of the first public projects to incorporate "GSA", a variant of SSA.
* The
LLVM Compiler Infrastructure uses SSA form for all scalar register values (everything except memory) in its primary code representation. SSA form is only eliminated once register allocation occurs, late in the compile process (often at link time).
* The
Open64 compiler uses SSA form in its global scalar optimizer, though the code is brought into SSA form before and taken out of SSA form afterwards. Open64 uses extensions to SSA form to represent memory in SSA form as well as scalar values.
* As of version 4 (released in April 2005) GCC, the
GNU Compiler Collection, makes extensive use of SSA. The
frontends generate "
GENERIC
Generic or generics may refer to:
In business
* Generic term, a common name used for a range or class of similar things not protected by trademark
* Generic brand, a brand for a product that does not have an associated brand or trademark, other ...
" code that is then converted into "
GIMPLE
The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
" code by the "gimplifier". High-level optimizations are then applied on the SSA form of "GIMPLE". The resulting optimized intermediate code is then translated into
RTL, on which low-level optimizations are applied. The architecture-specific
backends finally turn RTL into
assembly language
In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
.
*
IBM's open source adaptive
Java virtual machine
A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes ...
,
Jikes RVM
Jikes is an open-source Java compiler written in C++. It is no longer being updated.
The original version was developed by David L. "Dave" Shields and Philippe Charles at IBM but was quickly transformed into an open-source project contributed ...
, uses extended Array SSA, an extension of SSA that allows analysis of scalars, arrays, and object fields in a unified framework. Extended Array SSA analysis is only enabled at the maximum optimization level, which is applied to the most frequently executed portions of code.
* In 2002
researchers modifiedIBM's JikesRVM (named Jalapeño at the time) to run both standard Java
bytecode
Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
and a typesafe SSA (
SafeTSA) bytecode class files, and demonstrated significant performance benefits to using the SSA bytecode.
*
Oracle
An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination.
Description
The word '' ...
's
HotSpot Java Virtual Machine uses an SSA-based intermediate language in its JIT compiler.
* Microsoft
Visual C++ compiler backend available in
Microsoft Visual Studio
Visual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs including websites, web apps, web services and mobile apps. Visual Studio uses Microsoft software development platforms such a ...
2015 Update 3 uses SSA
*
Mono
Mono may refer to:
Common meanings
* Infectious mononucleosis, "the kissing disease"
* Monaural, monophonic sound reproduction, often shortened to mono
* Mono-, a numerical prefix representing anything single
Music Performers
* Mono (Japanese b ...
uses SSA in its JIT compiler called Mini.
jackccis an open-source compiler for the academic instruction set Jackal 3.0. It uses a simple 3-operand code with SSA for its intermediate representation. As an interesting variant, it replaces Φ functions with a so-called SAME instruction, which instructs the register allocator to place the two live ranges into the same physical register.
* Although not a compiler, th
Boomerangdecompiler uses SSA form in its internal representation. SSA is used to simplify expression propagation, identifying parameters and returns, preservation analysis, and more.
*
Portable.NET uses SSA in its JIT compiler.
libFirma completely graph based SSA intermediate representation for compilers. libFirm uses SSA form for all scalar register values until code generation by use of an SSA-aware register allocator.
* The Illinois Concert Compiler circa 1994 used a variant of SSA called SSU (Static Single Use) which renames each variable when it is assigned a value, and in each conditional context in which that variable is used; essentially the static single information form mentioned above. The SSU form is documented i
John Plevyak's Ph.D Thesis
* The COINS compiler uses SSA form optimizations as explaine
here
* The
Mozilla Firefox SpiderMonkey JavaScript engine uses SSA-based IR.
* The
Chromium
Chromium is a chemical element with the symbol Cr and atomic number 24. It is the first element in group 6. It is a steely-grey, lustrous, hard, and brittle transition metal.
Chromium metal is valued for its high corrosion resistance and hardne ...
V8 JavaScript engine
V8 is a free and open-source JavaScript engine developed by the Chromium Project for Google Chrome and Chromium web browsers. The project’s creator is Lars Bak. The first version of the V8 engine was released at the same time as the first versi ...
implements SSA in its Crankshaft compiler infrastructure a
announced in December 2010*
PyPy uses a linear SSA representation for traces in its JIT compiler.
*
Android
Android may refer to:
Science and technology
* Android (robot), a humanoid robot or synthetic organism designed to imitate a human
* Android (operating system), Google's mobile operating system
** Bugdroid, a Google mascot sometimes referred to ...
's
Dalvik virtual machine uses SSA in its JIT compiler.
*
Android
Android may refer to:
Science and technology
* Android (robot), a humanoid robot or synthetic organism designed to imitate a human
* Android (operating system), Google's mobile operating system
** Bugdroid, a Google mascot sometimes referred to ...
's new optimizing compiler for the
Android Runtime uses SSA for its IR.
* The Standard ML compiler
MLton
MLton is an open-source whole-program optimizing compiler for Standard ML. MLton development began in 1997, and continues with a worldwide community of developers and users, who have helped to port MLton to a number of platforms. MLton was a parti ...
uses SSA in one of its intermediate languages.
*
LuaJIT makes heavy use of SSA-based optimizations.
* The
PHP and
Hack
Hack may refer to:
Arts, entertainment, and media Games
* ''Hack'' (Unix video game), a 1984 roguelike video game
* ''.hack'' (video game series), a series of video games by the multimedia franchise ''.hack''
Music
* ''Hack'' (album), a 199 ...
compiler
HHVM uses SSA in its IR.
* Reservoir Labs' R-Stream compiler supports non-SSA (quad list), SSA and SSI (Static Single Information) forms.
*
Go (1.7: for x86-64 architecture only; 1.8: for all supported architectures).
*
SPIR-V
Standard Portable Intermediate Representation (SPIR) is an intermediate language for parallel compute and graphics by Khronos Group. It is used in multiple execution environments, including the Vulkan graphics API and the OpenCL compute API, to re ...
, the shading language standard for the
Vulkan graphics API and
kernel language for
OpenCL compute API, is an SSA representation.
* Various
Mesa drivers via NIR, an SSA representation for shading languages.
*
WebKit uses SSA in its JIT compilers.
*
Swift defines its own SSA form above LLVM IR, called SIL (Swift Intermediate Language).
*
Erlang rewrote their compiler in OTP 22.0 to "internally use an intermediate representation based on Static Single Assignment (SSA)." With plans for further optimizations built on top of SSA in future releases.
References
Notes
General references
* Also available in
Java (, 2002) and
C (, 1998) versions.
*
*
*
*
*
* {{citation , author1=Matthias Braun , title=Compiler Construction , author2=Sebastian Buchwald , author3=Sebastian Hack , author4=Roland Leißa , author5=Christoph Mallon , author6=Andreas Zwinkau , chapter=Simple and Efficient Construction of Static Single Assignment Form, year=2013, volume=7791, pages=102–122, publisher=Springer Berlin Heidelberg, series=Lecture Notes in Computer Science, doi=10.1007/978-3-642-37051-9_6, chapter-url=http://www.cdl.uni-saarland.de/projects/ssaconstr, accessdate=24 March 2013, isbn=978-3-642-37050-2 , doi-access=free
External links
* Bosscher, Steven; and Novillo, Diego
GCC gets a new Optimizer Framework An article about GCC's use of SSA and how it improves over older IRs.
Extensive catalogue of SSA research papers.
* Zadeck, F. Kenneth
"The Development of Static Single Assignment Form" December 2007 talk on the origins of SSA.
* VV.AA
"SSA-based Compiler Design"(2014)
Compiler optimizations