Optimizing compiler
An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory usage, storage size, and power consumption. Optimization is generally implemented as a sequence of optimizing transformations, also known as compiler optimizations: algorithms that transform code to produce semantically equivalent code optimized for some aspect. Optimization is limited by a number of factors. Theoretical analysis indicates that some optimization problems are NP-complete, or even undecidable. Also, producing perfectly ''optimal'' code is not possible, since optimizing for one aspect often degrades performance for another. Optimization is thus a collection of heuristic methods for improving resource usage in typical programs.


Categorization


Local vs. global scope

Scope describes how much of the input code is considered when applying an optimization. Local-scope optimizations use information local to a basic block. Since basic blocks contain no control-flow statements, these optimizations require minimal analysis, reducing time and storage requirements; however, no information is retained across jumps. Global-scope optimizations, also known as intra-procedural optimizations, operate on individual functions. This gives them more information to work with, but often makes expensive computations necessary. Worst-case assumptions must be made when function calls occur or global variables are accessed, because little information about them is available.


Peephole optimization

Peephole optimizations are usually performed late in the compilation process, after machine code has been generated. This optimization examines a few adjacent instructions (as if "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions. For instance, a multiplication of a value by two might be executed more efficiently by left-shifting the value or by adding the value to itself (this example is also an instance of strength reduction).
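
As a hedged illustration (the function names are invented for this sketch), the following C program shows the source-level equivalent of the peephole rewrite described above; a real peephole pass would apply it to machine instructions rather than to source:

    #include <stdio.h>

    /* Before: the generated code multiplies by two. */
    unsigned twice_mul(unsigned x)   { return x * 2; }

    /* After: a peephole pass may rewrite the multiply as a shift
     * or an add, which are cheaper on many targets. */
    unsigned twice_shift(unsigned x) { return x << 1; }
    unsigned twice_add(unsigned x)   { return x + x; }

    int main(void) {
        unsigned x = 21;
        printf("%u %u %u\n", twice_mul(x), twice_shift(x), twice_add(x));
        return 0;
    }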


Inter-procedural optimization

Interprocedural optimizations analyze all of a program's source code. The more information available, the more effective the optimizations can be. The information can be used for various optimizations, including function inlining, where a call to a function is replaced by a copy of the function body.
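
A minimal C sketch of inlining performed by hand; `square` is an invented example function, and a compiler would do this substitution internally:

    /* Before inlining: each use of square() incurs call overhead. */
    static int square(int n) { return n * n; }

    int sum_of_squares(int a, int b) {
        return square(a) + square(b);
    }

    /* After inlining: the compiler substitutes the body of square()
     * at each call site, exposing further simplification. */
    int sum_of_squares_inlined(int a, int b) {
        return (a * a) + (b * b);
    }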


Link-time optimization

Link-time optimization (LTO), or whole-program optimization, is a more general class of interprocedural optimization. During LTO, the compiler has visibility across translation units, which allows it to perform more aggressive optimizations such as cross-module inlining and devirtualization.


Machine and object code optimization

Machine code optimization involves using an object code optimizer to analyze the program after all machine code has been linked. Techniques such as macro compression, which conserves space by condensing common instruction sequences, become more effective when the entire executable task image is available for analysis.


Language-independent vs. language-dependent

Most high-level programming languages share common programming constructs and abstractions, such as branching constructs (if, switch), looping constructs (for, while), and encapsulation constructs (structures, objects). Thus, similar optimization techniques can be used across languages. However, certain language features make some optimizations difficult. For instance, pointers in C and C++ make array optimization difficult; see alias analysis. However, languages such as PL/I that also support pointers implement optimizations for arrays. Conversely, some language features make certain optimizations easier. For example, in some languages, functions are not permitted to have side effects. Therefore, if a program makes several calls to the same function with the same arguments, the compiler can infer that the function's result only needs to be computed once. In languages where functions are allowed to have side effects, the compiler can restrict such optimization to functions that it can determine have no side effects.
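
As a hedged illustration in C: GCC and Clang provide the `__attribute__((const))` annotation, which asserts to the compiler that a function's result depends only on its arguments, permitting (though not guaranteeing) the folding of repeated calls:

    #include <stdio.h>

    /* A function with no side effects. The attribute asserts this
     * to the compiler, so repeated calls with equal arguments may
     * be collapsed into one; whether a given compiler does so is
     * an assumption, not a guarantee. */
    __attribute__((const))
    static int cube(int n) { return n * n * n; }

    int main(void) {
        /* With the attribute, a compiler may compute cube(7) once
         * and reuse the value for both operands. */
        int total = cube(7) + cube(7);
        printf("%d\n", total);
        return 0;
    }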


Machine-independent vs. machine-dependent

Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler, but many of the most effective optimizations are those that best exploit special features of the target platform. Examples are instructions that do several things at once, such as decrement register and branch if not zero.

The following is an instance of a local machine-dependent optimization. To set a register to 0, the obvious way is to use the constant '0' in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself or subtract it from itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would both be the same length and take the same time. On many other microprocessors, such as the Intel x86 family, it turns out that the XOR variant is shorter and probably faster, as there is no need to decode an immediate operand nor use the internal "immediate operand register"; the same applies on the IBM System/360 and its successors for the subtract variant. A potential problem is that XOR or subtract may introduce a data dependency on the previous value of the register, causing a pipeline stall, which occurs when the processor must delay execution of an instruction because it depends on the result of a previous instruction. However, processors often treat the XOR of a register with itself, or the subtract of a register from itself, as a special case that does not cause stalls.
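
As a small illustration, the following C function typically compiles on x86-64 to the XOR idiom described above; the exact output depends on the compiler and options, so this is an observation rather than a guarantee:

    /* A typical optimizing compiler on x86-64 emits
     *     xor eax, eax
     *     ret
     * for this function instead of `mov eax, 0`: the XOR form is
     * shorter, and the hardware special-cases it so it creates no
     * dependency on the register's previous value. */
    int zero(void) { return 0; }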


Factors affecting optimization

;Target machine: Whether particular optimizations can and should be applied may depend on the characteristics of the target machine. Some compilers, such as GCC and Clang, parameterize machine-dependent factors so that they can be used to optimize for different machines.
;Target CPU architecture:
* Number of registers: Registers can be used to optimize for performance. Local variables can be stored in registers instead of on the stack. Temporary/intermediate results can be accessed in registers instead of slower memory.
* RISC vs. CISC: CISC instruction sets often have variable instruction lengths, often have a larger number of possible instructions that can be used, and each instruction could take a different amount of time. RISC instruction sets attempt to limit the variability in each of these: instructions are usually constant in length, with few exceptions; there are usually fewer combinations of registers and memory operations; and the instruction issue rate (the number of instructions completed per time period, usually an integer multiple of the clock cycle) is usually constant in cases where memory latency is not a factor. There may be several ways of carrying out a certain task, with CISC usually offering more alternatives than RISC. Compilers have to know the relative costs of the various instructions and choose the best instruction sequence (see instruction selection).
* Pipelines: A pipelined CPU is organized like an assembly line. It allows the use of parts of the CPU for different instructions by breaking the execution of instructions into various stages: instruction decode, address decode, memory fetch, register fetch, compute, register store, etc. One instruction could be in the register-store stage while another is in the register-fetch stage. Pipeline conflicts occur when an instruction in one stage of the pipeline depends on the result of another instruction ahead of it in the pipeline that has not yet completed. Pipeline conflicts can lead to pipeline stalls, where the CPU wastes cycles waiting for a conflict to resolve. Compilers can ''schedule'', or reorder, instructions so that pipeline stalls occur less frequently.
* Number of functional units: Some CPUs have several ALUs and FPUs that allow them to execute multiple instructions simultaneously. There may be restrictions on which instructions can pair with which other instructions ("pairing" is the simultaneous execution of two or more instructions) and on which functional unit can execute which instruction. These units also raise issues similar to pipeline conflicts. Instructions can be scheduled so that the functional units are fully loaded.
;Machine architecture:
* CPU cache size and type (direct-mapped, 2-/4-/8-/16-way associative, fully associative): Techniques such as inline expansion and loop unrolling may increase the size of the generated code and reduce code locality. The program may slow down drastically if a highly used section of code (like inner loops in various algorithms) no longer fits in the cache as a result of optimizations that increase code size. Also, caches that are not fully associative have higher chances of cache collisions even in an unfilled cache.
* Cache/memory transfer rates: These give the compiler an indication of the penalty for cache misses. This is used mainly in specialized applications.
;Intended use:
* Debugging: During development, optimizations are often disabled to speed compilation or to make the executable code easier to debug. Optimizing transformations, particularly those that reorder code, can make it difficult to relate the executable code to the source code.
* General-purpose use: Prepackaged software is often expected to run on a variety of machines that may share the same instruction set but have different performance characteristics. The code may not be optimized for any particular machine, or may be tuned to work best on the most popular machine while working less optimally on others.
* Special-purpose use: If the software is compiled for machines with uniform characteristics, then the compiler can heavily optimize the generated code for those machines. Notable cases include code designed for parallel and vector processors, for which special parallelizing compilers are used. Firmware for an embedded system can be optimized for the target CPU and memory. System cost or reliability may be more important than the code's speed. For example, compilers for embedded software usually offer options that reduce code size at the expense of speed. The code's timing may need to be predictable, rather than as fast as possible, so code caching might be disabled, along with compiler optimizations that require it.


Common themes

Optimization includes the following, sometimes conflicting, themes.
;Optimize the common case: The common case may have unique properties that allow a ''fast path'' at the expense of a ''slow path''. If the fast path is taken more often, the result is better overall performance.
;Avoid redundancy: Reuse results that are already computed and store them for later use, instead of recomputing them.
;Less code: Remove unnecessary computations and intermediate values. Less work for the CPU, cache, and memory usually results in faster execution. Alternatively, in embedded systems, less code brings a lower product cost.
;Fewer jumps by using ''straight-line code'', also called ''branch-free code'': Less complicated code. Jumps (conditional or unconditional branches) interfere with the prefetching of instructions, thus slowing down code. Using inlining or loop unrolling can reduce branching, at the cost of increasing binary file size by the length of the repeated code. This tends to merge several basic blocks into one.
;Locality: Code and data that are accessed closely together in time should be placed close together in memory to increase spatial locality of reference.
;Exploit the memory hierarchy: Accesses to memory are increasingly more expensive for each level of the memory hierarchy, so place the most commonly used items in registers first, then caches, then main memory, before going to disk.
;Parallelize: Reorder operations to allow multiple computations to happen in parallel, either at the instruction, memory, or thread level.
;More precise information is better: The more precise the information the compiler has, the better it can employ any or all of these optimization techniques.
;Runtime metrics can help: Information gathered during a test run can be used in profile-guided optimization. Information gathered at runtime, ideally with minimal overhead, can be used by a JIT compiler to dynamically improve optimization.
;Strength reduction: Replace complex, difficult, or expensive operations with simpler ones; for example, replacing division by a constant with multiplication by its reciprocal, or using induction variable analysis to replace multiplication by a loop index with addition. A sketch of this last transformation follows this list.
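
As referenced above, a minimal C sketch of strength reduction on an induction variable; the function names are illustrative only:

    /* Before: each iteration multiplies the loop index by 4. */
    void fill_before(int *a, int n) {
        for (int i = 0; i < n; i++)
            a[i] = i * 4;
    }

    /* After strength reduction: the multiply becomes a running add. */
    void fill_after(int *a, int n) {
        int v = 0;                 /* tracks i * 4 incrementally */
        for (int i = 0; i < n; i++) {
            a[i] = v;
            v += 4;
        }
    }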


Specific techniques


Loop optimizations

''Loop optimization'' acts on the statements that make up a loop, such as a ''for'' loop; loop-invariant code motion is one example. Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops. Some optimization techniques primarily designed to operate on loops include the following (a code sketch of loop unrolling appears after this list).
;Induction variable analysis: Roughly, if a variable in a loop is a simple linear function of the index variable, such as j := 4*i + 1, it can be updated appropriately each time the loop variable is changed. This is a strength reduction and may also allow the index variable's definitions to become dead code. This information is also useful for bounds-checking elimination and dependence analysis, among other things.
;Loop fission or loop distribution: Loop fission attempts to break a loop into multiple loops over the same index range, with each new loop taking only part of the original loop's body. This can improve locality of reference, both of the data being accessed within the loop and of the code in the loop's body.
;Loop fusion or loop combining or loop ramming or loop jamming: Another technique that attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times, regardless of whether that number is known at compile time, their bodies can be combined as long as they make no reference to each other's data.
;Loop inversion: This technique changes a standard ''while'' loop into a ''do/while'' (also known as ''repeat/until'') loop wrapped in an ''if'' conditional, reducing the number of jumps by two for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code), but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile time and is known to be side-effect-free, the ''if'' guard can be skipped.
;Loop interchange: These optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout.
;Loop-invariant code motion: If a quantity is computed inside a loop during every iteration, and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins. This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with loop inversion, because not all code is safe to hoist outside the loop.
;Loop nest optimization: Some pervasive algorithms, such as matrix multiplication, have very poor cache behavior and excessive memory accesses. Loop nest optimization increases the number of cache hits by operating over small blocks and by using loop interchange.
;Loop reversal: Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization that can help eliminate dependencies and thus enable other optimizations. Furthermore, on some architectures, loop reversal contributes to smaller code: when the loop index is being decremented, the condition that must be met for the running program to exit the loop is a comparison with zero. This is often a special, parameter-less instruction, unlike a comparison with a number, which needs the number to compare with, so the bytes needed to store that parameter are saved. Additionally, if the comparison number exceeds the platform's word size, then in standard loop order multiple instructions would need to be executed to evaluate the comparison, which is not the case with loop reversal.
;Loop unrolling: Unrolling duplicates the body of the loop multiple times, to decrease the number of times the loop condition is tested and the number of jumps; tests and jumps can hurt performance by impairing the instruction pipeline. A "fewer jumps" optimization. Completely unrolling a loop eliminates all overhead, but requires that the number of iterations be known at compile time.
;Loop splitting: Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops that have the same bodies but iterate over different contiguous portions of the index range. A useful special case is ''loop peeling'', which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.
;Loop unswitching: Unswitching moves a conditional from inside a loop to outside the loop by duplicating the loop's body inside each of the if and else clauses of the conditional.
;Software pipelining: The loop is restructured so that the work done in one iteration is split into several parts and done over several iterations. In a tight loop, this technique hides the latency between loading and using values.
;Automatic parallelization: A loop is converted into multi-threaded or vectorized (or even both) code to use multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine, including multi-core machines.
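
As referenced above, a minimal C sketch of four-way loop unrolling; for simplicity it assumes the trip count is a multiple of four, which a real compiler would handle with a remainder loop:

    /* Before: one test and one jump per element. */
    long sum_rolled(const int *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* After 4x unrolling (assumes n is a multiple of 4):
     * one test and one jump per four elements. */
    long sum_unrolled(const int *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i += 4)
            s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        return s;
    }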


Prescient store optimizations

Prescient store optimizations allow store operations to occur earlier than would otherwise be permitted in the context of threads and locks. The process needs some way of knowing ahead of time what value will be stored by the assignment that it should have followed. The purpose of this relaxation is to allow compiler optimization to perform certain kinds of code rearrangements that preserve the semantics of properly synchronized programs.


Data-flow optimizations

Data-flow optimizations, based on data-flow analysis, primarily depend on how certain properties of data are propagated by control edges in the control-flow graph. Some of these include the following (a sketch of common subexpression elimination follows this list).
;Common subexpression elimination: In the expression (a + b) - (a + b)/4, "common subexpression" refers to the duplicated (a + b). Compilers implementing this technique realize that (a + b) will not change, and so calculate its value only once.
;Constant folding and propagation: Replacing expressions consisting of constants (e.g., 3 + 5) with their final value (8) at compile time, rather than doing the calculation at run time. Used in most modern languages.
;Induction variable recognition and elimination: See the discussion above under ''induction variable analysis''.
;Alias classification and pointer analysis: In the presence of pointers, it is difficult to make any optimizations at all, since potentially any variable may have been changed when a memory location is assigned to. By specifying which pointers can alias which variables, unrelated pointers can be ignored.
;Dead-store elimination: Removal of assignments to variables that are not subsequently read, either because the lifetime of the variable ends or because of a subsequent assignment that will overwrite the first value.
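
A minimal C rendering of the (a + b) example above; a compiler performs this rewrite on its intermediate representation, so the hand-written temporary is purely illustrative:

    #include <stdio.h>

    int main(void) {
        int a = 6, b = 10;

        /* Before: (a + b) is computed twice. */
        int before = (a + b) - (a + b) / 4;

        /* After common subexpression elimination: the shared value
         * is kept in a temporary and reused. */
        int t = a + b;
        int after = t - t / 4;

        printf("%d %d\n", before, after);
        return 0;
    }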


SSA-based optimizations

These optimizations are intended to be applied after transforming the program into a special form called static single assignment (SSA), in which every variable is assigned in only one place (a brief sketch of SSA form follows this list). Although some of them function without SSA, they are most effective with it. Many optimizations listed in other sections also benefit with no special changes, such as register allocation.
;Global value numbering: GVN eliminates redundancy by constructing a value graph of the program and then determining which values are computed by equivalent expressions. GVN can identify some redundancy that common subexpression elimination cannot, and vice versa.
;Sparse conditional constant propagation: Combines constant propagation, constant folding, and dead-code elimination, and improves upon what is possible by running them separately. This optimization symbolically executes the program, simultaneously propagating constant values and eliminating the portions of the control-flow graph that this makes unreachable.
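
The renaming at the heart of SSA can be imitated at the source level; the following C sketch is purely illustrative, since compilers construct SSA on an intermediate representation, not on source code:

    /* Straight-line code before SSA conversion: x is assigned twice. */
    int before(int a) {
        int x = a + 1;
        x = x * 2;
        return x;
    }

    /* The same code in SSA style: each value gets a fresh name, so
     * every "variable" is assigned exactly once. */
    int after(int a) {
        const int x1 = a + 1;
        const int x2 = x1 * 2;
        return x2;
    }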


Code generator optimizations

;Register allocation: The most frequently used variables should be kept in processor registers for the fastest access. To find which variables to put in registers, an interference graph is created. Each variable is a vertex, and when two variables are used at the same time (have intersecting live ranges) they have an edge between them. This graph is colored, using for example Chaitin's algorithm, with the same number of colors as there are registers. If the coloring fails, one variable is "spilled" to memory and the coloring is retried. A toy coloring sketch appears after this list.
;Instruction selection: Most architectures, particularly CISC architectures and those with many addressing modes, offer several different ways of performing a particular operation, using entirely different sequences of instructions. The job of the instruction selector is to do a good job overall of choosing which instructions to implement which operators in the low-level intermediate representation with. For example, on many processors in the 68000 family and on the x86 architecture, complex addressing modes can be used in statements like lea 25(a1,d5*4), a0, allowing a single instruction to perform a significant amount of arithmetic with less storage.
;Instruction scheduling: Instruction scheduling is an important optimization for modern pipelined processors, which avoids stalls or bubbles in the pipeline by clustering instructions with no dependencies together, while being careful to preserve the original semantics.
;Rematerialization: Rematerialization recalculates a value instead of loading it from memory, eliminating a memory access. This is performed in tandem with register allocation to avoid spills.
;Code factoring: If several sequences of code are identical, or can be parameterized or reordered to be identical, they can be replaced with calls to a shared subroutine. This can often share code for subroutine set-up and sometimes tail recursion. (Cx51 Compiler Manual, version 09.2001, p. 155, Keil Software Incorporated.)
;Trampolines: Many CPUs have smaller subroutine-call instructions for accessing low memory. A compiler can save space by using these small calls in the main body of code. Jump instructions in low memory can access the routines at any address. This multiplies the space savings from code factoring.
;Reordering computations: Based on integer linear programming, restructuring compilers enhance data locality and expose more parallelism by reordering computations. Space-optimizing compilers may reorder code to lengthen sequences that can be factored into subroutines.
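
As referenced above, a toy C sketch of the coloring step: a greedy first-fit coloring, not Chaitin's full algorithm, with an invented interference graph and register count:

    #include <stdio.h>

    #define NVARS 4   /* number of virtual registers (variables) */
    #define NREGS 2   /* number of physical registers (colors)   */

    /* interference[i][j] != 0 means variables i and j are live at
     * the same time and must not share a register. */
    static const int interference[NVARS][NVARS] = {
        {0, 1, 0, 0},
        {1, 0, 1, 0},
        {0, 1, 0, 1},
        {0, 0, 1, 0},
    };

    int main(void) {
        int color[NVARS];
        for (int i = 0; i < NVARS; i++) {
            int used[NREGS] = {0};
            /* Mark colors taken by already-colored neighbors. */
            for (int j = 0; j < i; j++)
                if (interference[i][j] && color[j] >= 0)
                    used[color[j]] = 1;
            color[i] = -1;  /* -1 means "spilled to memory" */
            for (int c = 0; c < NREGS; c++)
                if (!used[c]) { color[i] = c; break; }
            if (color[i] >= 0)
                printf("v%d -> r%d\n", i, color[i]);
            else
                printf("v%d -> spilled\n", i);
        }
        return 0;
    }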


Functional language optimizations

Although many of these also apply to non-functional languages, they either originate in or are particularly critical in functional languages such as Lisp and ML.
;Tail-call optimization: A function call consumes stack space and involves some overhead related to parameter passing and flushing the instruction cache. Tail-recursive algorithms can be converted to iteration through a process called tail-recursion elimination or tail-call optimization; a sketch follows this list.
;Deforestation (data structure fusion): In languages where it is common for a sequence of transformations to be applied to a list, deforestation attempts to remove the construction of intermediate data structures.
;Partial evaluation: Computations that produce the same output regardless of the dynamic input at runtime can be evaluated at compile time.
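
As referenced above, a minimal C sketch of tail-recursion elimination, shown as the hand-written loop a compiler would effectively produce:

    #include <stdio.h>

    /* Tail-recursive factorial: the recursive call is the last action. */
    static unsigned long fact_tail(unsigned n, unsigned long acc) {
        if (n <= 1)
            return acc;
        return fact_tail(n - 1, acc * n);  /* tail position */
    }

    /* What tail-call elimination effectively produces: the call is
     * replaced by a jump, so the computation runs in constant
     * stack space. */
    static unsigned long fact_loop(unsigned n) {
        unsigned long acc = 1;
        while (n > 1) {
            acc *= n;
            n -= 1;
        }
        return acc;
    }

    int main(void) {
        printf("%lu %lu\n", fact_tail(5, 1), fact_loop(5));
        return 0;
    }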


Other optimizations

;Bounds-checking elimination: Many languages, such as Java, enforce bounds checking of all array accesses. This is a severe performance bottleneck in certain applications, such as scientific code. Bounds-checking elimination allows the compiler to safely remove bounds checking in many situations where it can determine that the index must fall within valid bounds; for example, if it is a simple loop variable.
;Branch-offset optimization (machine dependent): Choose the shortest branch displacement that reaches the target.
;Code-block reordering: Code-block reordering alters the order of the basic blocks in a program to reduce conditional branches and improve locality of reference.
;Dead-code elimination: Removes instructions that will not affect the behaviour of the program, for example definitions that have no uses, called dead code. This reduces code size and eliminates unnecessary computation.
;Factoring out of invariants (loop invariants): If an expression is carried out both when a condition is met and when it is not met, it can be written just once, outside of the conditional statement. Similarly, if certain types of expressions (e.g., the assignment of a constant into a variable) appear inside a loop, they can be moved out of it because their effect will be the same no matter whether they are executed many times or just once. This is also known as total redundancy elimination. A similar but more powerful optimization is partial-redundancy elimination (PRE).
;Inline expansion or macro expansion: When some code invokes a procedure, it is possible to directly insert the body of the procedure inside the calling code rather than transferring control to it. This saves the overhead related to procedure calls and provides an opportunity for many parameter-specific optimizations, but comes at the cost of space; the procedure body is duplicated each time the procedure is called inline. Generally, inlining is useful in performance-critical code that makes a large number of calls to small procedures. This is a "fewer jumps" optimization. The statements of imperative programming languages are also an example of such an optimization: although statements could be implemented with function calls, they are almost always implemented with code inlining.
;Jump threading: In this optimization, consecutive conditional jumps predicated entirely or partially on the same condition are merged; e.g., if (c) { A } if (c) { B } becomes if (c) { A; B }, and if (c) { A } if (!c) { B } becomes if (c) { A } else { B }. A sketch of this rewrite follows this list.
;Macro compression: A space optimization that recognizes common sequences of code, creates subprograms ("code macros") that contain the common code, and replaces the occurrences of the common code sequences with calls to the corresponding subprograms. This is most effectively done as a machine code optimization, when all the code is present. The technique was first used to conserve space in an interpretive byte stream used in an implementation of Macro Spitbol on microcomputers. The problem of determining an optimal set of macros that minimizes the space required by a given code segment is known to be NP-complete, but efficient heuristics attain near-optimal results.
;Reduction of cache collisions: For example, by disrupting alignment within a page.
;Stack-height reduction: Rearrange an expression tree to minimize the resources needed for expression evaluation.
;Test reordering: If we have two tests that together form the condition for something, we can first deal with the simpler tests (e.g., comparing a variable to something) and only then with the complex tests (e.g., those that require a function call). This technique complements lazy evaluation, but can be used only when the tests are not dependent on one another. Short-circuiting semantics can make this difficult.
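
As referenced above, a minimal C sketch of the jump-threading rewrite; the helper names are invented for illustration:

    #include <stdio.h>

    static void do_a(void) { puts("a"); }
    static void do_b(void) { puts("b"); }

    /* Before: two consecutive tests of the same condition. */
    static void before(int c) {
        if (c) { do_a(); }
        if (c) { do_b(); }
    }

    /* After jump threading: the second test is eliminated, because
     * its outcome is already known on each path out of the first. */
    static void after(int c) {
        if (c) { do_a(); do_b(); }
    }

    int main(void) { before(1); after(1); return 0; }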


Interprocedural optimizations

Interprocedural optimization works on the entire program, across procedure and file boundaries. It works closely with its intraprocedural counterparts, carried out with the cooperation of a local part and a global part. Typical interprocedural optimizations are procedure inlining, interprocedural dead-code elimination, interprocedural constant propagation, and procedure reordering. As usual, the compiler needs to perform interprocedural analysis before its actual optimizations. Interprocedural analyses include alias analysis, array-access analysis, and the construction of a call graph. Interprocedural optimization is common in modern commercial compilers from SGI, Intel, Microsoft, and Sun Microsystems. For a long time, the open-source GCC was criticized for a lack of powerful interprocedural analysis and optimizations, though this is now improving. Another open-source compiler with a full analysis and optimization infrastructure is Open64. Because of the extra time and space required by interprocedural analysis, most compilers do not perform it by default; users must use compiler options explicitly to tell the compiler to enable interprocedural analysis and other expensive optimizations.
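
A hedged C sketch of interprocedural constant propagation; the function names are invented, and whether a given compiler performs this rewrite depends on its IPO support:

    #include <stdio.h>

    /* With whole-program visibility, an interprocedural pass can see
     * that scale() is only ever called with factor == 3, propagate
     * the constant into the callee, and simplify the multiply. */
    static int scale(int x, int factor) { return x * factor; }

    int triple(int a) { return scale(a, 3); }

    int main(void) {
        printf("%d\n", triple(14));  /* 42 */
        return 0;
    }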


Practical considerations

There can be a wide range of optimizations that a compiler can perform, ranging from simple and straightforward optimizations that take little compilation time to elaborate and complex optimizations that involve considerable amounts of compilation time. Accordingly, compilers often provide options to their control command or procedure that allow the user to choose how much optimization to request; for instance, the IBM FORTRAN H compiler allowed the user to specify no optimization, optimization at the register level only, or full optimization. By the 2000s, it was common for compilers, such as Clang, to have a number of command-line options that could affect a variety of optimization choices, starting with the familiar -O2 switch.

An approach to isolating optimization is the use of so-called post-pass optimizers (some commercial versions of which date back to mainframe software of the late 1970s). These tools take the executable output of an optimizing compiler and optimize it even further. Post-pass optimizers usually work at the assembly language or machine code level (in contrast with compilers that optimize intermediate representations of programs). One example is the Portable C Compiler (PCC) of the 1980s, which had an optional pass that would perform post-optimizations on the generated assembly code.

Another consideration is that optimization algorithms are complicated and, especially when being used to compile large, complex programming languages, can contain bugs that introduce errors in the generated code or cause internal errors during compilation. Compiler errors of any kind can be disconcerting to the user, but especially so in this case, since it may not be clear that the optimization logic is at fault. In the case of internal errors, the problem can be partially ameliorated by a "fail-safe" programming technique in which the optimization logic in the compiler is coded such that a failure is trapped, a warning message is issued, and the rest of the compilation proceeds to successful completion.


History

Early compilers of the 1960s were often primarily concerned with simply compiling code correctly or efficiently, such that compile times were a major concern. One notable early optimizing compiler was the IBM FORTRAN H compiler of the late 1960s. Another of the earliest and most important optimizing compilers, which pioneered several advanced techniques, was the one for BLISS (1970), which was described in ''The Design of an Optimizing Compiler'' (1975). By the late 1980s, optimizing compilers were sufficiently effective that programming in assembly language declined. This co-evolved with the development of RISC chips and advanced processor features such as superscalar processors, out-of-order execution, and speculative execution, which were designed to be targeted by optimizing compilers rather than by human-written assembly code.


List of static code analyses

* Alias analysis
* Pointer analysis
* Shape analysis
* Escape analysis
* Array-access analysis
* Dependence analysis
* Control-flow analysis
* Data-flow analysis
** Use-define chain analysis
** Live-variable analysis
** Available expression analysis


See also

* Algorithmic efficiency
* Compile-time function execution
* Full-employment theorem
* Just-in-time compilation (JIT)
* Kildall's method
* Profile-guided optimization
* Program optimization




External links


* Optimization manuals by Agner Fog: documentation about x86 processor architecture and low-level code optimization