
A linker or link editor is a
computer program
A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...
that combines intermediate
software build
A software build is the process of converting source code files into standalone artifact (software development), software artifact(s) that can be run on a computer, or the result of doing so.
In software production, builds optimize software for pe ...
files such as
object
Object may refer to:
General meanings
* Object (philosophy), a thing, being, or concept
** Object (abstract), an object which does not exist at any particular time or place
** Physical object, an identifiable collection of matter
* Goal, an a ...
and
library
A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
files into a single
executable
In computer science, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instruction (computer science), in ...
file such as a program or library. A linker is often part of a
toolchain
A toolchain is a set of software development tools used to build and otherwise develop software. Often, the tools are executed sequentially and form a pipeline such that the output of one tool is the input for the next. Sometimes the term is us ...
that includes a
compiler
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
and/or
assembler that generates intermediate files that the linker processes. The linker may be integrated with other toolchain
tools
A tool is an object that can extend an individual's ability to modify features of the surrounding environment or help them accomplish a particular task. Although many animals use simple tools, only human beings, whose use of stone tools dates ...
such that the user does not interact with the linker directly.
A simpler version that writes its
output
Output may refer to:
* The information produced by a computer, see Input/output
* An output state of a system, see state (computer science)
* Output (economics), the amount of goods and services produced
** Gross output in economics, the valu ...
directly to
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...
is called the ''loader'', though
loading is typically considered a separate process.
Overview
Computer programs typically are composed of several parts or modules; these parts/modules do not need to be contained within a single
object file
An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is kno ...
, and in such cases refer to each other using
symbols
A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise different concep ...
as addresses into other modules, which are mapped into memory addresses when linked for execution.
While the process of linking is meant to ultimately combine these independent parts, there are many good reasons to develop those separately at the
source-level. Among these reasons are the ease of organizing several smaller pieces over a
monolithic
A monolith is a monument or natural feature consisting of a single massive stone or rock.
Monolith or monolithic may also refer to:
Architecture
* Monolithic architecture, a style of construction in which a building is carved, cast or excavated f ...
whole and the ability to better define the purpose and responsibilities of each individual piece, which is essential for managing complexity and increasing long-term maintainability in
software architecture
Software architecture is the set of structures needed to reason about a software system and the discipline of creating such structures and systems. Each structure comprises software elements, relations among them, and properties of both elements a ...
.
Typically, an object file can contain three kinds of symbols:
* defined "external" symbols, sometimes called "public" or "entry" symbols, which allow it to be called by other modules,
* undefined "external" symbols, which reference other modules where these symbols are defined, and
* local symbols, used internally within the object file to facilitate
relocation.
For most compilers, each object file is the result of compiling one input source code file. When a program comprises multiple object files, the linker combines these files into a unified executable program, resolving the symbols as it goes along.
Linkers can take objects from a collection called a
library
A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
or
runtime library
A runtime library is a library that provides access to the runtime environment that is available to a computer program tailored to the host platform. A runtime environment implements the execution model as required for a development environme ...
. Most linkers do not include all the object files in a
static library
A static library or statically linked library contains functions and data that can be included in a consuming computer program at build-time such that the library does not need to be accessible in a separate file at run-time. If all libraries a ...
in the output executable; they include only those object files from the library that are referenced by other object files or libraries directly or indirectly. But for a
shared library
In computing, a library is a collection of System resource, resources that can be leveraged during software development to implement a computer program. Commonly, a library consists of executable code such as compiled function (computer scienc ...
, the entire library has to be loaded during runtime as it is not known which functions or methods will be called during runtime. Library linking may thus be an iterative process, with some referenced modules requiring additional modules to be linked, and so on. Libraries exist for diverse purposes, and one or more system libraries are usually linked in by default.
The linker also takes care of arranging the objects in a program's
address space
In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity.
For software programs to save and retrieve ...
. This may involve ''relocating'' code that assumes a specific
base address
In computing, a base address is an address serving as a reference point ("base") for other addresses. Related addresses can be accessed using an ''addressing scheme''.
Under the ''relative addressing'' scheme, to obtain an absolute address, the ...
into another base. Since a compiler seldom knows where an object will reside, it often assumes a fixed base location (for example,
zero
0 (zero) is a number representing an empty quantity. Adding (or subtracting) 0 to any number leaves that number unchanged; in mathematical terminology, 0 is the additive identity of the integers, rational numbers, real numbers, and compl ...
). Relocating machine code may involve re-targeting absolute jumps, loads, and stores.
The executable output by the linker may need another relocation pass when it is finally loaded into memory (just before execution). This pass is usually omitted on
hardware offering
virtual memory
In computing, virtual memory, or virtual storage, is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a ver ...
: every program is put into its own address space, so there is no conflict even if all programs load at the same base address. This pass may also be omitted if the executable is a
position independent
In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that executes properly regardless of its memory address. PIC is commonly used for shared libraries, so that the same library cod ...
executable.
Additionally, in some operating systems, the same program handles both the jobs of linking and loading a program (
dynamic linking
In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed (at " run time"), by copying the content of libraries from persistent storage to RAM, fill ...
).
Dynamic linking
Many
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
environments allow dynamic linking, deferring the resolution of some undefined symbols until a program is run. That means that the executable code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these. Loading the program will load these objects/libraries as well, and perform a final linking.
This approach offers two advantages:
* Often-used libraries (for example the standard system libraries) need to be stored in only one location, not duplicated in every single executable file, thus saving limited
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...
and
disk space.
* If a bug in a library function is corrected by replacing the library or
performance
A performance is an act or process of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function.
Performance has evolved glo ...
is improved, all programs using it dynamically will benefit from the correction after restarting them. Programs that included this function by static linking would have to be re-linked first.
There are also disadvantages:
* Known on the
Windows
Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
platform as "
DLL hell", an incompatible updated library will break executables that depended on the behavior of the previous version of the library if the newer version is not correctly
backward compatible
In telecommunications and computing, backward compatibility (or backwards compatibility) is a property of an operating system, software, real-world product, or technology that allows for interoperability with an older legacy system, or with inpu ...
.
* A program, together with the libraries it uses, might be certified (e.g. as to correctness, documentation requirements, or performance) as a package, but not if components can be replaced (this also argues against automatic OS updates in critical systems; in both cases, the OS and libraries form part of a ''qualified'' environment).
Contained or
virtual environments may further allow
system administrators to mitigate or trade-off these individual pros and cons.
Static linking
Static linking is the result of the linker copying all library routines used in the program into the executable image. This may require more disk space and memory than dynamic linking, but is more portable, since it does not require the presence of the
library
A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
on the system where it runs. Static linking also prevents "DLL hell", since each program includes exactly the versions of library routines that it requires, with no conflict with other programs. A program using just a few routines from a library does not require the entire library to be installed.
Relocation
As the compiler has no information on the layout of objects in the final output, it cannot take advantage of shorter or more efficient instructions that place a requirement on the address of another object. For example, a jump instruction can reference an absolute address or an offset from the current location, and the offset could be expressed with different lengths depending on the distance to the target. By first generating the most conservative instruction (usually the largest relative or absolute variant, depending on platform) and adding ''relaxation hints'', it is possible to substitute shorter or more efficient instructions during the final link. In regard to jump optimizations this is also called ''automatic jump-sizing''.
This step can be performed only after all input objects have been read and assigned temporary addresses; the linker relaxation pass subsequently reassigns addresses, which may in turn allow more potential relaxations to occur. In general, the substituted sequences are shorter, which allows this process to always converge on the best solution given a fixed order of objects; if this is not the case, relaxations can conflict, and the linker needs to weigh the advantages of either option.
While instruction relaxation typically occurs at link-time, inner-module relaxation can already take place as part of the optimizing process at
compile-time
In computer science, compile time (or compile-time) describes the time window during which a language's statements are converted into binary instructions for the processor to execute. The term is used as an adjective to describe concepts relat ...
. In some cases, relaxation can also occur at
load-time as part of the relocation process or combined with
dynamic dead-code elimination
In compiler theory, dead-code elimination (DCE, dead-code removal, dead-code stripping, or dead-code strip) is a compiler optimization to remove dead code (code that does not affect the program results). Removing such code has several benefits: i ...
techniques.
Linkage editor
In IBM
System/360
The IBM System/360 (S/360) is a family of mainframe computer systems announced by IBM on April 7, 1964, and delivered between 1965 and 1978. System/360 was the first family of computers designed to cover both commercial and scientific applicati ...
through
IBM Z
IBM Z is a family name used by IBM for all of its z/Architecture mainframe computers.
In July 2017, with another generation of products, the official family was changed to IBM Z from IBM z Systems; the IBM Z family will soon include the newes ...
mainframe
A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications like bulk data processing for tasks such as censuses, industry and consumer statistics, enterpris ...
operating systems such as
OS/360 and its successors, this type of program is known as a ''linkage editor''. As the name implies a linkage ''editor'' has the additional capability of allowing the addition, replacement, and/or deletion of individual program sections. Operating systems such as OS/360 have format for executable load-modules containing supplementary data about the component sections of a program, so that an individual program section can be replaced, and other parts of the program updated so that relocatable addresses and other references can be corrected by the linkage editor, as part of the process.
One advantage of this is that it allows a program to be maintained without having to keep all of the intermediate object files, or without having to re-compile program sections that haven't changed. It also permits program updates to be distributed in the form of small files (originally
card decks), containing only the object module to be replaced. In such systems, object code is in the form and format of 80-byte punched-card images, so that updates can be introduced into a system using that medium. In later releases of OS/360 and in subsequent systems, load-modules contain additional data about versions of components modules, to create a traceable record of updates. It also allows one to add, change, or remove an
overlay structure from an already linked load module.
The term "linkage editor" should not be construed as implying that the program operates in a user-interactive mode like a text editor. It is intended for batch-mode execution, with the editing commands being supplied by the user in sequentially organized files, such as
punched card
A punched card (also punch card or punched-card) is a stiff paper-based medium used to store digital information via the presence or absence of holes in predefined positions. Developed over the 18th to 20th centuries, punched cards were widel ...
s,
DASD, or
magnetic tape
Magnetic tape is a medium for magnetic storage made of a thin, magnetizable coating on a long, narrow strip of plastic film. It was developed in Germany in 1928, based on the earlier magnetic wire recording from Denmark. Devices that use magnetic ...
.
''Linkage editing'' (
IBM
International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...
nomenclature) or ''consolidation'' or ''collection'' (
ICL nomenclature) refers to the ''linkage editor's'' or ''consolidator's'' act of combining the various pieces into a relocatable binary, whereas the loading and relocation into an absolute binary at the target address is normally considered a separate step.
Linker control scripts
In the beginning linkers gave users very limited control over the arrangement of generated output object files. As the target systems became complex with different memory requirements such as embedded systems, it became necessary to give users control to generate output object files with their specific requirements such as defining base addresses' of segments. Linkers control scripts were used for this.
Implementations
Notable implementations:
Unix & Unix-like
On Unix and Unix-like systems, the static linker is usually invoked via the command
ld
which is an abbreviation of ''LoaDer'' or ''Link eDitor''. The term "loader" was used to describe the process of loading external symbols from other programs during the process of linking.
For example, on
SINTRAN III, linking (assembling object files into a program) was called ''loading'' as in loading executable code onto a file.
GNU
GNU ld, part of the
GNU Binary Utilities (binutils), is the
GNU Project
The GNU Project ( ) is a free software, mass collaboration project announced by Richard Stallman on September 27, 1983. Its goal is to give computer users freedom and control in their use of their computers and Computer hardware, computing dev ...
version of the Unix static linker. A ''linker script'' may be passed to GNU ld to exercise fine grain control of the linking process.
Two versions of ld are provided in binutils: the traditional GNU ld based on
bfd, and a streamlined ELF-only version called
gold
Gold is a chemical element; it has chemical symbol Au (from Latin ) and atomic number 79. In its pure form, it is a brightness, bright, slightly orange-yellow, dense, soft, malleable, and ductile metal. Chemically, gold is a transition metal ...
.
The
LLVM
LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
project's linker, ', is designed to be drop-in compatible, and may be used directly with the GNU compiler. Another drop-in replacement, mold, is a highly parallelized and faster alternative which is also supported by GNU tools.
See also
*
Binary File Descriptor library
The Binary File Descriptor library (BFD) is the GNU Project's main mechanism for the portable manipulation of object files in a variety of formats. , it supports approximately 50 file formats and 25 instruction set architectures.
History
When ...
(libbfd)
*
Build (computing)
A software build is the process of converting source code files into standalone artifact (software development), software artifact(s) that can be run on a computer, or the result of doing so.
In software production, builds optimize software for pe ...
*
Compile and go system
*
DLL hell
*
Direct binding
*
Dynamic binding
*
Dynamic dead code elimination
*
Dynamic dispatch
In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriented ...
*
Dynamic library
A dynamic library is a library that contains functions and data that can be consumed by a computer program at run-time as loaded from a file separate from the program executable. Dynamic linking or late binding allows for using a dynamic libra ...
*
Dynamic linker
In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed (at " run time"), by copying the content of libraries from persistent storage to RAM, fill ...
*
Dynamic loading
*
Dynamic-link library
A dynamic-link library (DLL) is a shared library in the Microsoft Windows or OS/2 operating system. A DLL can contain executable code (functions), data, and resources.
A DLL file often has file extension .dll even though this is not required ...
*
External variable
In the C programming language, and its predecessor B, an external variable is a variable defined outside any function block. On the other hand, a local (automatic) variable is a variable defined inside a function block.
Definition, declarati ...
*
Library
A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
*
Loader
*
Name decoration
In compiler construction, name mangling (also called name decoration) is a technique used to solve various problems caused by the need to resolve unique names for programming entities in many modern programming languages.
It provides means to e ...
*
Prelinking (prebinding)
*
Relocation
*
Smart linking
*
Static library
A static library or statically linked library contains functions and data that can be included in a consuming computer program at build-time such that the library does not need to be accessible in a separate file at run-time. If all libraries a ...
*
Gold (linker)
References
Further reading
*
*
*
* Code
Errata
* (19 pages)
*
External links
Ian Lance Taylor's ''Linkers'' blog entriesLinkers and Loaders a
Linux Journal
''Linux Journal'' (''LJ'') is an American monthly technology magazine originally published by Specialized System Consultants, Inc. (SSC) in Seattle, Washington since 1994. In December 2006 the publisher changed to Belltown Media, Inc. in Hous ...
article by Sandeep Grover
Another Listing of Where to Get a Complete Collection of Free Tools for Assembly Language DevelopmentLLD - The LLVM Linker*
{{Authority control
Compilers
Computer libraries
Programming language implementation
Utility software types