position-independent code
In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that executes properly regardless of its memory address. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap with other memory in use by, for example, other shared libraries. PIC was also used on older computer systems that lacked a memory management unit (MMU), so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

Position-independent code can be executed at any memory address without modification. This differs from absolute code, which must be loaded at a specific location to function correctly, and load-time locatable (LTL) code, in which a linker or program loader modifies a program before execution, so it can be run only from a particular memory location. The latter two kinds are sometimes referred to as position-dependent code.

Generating position-independent code is often the default behavior for compilers, but they may place restrictions on the use of some language features, such as disallowing use of absolute addresses (position-independent code has to use relative addressing). Instructions that refer directly to specific memory addresses sometimes execute faster, and replacing them with equivalent relative-addressing instructions may result in slightly slower execution, although modern processors make the difference practically negligible.
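The distinction between absolute, load-time relocated, and position-independent code can be illustrated with a toy model. The sketch below is not any real instruction set: the opcodes `JMP_ABS`, `JMP_REL`, and `HALT`, the dictionary-based memory, and all helper names are assumptions made up for illustration.

```python
# Toy machine (hypothetical): memory is a dict of address -> (opcode, operand).
# JMP_ABS jumps to an absolute address; JMP_REL jumps relative to the
# address of the next instruction; HALT stops and yields its operand.

def load(memory, program, base):
    """Copy the program into memory at `base` without modifying it."""
    for i, instr in enumerate(program):
        memory[base + i] = instr

def run(memory, entry, max_steps=100):
    """Execute from `entry`; return the HALT operand, or None on a fault."""
    pc = entry
    for _ in range(max_steps):
        instr = memory.get(pc)
        if instr is None:
            return None  # control left the loaded program
        op, arg = instr
        if op == "HALT":
            return arg
        if op == "JMP_ABS":
            pc = arg
        elif op == "JMP_REL":
            pc = pc + 1 + arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return None

def relocate(program, base, fixups):
    """Load-time relocation: add `base` to the operand at each fixup index."""
    out = list(program)
    for i in fixups:
        op, arg = out[i]
        out[i] = (op, arg + base)
    return out

def run_at(program, base):
    """Load an unmodified copy of `program` at `base` and run it."""
    memory = {}
    load(memory, program, base)
    return run(memory, base)

# Absolute code: only correct when loaded at address 0.
ABSOLUTE = [("JMP_ABS", 2), ("HALT", "wrong"), ("HALT", "ok")]
# Position-independent code: the jump target is relative to the jump itself.
PIC = [("JMP_REL", 1), ("HALT", "wrong"), ("HALT", "ok")]
```

In this model `run_at(PIC, base)` reaches the intended `HALT` for any `base`, `run_at(ABSOLUTE, base)` does so only for `base == 0`, and `relocate(ABSOLUTE, base, [0])` produces a patched copy that works at that one `base` only.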


History

In early computers such as the IBM 701 (29 April 1952) or the UNIVAC I (31 March 1951), code was not position-independent: each program was built to load into and run from a particular address. Those early computers did not have an operating system and were not multitasking-capable. Programs were loaded into main storage (or even stored on magnetic drum for execution directly from there) and run one at a time. In such an operational context, position-independent code was not necessary.

Even on base and bounds systems such as the CDC 6600, the GE 625 and the UNIVAC 1107, once the OS loaded code into a job's storage, it could only run from the relative address at which it was loaded.

Burroughs introduced a segmented system, the B5000 (1961), in which programs addressed segments indirectly via control words on the stack or in the program reference table (PRT); a shared segment could be addressed via different PRT locations in different processes. Similarly, on the later B6500, all segment references were via positions in a stack frame.

The IBM System/360 (7 April 1964) was designed with truncated addressing similar to that of the UNIVAC III, with code position independence in mind. In truncated addressing, memory addresses are calculated from a base register and an offset. At the beginning of a program, the programmer must establish addressability by loading a base register; normally, the programmer also informs the assembler with a USING pseudo-op. The programmer can load the base register from a register known to contain the entry point address, typically R15, or can use the BALR (Branch And Link, Register form) instruction (with an R2 value of 0) to store the next sequential instruction's address into the base register, which is then coded explicitly or implicitly in each instruction that refers to a storage location within the program. Multiple base registers can be used, for code or for data. Such instructions require less memory because they do not have to hold a full 24-, 31-, 32-, or 64-bit address (4 or 8 bytes), but instead a base register number (encoded in 4 bits) and a 12-bit address offset, requiring only two bytes.

This programming technique is standard on IBM S/360 type systems and has remained in use through to today's IBM System/z. When coding in assembly language, the programmer has to establish addressability for the program as described above and also use other base registers for dynamically allocated storage. Compilers automatically take care of this kind of addressing.

IBM's early operating system DOS/360 (1966) did not use virtual storage (since the early models of System/360 did not support it), but it could place programs at an arbitrary (or automatically chosen) storage location during loading via the PHASE name,* JCL (Job Control Language) statement. So, on S/360 systems without virtual storage, a program could be loaded at any storage location, but this required a contiguous memory area large enough to hold that program. Sometimes memory fragmentation would occur from loading and unloading differently sized modules. Virtual storage, by design, does not have that limitation.

While DOS/360 and OS/360 did not support PIC, transient SVC routines in OS/360 could not contain relocatable address constants and could run in any of the transient areas without relocation.

IBM first introduced virtual storage on the IBM System/360 Model 67 in 1965 to support IBM's first multitasking, time-sharing operating system, TSS/360. Later versions of DOS/360 (DOS/VS etc.) and later IBM operating systems all utilized virtual storage. Truncated addressing remained part of the base architecture, and is still advantageous when multiple modules must be loaded into the same virtual address space.

By way of comparison, on early segmented systems such as Burroughs MCP on the Burroughs B5000 (1961) and Multics (1964), and on paging systems such as IBM TSS/360 (1967), code was also inherently position-independent, since subroutine virtual addresses in a program were located in private data external to the code, e.g., program reference table, linkage segment, prototype section.

The invention of dynamic address translation (the function provided by an MMU) originally reduced the need for position-independent code because every process could have its own independent address space (range of addresses).

However, multiple simultaneous jobs using the same code created a waste of physical memory. If two jobs run entirely identical programs, dynamic address translation provides a solution by allowing the system simply to map two different jobs' address 32K to the same bytes of real memory, containing the single copy of the program.

Different programs may share common code. For example, the payroll program and the accounts receivable program may both contain an identical sort subroutine. A shared module (a shared library is a form of shared module) gets loaded once and mapped into the two address spaces.
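The System/360 base-plus-offset scheme described earlier in this section can be sketched as a small calculation. This is a simplified model, not an emulator: the function names, the choice of register 12 as the base register, and the 24-bit address mask (matching the original System/360 address size) are assumptions for illustration.

```python
ADDRESS_MASK = 0xFFFFFF  # assumption: 24-bit addresses, as on the original S/360

def encode_operand(base_reg, displacement):
    """Pack a 4-bit base register number and a 12-bit offset into two bytes."""
    if not 0 <= base_reg <= 15:
        raise ValueError("base register number must fit in 4 bits")
    if not 0 <= displacement <= 0xFFF:
        raise ValueError("displacement must fit in 12 bits")
    return (base_reg << 12) | displacement

def effective_address(registers, operand):
    """Compute base + displacement for a 16-bit operand field."""
    base_reg = operand >> 12
    displacement = operand & 0xFFF
    # A base register number of 0 means "no base": zero is used instead of
    # the contents of register 0.
    base = 0 if base_reg == 0 else registers[base_reg]
    return (base + displacement) & ADDRESS_MASK

def run_at(load_address, operand):
    """Model a program whose first instruction is BALR 12,0."""
    registers = [0] * 16
    # BALR with R2 = 0 stores the address of the next sequential instruction;
    # BALR itself is 2 bytes long, so that address is load_address + 2.
    registers[12] = load_address + 2
    return effective_address(registers, operand)

# The same two-byte operand field is used wherever the program is loaded.
OPERAND = encode_operand(12, 0x0FE)
```

With these definitions, `run_at(0x1000, OPERAND)` and `run_at(0x5000, OPERAND)` differ by exactly the difference between the two load addresses, although the instruction bytes are identical in both cases.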


SunOS 4.x and ELF

Procedure calls inside a shared library are typically made through small procedure linkage table (PLT) stubs, which then call the definitive function. This notably allows a shared library to inherit certain function calls from previously loaded libraries rather than using its own versions.

Data references from position-independent code are usually made indirectly, through Global Offset Tables (GOTs), which store the addresses of all accessed global variables. There is one GOT per compilation unit or object module, and it is located at a fixed offset from the code (although this offset is not known until the library is linked). When a linker links modules to create a shared library, it merges the GOTs and sets the final offsets in code. It is not necessary to adjust the offsets when loading the shared library later.

Position-independent code that accesses global data does so by fetching the address of the global variable from its entry in the GOT. As the GOT is at a fixed offset from the code, the offset between the address of a given instruction in the code and the address of the GOT entry for a given global variable is also fixed, so the offset does not need to change with the address at which the position-independent code is loaded. An instruction that fetches the GOT entry for a global variable uses an addressing mode that contains an offset relative to some instruction in the code; this might be a PC-relative addressing mode if the instruction set architecture supports it, or a register-relative addressing mode, with functions loading that register with the address of an instruction in the function prologue.
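The GOT indirection described above can be modeled in a few lines. This is a toy sketch with a word-addressed dictionary as memory; `GOT_OFFSET`, `DATA_OFFSET`, and the function names are hypothetical values and helpers chosen for illustration, not part of any real ABI.

```python
# Hypothetical layout of a toy shared library, fixed when the library is linked.
GOT_OFFSET = 0x1000   # distance from the start of the code to the GOT
DATA_OFFSET = 0x2000  # distance from the start of the code to the globals

def load_library(memory, base, globals_init):
    """Map the toy library at `base`; return a dict of name -> GOT slot index.

    `globals_init` is a list of (name, initial_value) pairs.
    """
    slots = {}
    for i, (name, value) in enumerate(globals_init):
        var_addr = base + DATA_OFFSET + i
        memory[var_addr] = value
        # Filling the GOT is the only step that depends on the load address.
        memory[base + GOT_OFFSET + i] = var_addr
        slots[name] = i
    return slots

def read_global(memory, pc, code_offset, slot):
    """Read a global the way PIC does: PC-relative GOT fetch, then indirect load.

    `code_offset` is the position of the accessing instruction within the
    library, so (GOT_OFFSET - code_offset) is a link-time constant.
    """
    got_entry_addr = pc + (GOT_OFFSET - code_offset) + slot
    var_addr = memory[got_entry_addr]
    return memory[var_addr]
```

Loading the same library at two different bases changes the contents of the GOT, but the constant `GOT_OFFSET - code_offset` embedded in the accessing instruction stays the same, which is why the code pages themselves need no modification.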


Windows DLLs

Dynamic-link libraries (DLLs) in Microsoft Windows use variant E8 of the CALL instruction (Call near, relative, displacement relative to next instruction). These instructions do not need modification when the DLL is loaded.

Some global variables (e.g. arrays of string literals, virtual function tables) are expected to contain the address of an object in the data section or in the code section, respectively, of the dynamic library; therefore, the address stored in the global variable must be updated to reflect the address at which the DLL was loaded. The dynamic loader calculates the address referred to by a global variable and stores the value in that global variable; this triggers copy-on-write of the memory page containing the variable. Pages with code and pages with global variables that do not contain pointers to code or global data remain shared between processes. This operation must be done in any OS that can load a dynamic library at an arbitrary address.

In Windows Vista and later versions of Windows, the relocation of DLLs and executables is done by the kernel memory manager, which shares the relocated binaries across multiple processes. Images are always relocated from their preferred base addresses, achieving address space layout randomization (ASLR).

Versions of Windows prior to Vista require that system DLLs be prelinked at non-conflicting fixed addresses at link time in order to avoid runtime relocation of images. Runtime relocation in these older versions of Windows is performed by the DLL loader within the context of each process, and the resulting relocated portions of each image can no longer be shared between processes.

The handling of DLLs in Windows differs from the earlier OS/2 procedure it derives from. OS/2 presents a third alternative and attempts to load DLLs that are not position-independent into a dedicated "shared arena" in memory, and maps them once they are loaded. All users of the DLL are able to use the same in-memory copy.
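The address fix-up step, and its effect on page sharing, can be sketched as follows. This is a simplified model under stated assumptions: 32-bit little-endian pointers, a 4 KiB page size, and a plain list of fix-up offsets. It does not parse the real relocation data of a DLL, and `rebase_image` is a hypothetical helper name.

```python
import struct

PAGE_SIZE = 4096  # assumption: 4 KiB pages

def rebase_image(image, preferred_base, actual_base, fixup_offsets):
    """Adjust stored addresses for a new load address.

    Returns the relocated image and the set of page numbers that were
    written to; in a copy-on-write scheme those pages become private,
    while untouched pages can stay shared.
    """
    delta = actual_base - preferred_base
    if delta == 0:
        return bytes(image), set()  # loaded at the preferred base: nothing to do
    out = bytearray(image)
    touched_pages = set()
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", out, offset)
        struct.pack_into("<I", out, offset, (value + delta) & 0xFFFFFFFF)
        touched_pages.add(offset // PAGE_SIZE)
    return bytes(out), touched_pages
```

For a three-page image with a single pointer stored on its second page, rebasing touches only that page; the first and third pages remain byte-for-byte identical to the original and could still be shared.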


Multics

In Multics, each procedure conceptually has a code segment and a linkage segment. The code segment contains only code, and the linkage section serves as a template for a new linkage segment. Pointer register 4 (PR4) points to the linkage segment of the procedure. A call to a procedure saves PR4 in the stack before loading it with a pointer to the callee's linkage segment. The procedure call uses an indirect pointer pair with a flag that causes a trap on the first call, so that the dynamic linkage mechanism can add the new procedure and its linkage segment to the Known Segment Table (KST), construct a new linkage segment, put their segment numbers in the caller's linkage section, and reset the flag in the indirect pointer pair.
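The trap-on-first-call idea can be illustrated with a much-reduced model. The sketch below keeps only the flagged indirect pointer and its one-time resolution; it omits segments, PR4, and the KST entirely, and the class names `Link` and `ToyLinker` are hypothetical.

```python
class Link:
    """An indirect pointer with a fault flag, as a linkage section might hold."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.fault = True    # unresolved: the first use traps to the linker
        self.target = None

class ToyLinker:
    """Resolves a link on its first use, then lets later calls go straight through."""

    def __init__(self, procedures):
        self.procedures = procedures  # symbol name -> callable
        self.traps = 0

    def call(self, link, *args):
        if link.fault:
            # First call: the flag causes a trap and the link is resolved.
            self.traps += 1
            link.target = self.procedures[link.symbol]
            link.fault = False  # reset the flag so later calls do not trap
        return link.target(*args)
```

Calling through the same `Link` twice increments `traps` only once, which is the property that makes this kind of lazy linking cheap after the first call.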


TSS

In the IBM S/360 Time Sharing System (TSS/360 and TSS/370), each procedure may have a read-only public CSECT and a writable private Prototype Section (PSECT). A caller loads a V-constant for the routine into General Register 15 (GR15) and copies an R-constant for the routine's PSECT into the 19th word of the save area pointed to by GR13. The Dynamic Loader does not load program pages or resolve address constants until the first page fault.


Position-independent executables

Position-independent executables (PIE) are executable binaries made entirely from position-independent code. While some systems only run PIC executables, there are other reasons they are used. PIE binaries are used in some security-focused Linux distributions to allow PaX or Exec Shield to use address space layout randomization (ASLR) to prevent attackers from knowing where existing executable code is during a security attack using exploits that rely on knowing the offset of the executable code in the binary, such as return-to-libc attacks. (The official Linux kernel since 2.6.12 of 2005 has a weaker ASLR that also works with PIE. It is weak in that randomness is applied to whole ELF file units.)

Apple's macOS and iOS fully support PIE executables as of versions 10.7 and 4.3, respectively; a warning is issued when non-PIE iOS executables are submitted for approval to Apple's App Store, but there is no hard requirement yet and non-PIE applications are not rejected.

OpenBSD has had PIE enabled by default on most architectures since OpenBSD 5.3, released on 1 May 2013. Support for PIE in statically linked binaries, such as the executables in the /bin and /sbin directories, was added near the end of 2014. openSUSE made PIE the default in February 2015. Beginning with Fedora 23, Fedora maintainers decided to build packages with PIE enabled as the default. Ubuntu 17.10 has PIE enabled by default across all architectures. Gentoo's new profiles now support PIE by default. Around July 2017, Debian enabled PIE by default.

Android enabled support for PIEs in Jelly Bean and removed non-PIE linker support in Lollipop.
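On ELF systems, one quick indication of whether a binary could be a PIE is the `e_type` field of the ELF header. The sketch below reads that field from the first 18 bytes of a file; the function names are hypothetical. Note the limitation: the `ET_DYN` type is shared by position-independent executables and ordinary shared objects, so this check alone cannot tell the two apart.

```python
import struct

ET_EXEC = 2  # executable linked to run at a fixed address
ET_DYN = 3   # shared object; also the type used for PIE binaries

def elf_type(header):
    """Return the e_type field from the leading bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # Byte 5 of the identification block (EI_DATA) gives the byte order:
    # 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_type is the 16-bit field immediately after the 16-byte e_ident block.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return e_type

def may_be_pie(header):
    """True if the header has type ET_DYN (PIE or shared object)."""
    return elf_type(header) == ET_DYN
```

A typical use would be to open a file in binary mode, read its first 18 bytes, and pass them to `may_be_pie`; a result of `False` for an `ET_EXEC` file means the binary was linked for a fixed load address.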


See also

* Dynamic linker
* Object file
* Code segment


External links


* Introduction to Position Independent Code
* Position Independent Code internals
* The Curious Case of Position Independent Executables