HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, position-independent code (PIC) or position-independent executable (PIE) is a body of
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
that, being placed somewhere in the primary memory, executes properly regardless of its absolute address. PIC is commonly used for
shared libraries In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and ...
, so that the same library code can be loaded in a location in each program address space where it does not overlap with other memory in use (for example, other shared libraries). PIC was also used on older computer systems that lacked an MMU, so that the
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
could keep applications away from each other even within the single
address space In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity. For software programs to save and retrieve s ...
of an MMU-less system. Position-independent code can be executed at any memory address without modification. This differs from
absolute code In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. S ...
, which must be loaded at a specific location to function correctly, and load-time locatable (LTL) code, in which a linker or program loader modifies a program before execution so it can be run only from a particular memory location. Generating position-independent code is often the default behavior for
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
s, but they may place restrictions on the use of some language features, such as disallowing use of absolute addresses (position-independent code has to use relative addressing). Instructions that refer directly to specific memory addresses sometimes execute faster, and replacing them with equivalent relative-addressing instructions may result in slightly slower execution, although modern processors make the difference practically negligible.


History

In early computers such as the
IBM 701 The IBM 701 Electronic Data Processing Machine, known as the Defense Calculator while in development, was IBM’s first commercial scientific computer and its first series production mainframe computer, which was announced to the public on May ...
(29 April 1952) or the
UNIVAC I The UNIVAC I (Universal Automatic Computer I) was the first general-purpose electronic digital computer design for business application produced in the United States. It was designed principally by J. Presper Eckert and John Mauchly, the inven ...
(31 March 1951) code was position-dependent: each program was built to load into and run from a particular address. Those early computers did not have an operating system and were not multitasking-capable. Programs were loaded into main storage (or even stored on magnetic drum for execution directly from there) and run one at a time. In such an operational context, position-independent code was not necessary. The
IBM System/360 The IBM System/360 (S/360) is a family of mainframe computer systems that was announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover both commercial and scientific applic ...
(7 April 1964) was designed with truncated addressing similar to that of the UNIVAC III, with code position independence in mind. In truncated addressing, memory addresses are calculated from a ''base register'' and an offset. At the beginning of a program, the programmer must establish ''addressability'' by loading a base register; normally the programmer also informs the assembler with a ''USING'' pseudo-op. The programmer can load the base register from a register known to contain the entry point address, typically R15, or can use th
BALR (Branch And Link, Register form)
instruction (with a R2 Value of 0) to store the next sequential instruction's address into the base register, which was then coded explicitly or implicitly in each instruction that referred to a storage location within the program. Multiple base registeres could be used, for code or for data. Such instructions require less memory because they do not have to hold a full 24, 31, 32, or 64 bit address (4 or 8 bytes), but instead a base register number (encoded in 4 bits) and a 12–bit address offset (encoded in 12 bits), requiring only two bytes. This programming technique is standard on IBM S/360 type systems. It has been in use through to today's IBM System/z. When coding in assembly language, the programmer has to establish addressability for the program as described above and also use other base registers for dynamically allocated storage. Compilers automatically take care of this kind of addressing. IBM's early operating system
DOS/360 Disk Operating System/360, also DOS/360, or simply DOS, is the discontinued first member of a sequence of operating systems for IBM System/360, System/370 and later mainframes. It was announced by IBM on the last day of 1964, and it was first ...
(1966) was not using virtual storage (since the early models of System S/360 did not support it), but it did have the ability to place programs to an arbitrary (or automatically chosen) storage location during loading via the PHASE name,* JCL (Job Control Language) statement. So, on S/360 systems without virtual storage, a program could be loaded at any storage location, but this required a contiguous memory area large enough to hold that program. Sometimes
memory fragmentation In computer storage, fragmentation is a phenomenon in which storage space, main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specif ...
would occur from loading and unloading differently sized modules. Virtual storage - by design - does not have that limitation. While DOS/360 and OS/360 did not support PIC, transient SVC routines in OS/360 could not contain relocatable address constants and could run in any of the transient areas without relocation. Virtual storage was first introduced on IBM System/360 model 67 in (1965) to support IBM's first multi-tasking operating and time-sharing operating system TSS/360. Later versions of DOS/360 (DOS/VS etc.) and later IBM operating systems all utilized virtual storage. Truncated addressing remained as part of the base architecture, and still advantageous when multiple modules must be loaded into the same virtual address space. By way of comparison, on early segmented systems such as
Burroughs MCP The MCP (Master Control Program) is the operating system of the Burroughs small, medium and large systems, including the Unisys Clearpath/MCP systems. MCP was originally written in 1961 in ESPOL (Executive Systems Problem Oriented Language). ...
on the Burroughs B5000 (1961) and
Multics Multics ("Multiplexed Information and Computing Service") is an influential early time-sharing operating system based on the concept of a single-level memory.Dennis M. Ritchie, "The Evolution of the Unix Time-sharing System", Communications of ...
(1964), paging systems such as IBM
TSS/360 The IBM Time Sharing System TSS/360 is a discontinued early time-sharing operating system designed exclusively for a special model of the System/360 line of mainframes, the Model 67. Made available on a trial basis to a limited set of cust ...
(1967) or base and bounds systems such as GECOS on the GE 625 and EXEC on the UNIVAC 1107, code was also inherently position-independent, since addresses in a program were relative to the current segment rather than absolute. The invention of dynamic address translation (the function provided by an MMU) originally reduced the need for position-independent code because every process could have its own independent
address space In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity. For software programs to save and retrieve s ...
(range of addresses). However, multiple simultaneous jobs using the same code created a waste of physical memory. If two jobs run entirely identical programs, dynamic address translation provides a solution by allowing the system simply to map two different jobs' address 32K to the same bytes of real memory, containing the single copy of the program. Different programs may share common code. For example, the payroll program and the accounts receivable program may both contain an identical sort subroutine. A shared module (a shared library is a form of shared module) gets loaded once and mapped into the two address spaces.


Technical details

Procedure calls inside a shared library are typically made through small procedure linkage table
stub Stub or Stubb may refer to: Shortened objects and entities * Stub (stock), the portion of a corporation left over after most but not all of it has been bought out or spun out * Stub, a tree cut and allowed to regrow from the trunk; see Pollardi ...
s, which then call the definitive function. This notably allows a shared library to inherit certain function calls from previously loaded libraries rather than using its own versions. Data references from position-independent code are usually made indirectly, through Global Offset Tables (GOTs), which store the addresses of all accessed
global variable In computer programming, a global variable is a variable with global scope, meaning that it is visible (hence accessible) throughout the program, unless shadowed. The set of all global variables is known as the ''global environment'' or ''global s ...
s. There is one GOT per compilation unit or object module, and it is located at a fixed offset from the code (although this offset is not known until the library is linked). When a linker links modules to create a shared library, it merges the GOTs and sets the final offsets in code. It is not necessary to adjust the offsets when loading the shared library later. Position independent functions accessing global data start by determining the absolute address of the GOT given their own current program counter value. This often takes the form of a fake function call in order to obtain the return value on stack ( x86), in a specific standard register (
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system develope ...
, MIPS), or a special register ( POWER/
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple– IBM– ...
/
Power ISA Power ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power ISA ...
), which can then be moved to a predefined standard register, or to obtain it into that standard register ( PA-RISC,
Alpha Alpha (uppercase , lowercase ; grc, ἄλφα, ''álpha'', or ell, άλφα, álfa) is the first letter of the Greek alphabet. In the system of Greek numerals, it has a value of one. Alpha is derived from the Phoenician letter aleph , whi ...
,
ESA/390 The IBM System/390 is a discontinued mainframe product family implementing the ESA/390, the fifth generation of the System/360 instruction set architecture. The first computers to use the ESA/390 were the Enterprise System/9000 (ES/9000 ...
and
z/Architecture z/Architecture, initially and briefly called ESA Modal Extensions (ESAME), is IBM's 64-bit complex instruction set computer (CISC) instruction set architecture, implemented by its mainframe computers. IBM introduced its first z/Architect ...
). Some processor architectures, such as the
Motorola 68000 The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Secto ...
,
Motorola 6809 The Motorola 6809 ("''sixty-eight-oh-nine''") is an 8-bit computing, 8-bit microprocessor with some 16-bit computing, 16-bit features. It was designed by Motorola's Terry Ritter and Joel Boney and introduced in 1978. Although source compatible wi ...
, WDC 65C816, Knuth's
MMIX MMIX (pronounced ''em-mix'') is a 64-bit reduced instruction set computing (RISC) architecture designed by Donald Knuth, with significant contributions by John L. Hennessy (who contributed to the design of the MIPS architecture) and Richard L ...
, ARM and
x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging ...
allow referencing data by offset from the
program counter The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, i ...
. This is specifically targeted at making position-independent code smaller, less register demanding and hence more efficient.


Windows DLLs

Dynamic-link libraries Dynamic-link library (DLL) is Microsoft's implementation of the shared library concept in the Microsoft Windows and OS/2 operating systems. These libraries usually have the file extension DLL, OCX (for libraries containing ActiveX controls), or ...
(DLLs) in
Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for ...
use variant E8 of the CALL instruction (Call near, relative, displacement relative to next instruction). These instructions need not be fixed up when a DLL is loaded. Some global variables (e.g. arrays of string literals, virtual function tables) are expected to contain an address of an object in data section respectively in code section of the dynamic library; therefore, the stored address in the global variable must be updated to reflect the address where the DLL was loaded to. The dynamic loader calculates the address referred to by a global variable and stores the value in such global variable; this triggers copy-on-write of a memory page containing such global variable. Pages with code and pages with global variables that do not contain pointers to code or global data remain shared between processes. This operation must be done in any OS that can load a dynamic library at arbitrary address. In Windows Vista and later versions of Windows, the relocation of DLLs and executables is done by the kernel memory manager, which shares the relocated binaries across multiple processes. Images are always relocated from their preferred base addresses, achieving address space layout randomization (ASLR). Versions of Windows prior to Vista require that system DLLs be prelinked at non-conflicting fixed addresses at the link time in order to avoid runtime relocation of images. Runtime relocation in these older versions of Windows is performed by the DLL loader within the context of each process, and the resulting relocated portions of each image can no longer be shared between processes. The handling of DLLs in Windows differs from the earlier
OS/2 OS/2 (Operating System/2) is a series of computer operating systems, initially created by Microsoft and IBM under the leadership of IBM software designer Ed Iacobucci. As a result of a feud between the two companies over how to position OS/2 r ...
procedure it derives from. OS/2 presents a third alternative and attempts to load DLLs that are not position-independent into a dedicated "shared arena" in memory, and maps them once they are loaded. All users of the DLL are able to use the same in-memory copy.


Multics

In
Multics Multics ("Multiplexed Information and Computing Service") is an influential early time-sharing operating system based on the concept of a single-level memory.Dennis M. Ritchie, "The Evolution of the Unix Time-sharing System", Communications of ...
each procedure conceptually has a code segment and a linkage segment. The code segment contains only code and the linkage section serves as a template for a new linkage segment. Pointer register 4 (PR4) points to the linkage segment of the procedure. A call to a procedure saves PR4 in the stack before loading it with a pointer to the callee's linkage segment. The procedure call uses an indirect pointer pair with a flag to cause a trap on the first call so that the dynamic linkage mechanism can add the new procedure and its linkage segment to the Known Segment Table (KST), construct a new linkage segment, put their segment numbers in the caller's linkage section and reset the flag in the indirect pointer pair.


TSS

In IBM S/360 Time Sharing System (TSS/360 and TSS/370) each procedure may have a read-only public CSECT and a writable private Prototype Section (PSECT). A caller loads a V-constant for the routine into General Register 15 (GR15) and copies an R-constant for the routine's PSECT into the 19th word of the save area pointed to be GR13. The Dynamic Loader does not load program pages or resolve address constants until the first page fault.


Position-independent executables

''Position-independent executables'' (PIE) are executable binaries made entirely from position-independent code. While some systems only run PIC executables, there are other reasons they are used. PIE binaries are used in some security-focused
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, whi ...
distributions to allow PaX or Exec Shield to use address space layout randomization to prevent attackers from knowing where existing executable code is during a security attack using exploits that rely on knowing the offset of the executable code in the binary, such as
return-to-libc attack A "return-to-libc" attack is a computer security attack usually starting with a buffer overflow in which a subroutine return address on a call stack is replaced by an address of a subroutine that is already present in the process executable memory ...
s. Apple's
macOS macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac computers. Within the market of desktop and la ...
and iOS fully support PIE executables as of versions 10.7 and 4.3, respectively; a warning is issued when non-PIE iOS executables are submitted for approval to Apple's App Store but there's no hard requirement yet and non-PIE applications are not rejected.
OpenBSD OpenBSD is a security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by forking NetBSD 1.0. According to the website, the OpenBSD project e ...
has PIE enabled by default on most architectures since OpenBSD 5.3, released on 1 May 2013. Support for PIE in
statically linked A stand-alone program, also known as a freestanding program, is a computer program that does not load any external module, library function or program and that is designed to boot with the bootstrap procedure of the target processor – it runs o ...
binaries, such as the executables in /bin and /sbin directories, was added near the end of 2014. openSUSE added PIE as a default in 2015-02. Beginning with
Fedora A fedora () is a hat with a soft brim and indented crown.Kilgour, Ruth Edwards (1958). ''A Pageant of Hats Ancient and Modern''. R. M. McBride Company. It is typically creased lengthwise down the crown and "pinched" near the front on both side ...
 23, Fedora maintainers decided to build packages with PIE enabled as the default.
Ubuntu Ubuntu ( ) is a Linux distribution based on Debian and composed mostly of free and open-source software. Ubuntu is officially released in three editions: '' Desktop'', ''Server'', and ''Core'' for Internet of things devices and robots. All ...
17.10 has PIE enabled by default across all architectures. Gentoo's new profiles now support PIE by default. Around July 2017,
Debian Debian (), also known as Debian GNU/Linux, is a Linux distribution composed of free and open-source software, developed by the community-supported Debian Project, which was established by Ian Murdock on August 16, 1993. The first version of De ...
enabled PIE by default. Android enabled support for PIEs in
Jelly Bean Jelly beans are small bean shaped sugar candies with soft candy shells and thick gel interiors (see gelatin and jelly). The confection is primarily made of sugar and sold in a wide variety of colors and flavors. History It has been clai ...
and removed non-PIE linker support in Lollipop.


See also

* Dynamic linker *
Object file An object file is a computer file containing object code, that is, machine code output of an assembler or compiler. The object code is usually relocatable, and not usually directly executable. There are various formats for object files, and the ...
* Code segment


Notes


References


External links


Introduction to Position Independent Code

Position Independent Code internals



The Curious Case of Position Independent Executables
{{application binary interface Operating system technology Computer libraries Computer file formats