An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.
The object code is usually relocatable and is generally not directly executable. There are various formats for object files, and the same machine code can be packaged in different object file formats. An object file may also work like a shared library.
The metadata that object files may include can be used for linking or debugging; it includes information to resolve symbolic cross-references between different modules, relocation information, stack unwinding information, comments, program symbols, and debugging or profiling information. Other metadata may include the date and time of compilation, the compiler name and version, and other identifying information.
The term "object program" dates from at least the 1950s.
A linker is used to combine the object code into one executable program or library, pulling in precompiled system libraries as needed.
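The following Python sketch illustrates the two jobs described above, resolving symbolic cross-references between modules and applying relocations, for a deliberately tiny, hypothetical module format. The `ObjectModule` class, its fields, and the `link` function are invented for illustration and do not correspond to ELF, COFF, or any real linker; each relocation is assumed to be a 32-bit little-endian absolute address field.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class ObjectModule:
    """Hypothetical, greatly simplified object module (not a real format)."""
    name: str
    code: bytes
    # Symbols defined here: name -> offset within `code`.
    symbols: dict[str, int] = field(default_factory=dict)
    # Fields to patch: (offset within `code`, referenced symbol name).
    # Each field is assumed to hold a 32-bit little-endian absolute address.
    relocations: list[tuple[int, str]] = field(default_factory=list)


def link(modules: list[ObjectModule], base: int) -> bytes:
    """Place modules one after another at `base`, resolve symbolic
    cross-references between them, and patch each relocated field."""
    # Pass 1: assign addresses and build the global symbol table.
    global_symbols: dict[str, int] = {}
    address = base
    for module in modules:
        for symbol, offset in module.symbols.items():
            if symbol in global_symbols:
                raise ValueError(f"duplicate symbol {symbol!r} in {module.name}")
            global_symbols[symbol] = address + offset
        address += len(module.code)

    # Pass 2: copy each module's code and fill in the resolved addresses.
    image = bytearray()
    for module in modules:
        code = bytearray(module.code)
        for offset, symbol in module.relocations:
            if symbol not in global_symbols:
                raise ValueError(f"undefined symbol {symbol!r} in {module.name}")
            struct.pack_into("<I", code, offset, global_symbols[symbol])
        image += code
    return bytes(image)


# main.o refers to `helper`, which util.o defines. 0xAA is a placeholder
# byte followed by a 4-byte address field to be filled in by the linker.
main = ObjectModule("main.o", code=b"\xaa" + bytes(4), relocations=[(1, "helper")])
util = ObjectModule("util.o", code=b"\xc3", symbols={"helper": 0})
image = link([main, util], base=0x1000)
# main.o occupies 0x1000-0x1004, so helper resolves to 0x1005:
# image == bytes.fromhex("aa05100000c3")
```

A real linker works in the same two passes (collect definitions, then patch references), but must also handle many relocation types, section merging, and libraries that are searched only for still-undefined symbols.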
Object file formats
There are many different object file formats; originally each type of computer had its own unique format, but with the advent of Unix and other portable operating systems, some formats, such as ELF and COFF, have been defined and used on different kinds of systems.
Some systems make a distinction between formats which are directly executable and formats which require processing by the linker. For example, OS/360 and its successors call the first format a ''load module'' and the second an ''object module''. In this case the files have entirely different formats.
DOS and Windows also have different file formats for executable files and object files, such as Portable Executable for executables and COFF for object files in 32-bit and 64-bit Windows.
Unix and Unix-like systems have used the same format for executable and object files, starting with the original a.out format. Some formats can contain machine code for different processors, with the correct one chosen by the operating system when the program is loaded.
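Because many formats coexist, tools commonly identify a file by the "magic number" in its first bytes. The Python sketch below checks a few well-known signatures: the ELF identifier `\x7fELF`, the `MZ` signature that begins DOS and Windows (PE) executables, the Mach-O magic numbers, and the classic a.out `OMAGIC`/`NMAGIC`/`ZMAGIC` values. The function name `identify_object_format` is hypothetical, and the check is deliberately incomplete: COFF object files begin with a machine-type field rather than a fixed signature, the a.out test assumes a little-endian header, and the Mach-O universal ("fat") magic `0xCAFEBABE` is shared with Java class files, so a real tool must inspect further fields.

```python
import struct


def identify_object_format(data: bytes) -> str:
    """Guess an object/executable file format from its leading bytes.

    A heuristic sketch: only a few well-known signatures are checked.
    """
    if data[:4] == b"\x7fELF":
        # Byte 4 (EI_CLASS) distinguishes 32-bit (1) from 64-bit (2) ELF.
        ei_class = data[4] if len(data) > 4 else 0
        bits = {1: "32-bit", 2: "64-bit"}.get(ei_class, "unknown class")
        return f"ELF ({bits})"
    if data[:2] == b"MZ":
        # DOS header: a 32-bit little-endian offset at 0x3C points to the
        # "PE\0\0" signature in Portable Executable files.
        if len(data) >= 0x40:
            (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
            if data[pe_offset:pe_offset + 4] == b"PE\0\0":
                return "PE (Windows executable)"
        return "MZ (DOS executable)"
    magic = data[:4]
    if magic == b"\xca\xfe\xba\xbe":
        # Universal binary holding code for several processors.
        # Also the Java class-file magic; a real tool must look further.
        return "Mach-O universal (fat) binary"
    # Mach-O magic is stored in the target's byte order, so accept both.
    if magic in (b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe"):
        return "Mach-O (32-bit)"
    if magic in (b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe"):
        return "Mach-O (64-bit)"
    if len(data) >= 2:
        # Classic a.out magic numbers (octal), assuming a little-endian header.
        (word,) = struct.unpack_from("<H", data, 0)
        aout = {0o407: "OMAGIC", 0o410: "NMAGIC", 0o413: "ZMAGIC"}
        if word in aout:
            return f"a.out ({aout[word]})"
    return "unknown"
```

Reading the first few kilobytes of a file is normally enough for these checks, since the PE signature offset stored at `0x3C` usually points near the start of the file.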
The design or choice of an object file format is a key part of overall system design. It affects the performance of the linker, and thus programmer turnaround while a program is being developed. If the format is also used for executables, the design affects the time programs take to begin running, and thus the responsiveness experienced by users.
The GNU Project's Binary File Descriptor library (BFD library) provides a common API for the manipulation of object files in a variety of formats.
Absolute files
Many early computers and small microcomputers support only an absolute object format. Programs are not relocatable; they must be assembled or compiled to execute at specific, predefined addresses, and the file contains no relocation or linkage information. These files can be loaded into read/write memory or stored in read-only memory. For example, the Motorola 6800 MIKBUG monitor contains a routine to read an absolute object file (SREC format) from paper tape.
DOS COM files are a more recent example of absolute object files.
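The S-record (SREC) format used in the MIKBUG example is simple enough to decode in a few lines. This Python sketch handles only S1 data records (16-bit address) and the S9 termination record; each record is "S", a type digit, a byte count, the address, the data, and a checksum equal to the ones' complement of the low byte of the sum of the count, address, and data bytes. Because the format is absolute, each data byte is stored at exactly the address given, with no relocation. The `load_srec` function and its 64 KiB memory array are illustrative assumptions, not MIKBUG's actual routine.

```python
from typing import Optional


def load_srec(lines: list[str], memory_size: int = 0x10000) -> tuple[bytearray, Optional[int]]:
    """Load S1 data records into a memory image at their absolute addresses.

    A hypothetical sketch: only S1 (16-bit address data) and S9 (start
    address) records are interpreted; other record types are skipped.
    Returns the memory image and the start address from S9, if present.
    """
    memory = bytearray(memory_size)
    start: Optional[int] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if len(line) < 4 or line[0] != "S":
            raise ValueError(f"not an S-record: {line!r}")
        record_type = line[1]
        raw = bytes.fromhex(line[2:])  # count, address, data, checksum
        if raw[0] != len(raw) - 1:
            raise ValueError(f"byte count mismatch: {line!r}")
        # Checksum: ones' complement of the low byte of the sum of the
        # count, address and data bytes.
        if (~sum(raw[:-1]) & 0xFF) != raw[-1]:
            raise ValueError(f"bad checksum: {line!r}")
        if record_type == "1":
            address = int.from_bytes(raw[1:3], "big")
            data = raw[3:-1]
            if address + len(data) > len(memory):
                raise ValueError(f"record outside memory: {line!r}")
            # Absolute format: bytes go exactly where the record says.
            memory[address:address + len(data)] = data
        elif record_type == "9":
            start = int.from_bytes(raw[1:3], "big")
    return memory, start


memory, start = load_srec(["S1050100864132", "S9030100FB"])
# memory[0x100:0x102] == b"\x86\x41" and start == 0x100
```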
Segmentation
Most object file formats are structured as separate sections of data, each section containing a certain type of data. These sections are often called "segments", after the term "memory segment", which was previously a common form of memory management. When a program is loaded into memory by a loader, the loader allocates various regions of memory to the program. Some of these regions correspond to sections of the object file and are therefore usually known by the same names. Others, such as the stack, exist only at run time. In some cases, the loader (or linker) performs relocation to fix up the actual memory addresses. For many programs or architectures, however, relocation is unnecessary, because the memory management unit lets each program run at the addresses it was linked for, or because the program uses position-independent code. On some systems the segments of the object file can then be copied (paged) into memory and executed without further processing. On these systems this may be done ''lazily'', that is, only when the segments are referenced during execution, for example via a memory-mapped file backed by the object file.
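Load-time relocation can be illustrated with a hypothetical image that was linked as if it would run at one address and carries a table of the offsets of every 32-bit absolute address it contains. When the loader places the image elsewhere, it adds the difference to each listed field; position-independent code avoids this step by not containing such absolute addresses in the first place. The image layout and the `relocate` function below are assumptions for illustration, similar in spirit to the base-relocation table that Windows PE images carry.

```python
import struct


def relocate(image: bytes, fixups: list[int], link_base: int, load_base: int) -> bytes:
    """Return a copy of `image` adjusted to run at `load_base`.

    `fixups` (a hypothetical relocation table) lists the offsets of 32-bit
    little-endian absolute addresses computed for `link_base`.
    """
    delta = load_base - link_base
    patched = bytearray(image)
    for offset in fixups:
        (value,) = struct.unpack_from("<I", patched, offset)
        # Wrap around the way 32-bit address arithmetic would.
        struct.pack_into("<I", patched, offset, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)


# A 12-byte image linked at 0: the pointer at offset 4 refers to offset 8.
image = bytes(4) + struct.pack("<I", 0x8) + b"DATA"
loaded = relocate(image, fixups=[4], link_base=0, load_base=0x400000)
# The pointer now holds 0x400008, the address of b"DATA" after loading.
```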
Types of data supported by typical object file formats:
* Header (descriptive and control information)
* Code segment ("text segment", executable code)
* Data segment (initialized static variables)
* Read-only data segment (''rodata,'' initialized static constants)
* BSS segment (uninitialized static data, both variables and constants)
* External definitions and references for linking
* Relocation information
* Dynamic linking information
* Debugging information
Segments in different object files may be combined by the linker according to rules specified when the segments are defined. Conventions exist for segments shared between object files; for instance, in DOS there are different memory models that specify the names of special segments and whether or not they may be combined.
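The simplest combining rule, concatenating same-named segments from each input file, can be sketched in Python as follows. Each hypothetical input maps a section name such as `.text` or `.data` to a `Section` holding its bytes and required alignment; `merge_sections` concatenates same-named sections in input order, pads each piece to its alignment, and records where every input's contribution landed so that its symbols and relocations could later be adjusted. The names are invented for illustration, and real linkers follow richer, format-specific rules.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical input section: its bytes and required start alignment."""
    data: bytes
    alignment: int = 1  # in bytes


def merge_sections(
    inputs: list[dict[str, Section]],
) -> tuple[dict[str, bytes], dict[tuple[int, str], int]]:
    """Concatenate same-named sections from several object files.

    Returns the merged sections and, for each (input index, section name),
    the offset at which that input's contribution starts.
    """
    merged: dict[str, bytearray] = {}
    placement: dict[tuple[int, str], int] = {}
    for index, sections in enumerate(inputs):
        for name, section in sections.items():
            out = merged.setdefault(name, bytearray())
            # Pad with zero bytes until the current end meets the alignment.
            out += bytes(-len(out) % section.alignment)
            placement[(index, name)] = len(out)
            out += section.data
    return {name: bytes(data) for name, data in merged.items()}, placement


# Two hypothetical inputs; the second asks for 4-byte alignment of its .text.
a = {".text": Section(b"\x01\x02\x03"), ".data": Section(b"A")}
b = {".text": Section(b"\x04", alignment=4), ".data": Section(b"B")}
merged, placement = merge_sections([a, b])
# merged[".text"] == b"\x01\x02\x03\x00\x04" and placement[(1, ".text")] == 4
```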
Debugging information may be stored in a debugging data format that is an integral part of the object file format, as in COFF, or in a semi-independent format that can be used with several object formats, such as stabs or DWARF.
See also
* OS/360 Object File Format
* Intel hexadecimal object file format (typically with file extension .HEX, but sometimes also with .OBJ)
* Object Module Format (ICL) (OMF for ICL VME)
* Object Module Format (Intel) (OMF for Intel 8080/8085, OBJ for Intel 8086)
* Mach-O