HOME

TheInfoList



OR:

In
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as anal ...
, machine code is any
low-level programming language A low-level programming language is a programming language that provides little or no Abstraction (computer science), abstraction from a computer's instruction set architecture—commands or functions in the language map that are structurally sim ...
, consisting of machine language instructions, which are used to control a computer's
central processing unit A central processing unit (CPU), also called a central processor, main processor or just Processor (computing), processor, is the electronic circuitry that executes Instruction (computing), instructions comprising a computer program. The CPU per ...
(CPU). Each instruction causes the CPU to perform a very specific task, such as a load, a store, a jump, or an arithmetic logic unit (ALU) operation on one or more units of data in the CPU's registers or
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
. Early CPUs had specific machine code that might break backwards compatibility with each new CPU released. The notion of an instruction set architecture (ISA) defines and specifies the behavior and encoding in memory of the instruction set of the system, without specifying its exact implementation. This acts as an abstraction layer, enabling compatibility within the same family of CPUs, so that machine code written or generated according to the ISA for the family will run on all CPUs in the family, including future CPUs. In general, each architecture family (e.g.
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
,
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
) has its own ISA, and hence its own specific machine code language. There are exceptions, e.g. the
IA-64 IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in col ...
can emulate x86. Machine code is a strictly numerical language, and is the lowest-level interface to the CPU intended for a programmer. There is, on some CPUs, a lower level interface in the form of (modifiable) microcode that implement the machine code. However, microcode is not intended to be changed by the end user on normal commercial CPUs. Assembly language provides a direct mapping between the numerical machine code and a human readable version where numerical opcodes and operands are replaced by readable strings (e.g. 0x90 is the NOP instruction on
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
). While it is possible to write programs directly in machine code, managing individual bits and calculating numerical addresses and constants manually is tedious and error-prone. For this reason, programs are very rarely written directly in machine code in modern contexts, but may be done for low level debugging, program patching (especially when assembler source is not available) and assembly language
disassembly A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly l ...
. The majority of practical programs today are written in higher-level languages or assembly language. The source code is then translated to executable machine code by utilities such as
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
s,
assemblers Assembler may refer to: Arts and media * Nobukazu Takemura, avant-garde electronic musician, stage name Assembler * Assemblers, a fictional race in the ''Star Wars'' universe * Assemblers, an alternative name of the superhero group Champions of A ...
, and linkers, with the important exception of interpreted programs, which are not translated into machine code. However, the '' interpreter'' itself, which may be seen as an executor or processor performing the instructions of the source code, typically consists of directly executable machine code (generated from assembly or high-level language source code). Machine code is by definition the lowest level of programming detail visible to the programmer, but internally many processors use microcode or optimise and transform machine code instructions into sequences of
micro-ops In computer central processing units, micro-operations (also known as micro-ops or μops, historically also as micro-actions) are detailed low-level instructions used in some designs to implement complex machine instructions (sometimes termed m ...
. This is not generally considered to be a machine code.


Instruction set

Every processor or processor family has its own instruction set. Instructions are patterns of
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
s, digits, or characters that correspond to machine commands. Thus, the instruction set is specific to a class of processors using (mostly) the same
architecture Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing building ...
. Successor or derivative processor designs often include instructions of a predecessor and may add new additional instructions. Occasionally, a successor design will discontinue or alter the meaning of some instruction code (typically because it is needed for new purposes), affecting code compatibility to some extent; even compatible processors may show slightly different behavior for some instructions, but this is rarely a problem. Systems may also differ in other details, such as memory arrangement, operating systems, or
peripheral devices A peripheral or peripheral device is an auxiliary device used to put information into and get information out of a computer. The term ''peripheral device'' refers to all hardware components that are attached to a computer and are controlled by the ...
. Because a program normally relies on such factors, different systems will typically not run the same machine code, even when the same type of processor is used. A processor's instruction set may have fixed-length or variable-length instructions. How the patterns are organized varies with the particular architecture and type of instruction. Most instructions have one or more opcode fields that specify the basic instruction type (such as arithmetic, logical, jump, etc.), the operation (such as add or compare), and other fields that may give the type of the
operand In mathematics, an operand is the object of a mathematical operation, i.e., it is the object or quantity that is operated on. Example The following arithmetic expression shows an example of operators and operands: :3 + 6 = 9 In the above exam ...
(s), the
addressing mode Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions i ...
(s), the addressing offset(s) or index, or the operand value itself (such constant operands contained in an instruction are called ''immediate''). Not all machines or individual instructions have explicit operands. On a machine with a single accumulator, the accumulator is implicitly both the left operand and result of most arithmetic instructions. Some other architectures, such as the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
architecture, have accumulator versions of common instructions, with the accumulator regarded as one of the general registers by longer instructions. A stack machine has most or all of its operands on an implicit stack. Special purpose instructions also often lack explicit operands; for example, CPUID in the x86 architecture writes values into four implicit destination registers. This distinction between explicit and implicit operands is important in code generators, especially in the
register allocation In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers. Register allocation can happen over a basic block (''local register allocatio ...
and live range tracking parts. A good code optimizer can track implicit as well as explicit operands which may allow more frequent constant propagation, constant folding of registers (a register assigned the result of a constant expression freed up by replacing it by that constant) and other code enhancements.


Programs

A
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
is a list of instructions that can be executed by a
central processing unit A central processing unit (CPU), also called a central processor, main processor or just Processor (computing), processor, is the electronic circuitry that executes Instruction (computing), instructions comprising a computer program. The CPU per ...
(CPU). A program's execution is done in order for the CPU that is executing it to solve a problem and thus accomplish a result. While simple processors are able to execute instructions one after another, superscalar processors are able under certain circumstances (when the pipeline is full) of executing two or more instructions simultaneously. As an example, the original
Intel Pentium Pentium is a brand used for a series of x86 architecture-compatible microprocessors produced by Intel. The original Pentium processor from which the brand took its name was first released on March 22, 1993. After that, the Pentium II and P ...
from 1993 can execute at most two instructions per clock cycle when its pipeline is full.
Program flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''imp ...
may be influenced by special 'jump' instructions that transfer execution to an address (and hence instruction) other than the next numerically sequential address. Whether these
conditional jump A branch is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. ''Branch'' (or ''branching'', ''branc ...
s occur is dependent upon a condition such as a value being greater than, less than, or equal to another value.


Assembly languages

A much more human friendly rendition of machine language, called assembly language, uses mnemonic codes to refer to machine code instructions, rather than using the instructions' numeric values directly, and uses symbolic names to refer to storage locations and sometimes registers. For example, on the
Zilog Z80 The Z80 is an 8-bit microprocessor introduced by Zilog as the startup company's first product. The Z80 was conceived by Federico Faggin in late 1974 and developed by him and his 11 employees starting in early 1975. The first working samples were ...
processor, the machine code 00000101, which causes the CPU to decrement the B processor register, would be represented in assembly language as DEC B.


Example

The
MIPS architecture MIPS (Microprocessor without Interlocked Pipelined Stages) is a family of reduced instruction set computer (RISC) instruction set architectures (ISA)Price, Charles (September 1995). ''MIPS IV Instruction Set'' (Revision 3.2), MIPS Technologies, ...
provides a specific example for a machine code whose instructions are always 32 bits long. The general type of instruction is given by the ''op'' (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by ''op''. R-type (register) instructions include an additional field ''funct'' to determine the exact operation. The fields used in these types are: 6 5 5 5 5 6 bits rs , rt , rd , shamt, funct R-type rs , rt , address/immediate I-type target address J-type ''rs'', ''rt'', and ''rd'' indicate register operands; ''shamt'' gives a shift amount; and the ''address'' or ''immediate'' fields contain an operand directly. For example, adding the registers 1 and 2 and placing the result in register 6 is encoded: rs , rt , rd , shamt, funct 0 1 2 6 0 32 decimal 000000 00001 00010 00110 00000 100000 binary Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3: rs , rt , address/immediate 35 3 8 68 decimal 100011 00011 01000 00000 00001 000100 binary Jumping to the address 1024: target address 2 1024 decimal 000010 00000 00000 00000 10000 000000 binary


Overlapping instructions

On processor architectures with
variable-length instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s (such as
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
's
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the
Kruskal Count Martin David Kruskal (; September 28, 1925 – December 26, 2006) was an American mathematician and physicist. He made fundamental contributions in many areas of mathematics and science, ranging from plasma physics to general relativity and ...
, sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences. These are called ''overlapping instructions'', ''overlapping opcodes'', ''overlapping code'', ''overlapped code'', ''instruction scission'', or ''jump into the middle of an instruction'', and represent a form of superposition. In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example were in the implementation of error tables in
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washin ...
's
Altair BASIC Altair BASIC is a discontinued interpreter for the BASIC programming language that ran on the MITS Altair 8800 and subsequent S-100 bus computers. It was Microsoft's first product (as Micro-Soft), distributed by MITS under a contract. Altair BASI ...
, where ''interleaved instructions'' mutually shared their instruction bytes. The technique is rarely used today, but might still be necessary to resort to in areas where extreme optimization for size is necessary on byte-level such as in the implementation of
boot loader A bootloader, also spelled as boot loader or called boot manager and bootstrap loader, is a computer program that is responsible for booting a computer. When a computer is turned off, its softwareincluding operating systems, application code, a ...
s which have to fit into boot sectors. It is also sometimes used as a code obfuscation technique as a measure against
disassembly A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly l ...
and tampering. The principle is also utilized in shared code sequences of fat binaries which must run on multiple instruction-set-incompatible processor platforms. This property is also used to find unintended instructions called gadgets in existing code repositories and is utilized in return-oriented programming as alternative to
code injection Code injection is the exploitation of a computer bug that is caused by processing invalid data. The injection is used by an attacker to introduce (or "inject") code into a vulnerable computer program and change the course of execution. The re ...
for exploits such as return-to-libc attacks.


Relationship to microcode

In some computers, the machine code of the
architecture Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing building ...
is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate
porting In software engineering, porting is the process of adapting software for the purpose of achieving some form of execution in a computing environment that is different from the one that a given program (meant for such execution) was originally desi ...
of machine language programs between different models. An example of this use is the IBM System/360 family of computers and their successors. With dataflow path widths of 8 bits to 64 bits and beyond, they nevertheless present a common architecture at the machine language level across the entire line. Using microcode to implement an
emulator In computing, an emulator is hardware or software that enables one computer system (called the ''host'') to behave like another computer system (called the ''guest''). An emulator typically enables the host system to run software or use pe ...
enables the computer to present the architecture of an entirely different computer. The System/360 line used this to allow porting programs from earlier IBM machines to the new family of computers, e.g. an IBM 1401/1440/1460 emulator on the IBM S/360 model 40.


Relationship to bytecode

Machine code is generally different from
bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
(also known as p-code), which is either executed by an interpreter or itself compiled into machine code for faster (direct) execution. An exception is when a processor is designed to use a particular bytecode directly as its machine code, such as is the case with
Java processor A Java processor is the implementation of the Java virtual machine (JVM) in hardware. In other words, the Java bytecode that makes up the instruction set of the abstract machine becomes the instruction set of a concrete machine. These were the most ...
s. Machine code and assembly code are sometimes called '' native code'' when referring to platform-dependent parts of language features or libraries.


Storing in memory

From the point of view of the CPU, machine code is stored in RAM, but is typically also kept in a set of caches for performance reasons. There may be different caches for instructions and data, depending on the architecture. The CPU knows what machine code to execute, based on its internal program counter. The program counter points to a memory address and is changed based on special instructions which may cause programmatic branches. The program counter is typically set to a hard coded value when the CPU is first powered on, and will hence execute whatever machine code happens to be at this address. Similarly, the program counter can be set to execute whatever machine code is at some arbitrary address, even if this isn't valid machine code. This will typically trigger an architecture specific protection fault. The CPU is oftentimes told, by page permissions in a paging based system, if the current page actually holds machine code by an execute bit — pages have multiple such permission bits (readable, writable, etc.) for various housekeeping functionality. E.g. on
Unix-like A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-li ...
systems memory pages can be toggled to be executable with the system call, and on Windows, can be used to achieve a similar result. If an attempt is made to execute machine code on a non-executable page, an architecture specific fault will typically occur. Treating data as machine code, or finding new ways to use existing machine code, by various techniques, is the basis of some security vulnerabilities. From the point of view of a
process A process is a series or set of activities that interact to produce a result; it may occur once-only or be recurrent or periodic. Things called a process include: Business and management *Business process, activities that produce a specific se ...
, the ''code space'' is the part of its address space where the code in execution is stored. In multitasking systems this comprises the program's
code segment In computing, a code segment, also known as a text segment or simply as text, is a portion of an object file or the corresponding section of the program's virtual address space that contains executable instructions. Segment The term "segment" ...
and usually shared libraries. In multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of
context switching In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processes ...
considerably as compared to process switching.


Readability by humans

Pamela Samuelson wrote that machine code is so unreadable that the United States Copyright Office cannot identify whether a particular encoded program is an original work of authorship; however, the US Copyright Office ''does'' allow for copyright registration of computer programs and a program's machine code can sometimes be decompiled in order to make its functioning more easily understandable to humans. However, the output of a decompiler or disassembler will be missing the comments and symbolic references, so while the output may be easier to read than the object code, it will still be more difficult than the original source code. This problem does not exist for object-code formats like
SQUOZE SQUOZE (abbreviated as SQZ) is a memory-efficient representation of a combined source and relocatable object program file with a symbol table on punched cards which was introduced in 1958 with the SCAT assembler on the SHARE Operating Syste ...
, where the source code is included in the file. Cognitive science professor Douglas Hofstadter has compared machine code to
genetic code The genetic code is the set of rules used by living cells to translate information encoded within genetic material ( DNA or RNA sequences of nucleotide triplets, or codons) into proteins. Translation is accomplished by the ribosome, which links ...
, saying that "Looking at a program written in machine language is vaguely comparable to looking at a DNA molecule atom by atom."


See also

* Assembly language *
Endianness In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the mos ...
* List of machine languages *
Machine code monitor A machine code monitor ( machine language monitor) is software that allows a user to enter commands to view and change memory locations on a computer, with options to load and save memory contents from/to secondary storage. Some full-featured m ...
* Overhead code *
P-code machine In computer programming, a p-code machine (portable code machine) is a virtual machine designed to execute ''p-code'' (the assembly language or machine code of a hypothetical central processing unit (CPU)). This term is applied both generically t ...
*
Reduced instruction set computing In computer engineering, a reduced instruction set computer (RISC) is a computer designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set compu ...
(RISC) *
Very long instruction word Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units (CPU, processor) mostly allow programs to specify instructions to exe ...
* Teaching Machine Code: Micro-Professor MPF-I


Notes


References


Further reading

* * * {{Authority control * Low-level programming languages