HOME

TheInfoList



OR:

In
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as anal ...
, machine code is any
low-level programming language A low-level programming language is a programming language that provides little or no Abstraction (computer science), abstraction from a computer's instruction set architecture—commands or functions in the language map that are structurally sim ...
, consisting of machine language instructions, which are used to control a computer's
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, a ...
(CPU). Each instruction causes the CPU to perform a very specific task, such as a load, a store, a
jump Jumping is a form of locomotion or movement in which an organism or non-living (e.g., robotic) mechanical system propels itself through the air along a ballistic trajectory. Jump or Jumping also may refer to: Places * Jump, Kentucky or Jump S ...
, or an
arithmetic logic unit In computing, an arithmetic logic unit (ALU) is a combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating point num ...
(ALU) operation on one or more units of data in the CPU's
register Register or registration may refer to: Arts entertainment, and media Music * Register (music), the relative "height" or range of a note, melody, part, instrument, etc. * ''Register'', a 2017 album by Travis Miller * Registration (organ), th ...
s or
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remember ...
. Early CPUs had specific machine code that might break backwards compatibility with each new CPU released. The notion of an
instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ...
(ISA) defines and specifies the behavior and encoding in memory of the instruction set of the system, without specifying its exact implementation. This acts as an abstraction layer, enabling compatibility within the same family of CPUs, so that machine code written or generated according to the ISA for the family will run on all CPUs in the family, including future CPUs. In general, each architecture family (e.g. x86, ARM) has its own ISA, and hence its own specific machine code language. There are exceptions, e.g. the
IA-64 IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in col ...
can emulate x86. Machine code is a strictly numerical language, and is the lowest-level interface to the CPU intended for a programmer. There is, on some CPUs, a lower level interface in the form of (modifiable)
microcode In processor design, microcode (μcode) is a technique that interposes a layer of computer organization between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer. Microcode is a la ...
that implement the machine code. However, microcode is not intended to be changed by the end user on normal commercial CPUs.
Assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence b ...
 provides a direct mapping between the numerical machine code and a human readable version where numerical opcodes and operands are replaced by readable strings (e.g. 0x90 is the NOP instruction on x86). While it is possible to write programs directly in machine code, managing individual bits and calculating numerical addresses and constants manually is tedious and error-prone. For this reason, programs are very rarely written directly in machine code in modern contexts, but may be done for low level
debugging In computer programming and software development, debugging is the process of finding and resolving '' bugs'' (defects or problems that prevent correct operation) within computer programs, software, or systems. Debugging tactics can involve i ...
, program
patching Patching is a small village and civil parish that lies amid the fields and woods of the southern slopes of the South Downs in the National Park in the Arun District of West Sussex, England. It has a visible hill-workings history going back t ...
(especially when assembler source is not available) and assembly language
disassembly A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly l ...
. The majority of practical programs today are written in higher-level languages or assembly language. The source code is then translated to executable machine code by utilities such as
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
s,
assemblers Assembler may refer to: Arts and media * Nobukazu Takemura, avant-garde electronic musician, stage name Assembler * Assemblers, a fictional race in the ''Star Wars'' universe * Assemblers, an alternative name of the superhero group Champions of A ...
, and linkers, with the important exception of interpreted programs, which are not translated into machine code. However, the '' interpreter'' itself, which may be seen as an executor or processor performing the instructions of the source code, typically consists of directly executable machine code (generated from assembly or high-level language source code). Machine code is by definition the lowest level of programming detail visible to the programmer, but internally many processors use
microcode In processor design, microcode (μcode) is a technique that interposes a layer of computer organization between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer. Microcode is a la ...
or optimise and transform machine code instructions into sequences of micro-ops. This is not generally considered to be a machine code.


Instruction set

Every processor or processor family has its own
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ...
. Instructions are patterns of
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
s, digits, or characters that correspond to machine commands. Thus, the instruction set is specific to a class of processors using (mostly) the same
architecture Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing buildings ...
. Successor or derivative processor designs often include instructions of a predecessor and may add new additional instructions. Occasionally, a successor design will discontinue or alter the meaning of some instruction code (typically because it is needed for new purposes), affecting code compatibility to some extent; even compatible processors may show slightly different behavior for some instructions, but this is rarely a problem. Systems may also differ in other details, such as memory arrangement, operating systems, or peripheral devices. Because a program normally relies on such factors, different systems will typically not run the same machine code, even when the same type of processor is used. A processor's instruction set may have fixed-length or variable-length instructions. How the patterns are organized varies with the particular architecture and type of instruction. Most instructions have one or more
opcode In computing, an opcode (abbreviated from operation code, also known as instruction machine code, instruction code, instruction syllable, instruction parcel or opstring) is the portion of a machine language instruction that specifies the operat ...
fields that specify the basic instruction type (such as arithmetic, logical,
jump Jumping is a form of locomotion or movement in which an organism or non-living (e.g., robotic) mechanical system propels itself through the air along a ballistic trajectory. Jump or Jumping also may refer to: Places * Jump, Kentucky or Jump S ...
, etc.), the operation (such as add or compare), and other fields that may give the type of the
operand In mathematics, an operand is the object of a mathematical operation, i.e., it is the object or quantity that is operated on. Example The following arithmetic expression shows an example of operators and operands: :3 + 6 = 9 In the above exam ...
(s), the
addressing mode Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions i ...
(s), the addressing offset(s) or index, or the operand value itself (such constant operands contained in an instruction are called ''immediate''). Not all machines or individual instructions have explicit operands. On a machine with a single accumulator, the accumulator is implicitly both the left operand and result of most arithmetic instructions. Some other architectures, such as the x86 architecture, have accumulator versions of common instructions, with the accumulator regarded as one of the general registers by longer instructions. A
stack machine In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a virtual machine in which the primary interaction is moving short-lived temporary values to and from a push down ...
has most or all of its operands on an implicit stack. Special purpose instructions also often lack explicit operands; for example, CPUID in the x86 architecture writes values into four implicit destination registers. This distinction between explicit and implicit operands is important in code generators, especially in the
register allocation In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers. Register allocation can happen over a basic block (''local register allocatio ...
and live range tracking parts. A good code optimizer can track implicit as well as explicit operands which may allow more frequent constant propagation, constant folding of registers (a register assigned the result of a constant expression freed up by replacing it by that constant) and other code enhancements.


Programs

A
computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. Computer programs are one component of software, which also includes software documentation, documentation and oth ...
is a list of instructions that can be executed by a
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, a ...
(CPU). A program's execution is done in order for the CPU that is executing it to solve a problem and thus accomplish a result. While simple processors are able to execute instructions one after another,
superscalar A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a sup ...
processors are able under certain circumstances (when the pipeline is full) of executing two or more instructions simultaneously. As an example, the original Intel Pentium from 1993 can execute at most two instructions per clock cycle when its pipeline is full. Program flow may be influenced by special 'jump' instructions that transfer execution to an address (and hence instruction) other than the next numerically sequential address. Whether these conditional jumps occur is dependent upon a condition such as a value being greater than, less than, or equal to another value.


Assembly languages

A much more human friendly rendition of machine language, called
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence b ...
, uses mnemonic codes to refer to machine code instructions, rather than using the instructions' numeric values directly, and uses symbolic names to refer to storage locations and sometimes registers. For example, on the
Zilog Z80 The Z80 is an 8-bit microprocessor introduced by Zilog as the startup company's first product. The Z80 was conceived by Federico Faggin in late 1974 and developed by him and his 11 employees starting in early 1975. The first working samples were ...
processor, the machine code 00000101, which causes the CPU to decrement the B
processor register A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. ...
, would be represented in assembly language as DEC B.


Example

The
MIPS architecture MIPS (Microprocessor without Interlocked Pipelined Stages) is a family of reduced instruction set computer (RISC) instruction set architectures (ISA)Price, Charles (September 1995). ''MIPS IV Instruction Set'' (Revision 3.2), MIPS Technologies, ...
provides a specific example for a machine code whose instructions are always 32 bits long. The general type of instruction is given by the ''op'' (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by ''op''. R-type (register) instructions include an additional field ''funct'' to determine the exact operation. The fields used in these types are: 6 5 5 5 5 6 bits rs , rt , rd , shamt, funct R-type rs , rt , address/immediate I-type target address J-type ''rs'', ''rt'', and ''rd'' indicate register operands; ''shamt'' gives a shift amount; and the ''address'' or ''immediate'' fields contain an operand directly. For example, adding the registers 1 and 2 and placing the result in register 6 is encoded: rs , rt , rd , shamt, funct 0 1 2 6 0 32 decimal 000000 00001 00010 00110 00000 100000 binary Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3: rs , rt , address/immediate 35 3 8 68 decimal 100011 00011 01000 00000 00001 000100 binary Jumping to the address 1024: target address 2 1024 decimal 000010 00000 00000 00000 10000 000000 binary


Overlapping instructions

On processor architectures with variable-length instruction sets (such as
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 ser ...
's x86 processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal Count, sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences. These are called ''overlapping instructions'', ''overlapping opcodes'', ''overlapping code'', ''overlapped code'', ''instruction scission'', or ''jump into the middle of an instruction'', and represent a form of superposition. In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example were in the implementation of error tables in
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washi ...
's
Altair BASIC Altair BASIC is a discontinued interpreter for the BASIC programming language that ran on the MITS Altair 8800 and subsequent S-100 bus computers. It was Microsoft's first product (as Micro-Soft), distributed by MITS under a contract. Altair BASI ...
, where ''interleaved instructions'' mutually shared their instruction bytes. The technique is rarely used today, but might still be necessary to resort to in areas where extreme optimization for size is necessary on byte-level such as in the implementation of
boot loader A bootloader, also spelled as boot loader or called boot manager and bootstrap loader, is a computer program that is responsible for booting a computer. When a computer is turned off, its softwareincluding operating systems, application code, a ...
s which have to fit into boot sectors. It is also sometimes used as a
code obfuscation In software development, obfuscation is the act of creating source or machine code that is difficult for humans or computers to understand. Like obfuscation in natural language, it may use needlessly roundabout expressions to compose statem ...
technique as a measure against
disassembly A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly l ...
and tampering. The principle is also utilized in shared code sequences of fat binaries which must run on multiple instruction-set-incompatible processor platforms. This property is also used to find
unintended instruction An illegal opcode, also called an unimplemented operation, unintended opcode or undocumented instruction, is an instruction to a CPU that is not mentioned in any official documentation released by the CPU's designer or manufacturer, which nev ...
s called
gadget A gadget is a mechanical device or any ingenious article. Gadgets are sometimes referred to as ''gizmos''. History The etymology of the word is disputed. The word first appears as reference to an 18th-century tool in glassmaking that was develop ...
s in existing code repositories and is utilized in
return-oriented programming Return-oriented programming (ROP) is a computer security exploit technique that allows an attacker to execute code in the presence of security defenses such as executable space protection and code signing. In this technique, an attacker gains cont ...
as alternative to
code injection Code injection is the exploitation of a computer bug that is caused by processing invalid data. The injection is used by an attacker to introduce (or "inject") code into a vulnerable computer program and change the course of execution. The re ...
for exploits such as
return-to-libc attack A "return-to-libc" attack is a computer security attack usually starting with a buffer overflow in which a subroutine return address on a call stack is replaced by an address of a subroutine that is already present in the process executable memory ...
s.


Relationship to microcode

In some computers, the machine code of the
architecture Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing buildings ...
is implemented by an even more fundamental underlying layer called
microcode In processor design, microcode (μcode) is a technique that interposes a layer of computer organization between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer. Microcode is a la ...
, providing a common machine language interface across a line or family of different models of computer with widely different underlying
dataflow In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Da ...
s. This is done to facilitate
porting In software engineering, porting is the process of adapting software for the purpose of achieving some form of execution in a computing environment that is different from the one that a given program (meant for such execution) was originally desi ...
of machine language programs between different models. An example of this use is the IBM
System/360 The IBM System/360 (S/360) is a family of mainframe computer systems that was announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover both commercial and scientific applica ...
family of computers and their successors. With dataflow path widths of 8 bits to 64 bits and beyond, they nevertheless present a common architecture at the machine language level across the entire line. Using microcode to implement an
emulator In computing, an emulator is hardware or software that enables one computer system (called the ''host'') to behave like another computer system (called the ''guest''). An emulator typically enables the host system to run software or use pe ...
enables the computer to present the architecture of an entirely different computer. The System/360 line used this to allow porting programs from earlier IBM machines to the new family of computers, e.g. an IBM 1401/1440/1460 emulator on the IBM S/360 model 40.


Relationship to bytecode

Machine code is generally different from
bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
(also known as p-code), which is either executed by an interpreter or itself compiled into machine code for faster (direct) execution. An exception is when a processor is designed to use a particular bytecode directly as its machine code, such as is the case with Java processors. Machine code and assembly code are sometimes called ''
native Native may refer to: People * Jus soli, citizenship by right of birth * Indigenous peoples, peoples with a set of specific rights based on their historical ties to a particular territory ** Native Americans (disambiguation) In arts and entert ...
code'' when referring to platform-dependent parts of language features or libraries.


Storing in memory

From the point of view of the CPU, machine code is stored in RAM, but is typically also kept in a set of caches for performance reasons. There may be different caches for instructions and data, depending on the architecture. The CPU knows what machine code to execute, based on its internal program counter. The program counter points to a memory address and is changed based on special instructions which may cause programmatic branches. The program counter is typically set to a hard coded value when the CPU is first powered on, and will hence execute whatever machine code happens to be at this address. Similarly, the program counter can be set to execute whatever machine code is at some arbitrary address, even if this isn't valid machine code. This will typically trigger an architecture specific protection fault. The CPU is oftentimes told, by page permissions in a paging based system, if the current page actually holds machine code by an execute bit — pages have multiple such permission bits (readable, writable, etc.) for various housekeeping functionality. E.g. on
Unix-like A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-li ...
systems memory pages can be toggled to be executable with the system call, and on Windows, can be used to achieve a similar result. If an attempt is made to execute machine code on a non-executable page, an architecture specific fault will typically occur. Treating data as machine code, or finding new ways to use existing machine code, by various techniques, is the basis of some security vulnerabilities. From the point of view of a process, the ''code space'' is the part of its
address space In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity. For software programs to save and retrieve s ...
where the code in execution is stored. In multitasking systems this comprises the program's
code segment In computing, a code segment, also known as a text segment or simply as text, is a portion of an object file or the corresponding section of the program's virtual address space that contains executable instructions. Segment The term "segment" ...
and usually
shared libraries In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and ...
. In multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.


Readability by humans

Pamela Samuelson wrote that machine code is so unreadable that the
United States Copyright Office The United States Copyright Office (USCO), a part of the Library of Congress, is a United States government body that maintains records of copyright registration, including a copyright catalog. It is used by copyright title searchers who ar ...
cannot identify whether a particular encoded program is an original work of authorship; however, the US Copyright Office ''does'' allow for copyright registration of computer programs and a program's machine code can sometimes be decompiled in order to make its functioning more easily understandable to humans. However, the output of a decompiler or disassembler will be missing the comments and symbolic references, so while the output may be easier to read than the object code, it will still be more difficult than the original source code. This problem does not exist for object-code formats like
SQUOZE SQUOZE (abbreviated as SQZ) is a memory-efficient representation of a combined source and relocatable object program file with a symbol table on punched cards which was introduced in 1958 with the SCAT assembler on the SHARE Operating Syste ...
, where the source code is included in the file. Cognitive science professor
Douglas Hofstadter Douglas Richard Hofstadter (born February 15, 1945) is an American scholar of cognitive science, physics, and comparative literature whose research includes concepts such as the sense of self in relation to the external world, consciousness, a ...
has compared machine code to
genetic code The genetic code is the set of rules used by living cells to translate information encoded within genetic material ( DNA or RNA sequences of nucleotide triplets, or codons) into proteins. Translation is accomplished by the ribosome, which links ...
, saying that "Looking at a program written in machine language is vaguely comparable to looking at a DNA molecule atom by atom."


See also

*
Assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence b ...
*
Endianness In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the mos ...
* List of machine languages *
Machine code monitor A machine code monitor ( machine language monitor) is software that allows a user to enter commands to view and change memory locations on a computer, with options to load and save memory contents from/to secondary storage. Some full-featured m ...
*
Overhead code In computing, object code or object module is the product of a compiler. In a general sense object code is a sequence of statements or instructions in a computer language, usually a machine code language (i.e., binary) or an intermediate langua ...
*
P-code machine In computer programming, a p-code machine (portable code machine) is a virtual machine designed to execute ''p-code'' (the assembly language or machine code of a hypothetical central processing unit (CPU)). This term is applied both generically t ...
*
Reduced instruction set computing In computer engineering, a reduced instruction set computer (RISC) is a computer designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set compu ...
(RISC) *
Very long instruction word Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units (CPU, processor) mostly allow programs to specify instructions to exe ...
* Teaching Machine Code:
Micro-Professor MPF-I The Micro-Professor MPF-I, introduced in 1981 by Multitech (which, in 1987, changed its name to Acer), was the first branded computer product from Multitech and probably one of the world's longest selling computers. The MPF-I, specifically design ...


Notes


References


Further reading

* * * {{Authority control * Low-level programming languages