The Info List - Arithmetic Logic Unit

--- Advertisement ---

An arithmetic logic unit (ALU) is a combinational digital electronic circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating point numbers. An ALU is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs). A single CPU, FPU or GPU may contain multiple ALUs. The inputs to an ALU are the data to be operated on, called operands, and a code indicating the operation to be performed; the ALU's output is the result of the performed operation. In many designs, the ALU also has status inputs or outputs, or both, which convey information about a previous operation or the current operation, respectively, between the ALU and external status registers.


1 Signals

1.1 Data 1.2 Opcode 1.3 Status

1.3.1 Outputs 1.3.2 Inputs

2 Circuit operation 3 Functions

3.1 Arithmetic
operations 3.2 Bitwise logical operations 3.3 Bit shift operations

4 Applications

4.1 Multiple-precision arithmetic 4.2 Complex operations

5 Implementation 6 History 7 See also 8 References 9 External links

Signals[edit] An ALU has a variety of input and output nets, which are the electrical conductors used to convey digital signals between the ALU and external circuitry. When an ALU is operating, external circuits apply signals to the ALU inputs and, in response, the ALU produces and conveys signals to external circuitry via its outputs. Data[edit] A basic ALU has three parallel data buses consisting of two input operands (A and B) and a result output (Y). Each data bus is a group of signals that conveys one binary integer number. Typically, the A, B and Y bus widths (the number of signals comprising each bus) are identical and match the native word size of the external circuitry (e.g., the encapsulating CPU or other processor). Opcode[edit] The opcode input is a parallel bus that conveys to the ALU an operation selection code, which is an enumerated value that specifies the desired arithmetic or logic operation to be performed by the ALU. The opcode size (its bus width) determines the maximum number of different operations the ALU can perform; for example, a four-bit opcode can specify up to sixteen different ALU operations. Generally, an ALU opcode is not the same as a machine language opcode, though in some cases it may be directly encoded as a bit field within a machine language opcode. Status[edit] Outputs[edit] The status outputs are various individual signals that convey supplemental information about the result of the current ALU operation. General-purpose ALUs commonly have status signals such as:

Carry-out, which conveys the carry resulting from an addition operation, the borrow resulting from a subtraction operation, or the overflow bit resulting from a binary shift operation. Zero, which indicates all bits of Y are logic zero. Negative, which indicates the result of an arithmetic operation is negative. Overflow, which indicates the result of an arithmetic operation has exceeded the numeric range of Y. Parity, which indicates whether an even or odd number of bits in Y are logic one.

At the end of each ALU operation, the status output signals are usually stored in external registers to make them available for future ALU operations (e.g., to implement multiple-precision arithmetic) or for controlling conditional branching. The collection of bit registers that store the status outputs are often treated as a single, multi-bit register, which is referred to as the "status register" or "condition code register". Inputs[edit] The status inputs allow additional information to be made available to the ALU when performing an operation. Typically, this is a single "carry-in" bit that is the stored carry-out from a previous ALU operation. Circuit operation[edit]

The combinational logic circuitry of the 74181
integrated circuit, which is a simple four-bit ALU

An ALU is a combinational logic circuit, meaning that its outputs will change asynchronously in response to input changes. In normal operation, stable signals are applied to all of the ALU inputs and, when enough time (known as the "propagation delay") has passed for the signals to propagate through the ALU circuitry, the result of the ALU operation appears at the ALU outputs. The external circuitry connected to the ALU is responsible for ensuring the stability of ALU input signals throughout the operation, and for allowing sufficient time for the signals to propagate through the ALU before sampling the ALU result. In general, external circuitry controls an ALU by applying signals to its inputs. Typically, the external circuitry employs sequential logic to control the ALU operation, which is paced by a clock signal of a sufficiently low frequency to ensure enough time for the ALU outputs to settle under worst-case conditions. For example, a CPU begins an ALU addition operation by routing operands from their sources (which are usually registers) to the ALU's operand inputs, while the control unit simultaneously applies a value to the ALU's opcode input, configuring it to perform addition. At the same time, the CPU also routes the ALU result output to a destination register that will receive the sum. The ALU's input signals, which are held stable until the next clock, are allowed to propagate through the ALU and to the destination register while the CPU waits for the next clock. When the next clock arrives, the destination register stores the ALU result and, since the ALU operation has completed, the ALU inputs may be set up for the next ALU operation. Functions[edit] A number of basic arithmetic and bitwise logic functions are commonly supported by ALUs. Basic, general purpose ALUs typically include these operations in their repertoires: Arithmetic

Add: A and B are summed and the sum appears at Y and carry-out. Add with carry: A, B and carry-in are summed and the sum appears at Y and carry-out. Subtract: B is subtracted from A (or vice versa) and the difference appears at Y and carry-out. For this function, carry-out is effectively a "borrow" indicator. This operation may also be used to compare the magnitudes of A and B; in such cases the Y output may be ignored by the processor, which is only interested in the status bits (particularly zero and negative) that result from the operation. Subtract with borrow: B is subtracted from A (or vice versa) with borrow (carry-in) and the difference appears at Y and carry-out (borrow out). Two's complement (negate): A (or B) is subtracted from zero and the difference appears at Y. Increment: A (or B) is increased by one and the resulting value appears at Y. Decrement: A (or B) is decreased by one and the resulting value appears at Y. Pass through: all bits of A (or B) appear unmodified at Y. This operation is typically used to determine the parity of the operand or whether it is zero or negative, or to load the operand into a processor register.

Bitwise logical operations[edit]

AND: the bitwise AND of A and B appears at Y. OR: the bitwise OR of A and B appears at Y. Exclusive-OR: the bitwise XOR of A and B appears at Y. Ones' complement: all bits of A (or B) are inverted and appear at Y.

Bit shift operations[edit]

Bit shift examples for an eight-bit ALU

Type Left shift Right shift




Rotate through carry

ALU shift operations cause operand A (or B) to shift left or right (depending on the opcode) and the shifted operand appears at Y. Simple ALUs typically can shift the operand by only one bit position, whereas more complex ALUs employ barrel shifters that allow them to shift the operand by an arbitrary number of bits in one operation. In all single-bit shift operations, the bit shifted out of the operand appears on carry-out; the value of the bit shifted into the operand depends on the type of shift.

shift: the operand is treated as a two's complement integer, meaning that the most significant bit is a "sign" bit and is preserved. Logical shift: a logic zero is shifted into the operand. This is used to shift unsigned integers. Rotate: the operand is treated as a circular buffer of bits so its least and most significant bits are effectively adjacent. Rotate through carry: the carry bit and operand are collectively treated as a circular buffer of bits.

Applications[edit] Multiple-precision arithmetic[edit] In integer arithmetic computations, multiple-precision arithmetic is an algorithm that operates on integers which are larger than the ALU word size. To do this, the algorithm treats each operand as an ordered collection of ALU-size fragments, arranged from most-significant (MS) to least-significant (LS) or vice-versa. For example, in the case of an 8-bit ALU, the 2 4-bit
integer 0x123456 would be treated as a collection of three 8-bit fragments: 0x12 (MS), 0x34, and 0x56 (LS). Since the size of a fragment exactly matches the ALU word size, the ALU can directly operate on this "piece" of operand. The algorithm uses the ALU to directly operate on particular operand fragments and thus generate a corresponding fragment (a "partial") of the multi-precision result. Each partial, when generated, is written to an associated region of storage that has been designated for the multiple-precision result. This process is repeated for all operand fragments so as to generate a complete collection of partials, which is the result of the multiple-precision operation. In arithmetic operations (e.g., addition, subtraction), the algorithm starts by invoking an ALU operation on the operands' LS fragments, thereby producing both a LS partial and a carry out bit. The algorithm writes the partial to designated storage, whereas the processor's state machine typically stores the carry out bit to an ALU status register. The algorithm then advances to the next fragment of each operand's collection and invokes an ALU operation on these fragments along with the stored carry bit from the previous ALU operation, thus producing another (more significant) partial and a carry out bit. As before, the carry bit is stored to the status register and the partial is written to designated storage. This process repeats until all operand fragments have been processed, resulting in a complete collection of partials in storage, which comprise the multi-precision arithmetic result. In multiple-precision shift operations, the order of operand fragment processing depends on the shift direction. In left-shift operations, fragments are processed LS first because the LS bit of each partial -- which is conveyed via the stored carry bit -- must be obtained from the MS bit of the previously left-shifted, less-significant operand. Conversely, operands are processed MS first in right-shift operations because the MS bit of each partial must be obtained from the LS bit of the previously right-shifted, more-significant operand. In bitwise logical operations (e.g., logical AND, logical OR), the operand fragments may be processed in any arbitrary order because each partial depends only on the corresponding operand fragments (the stored carry bit from the previous ALU operation is ignored). Complex operations[edit] Although an ALU can be designed to perform complex functions, the resulting higher circuit complexity, cost, power consumption and larger size makes this impractical in many cases. Consequently, ALUs are often limited to simple functions that can be executed at very high speeds (i.e., very short propagation delays), and the external processor circuitry is responsible for performing complex functions by orchestrating a sequence of simpler ALU operations. For example, computing the square root of a number might be implemented in various ways, depending on ALU complexity:

Calculation in a single clock: a very complex ALU that calculates a square root in one operation. Calculation pipeline: a group of simple ALUs that calculates a square root in stages, with intermediate results passing through ALUs arranged like a factory production line. This circuit can accept new operands before finishing the previous ones and produces results as fast as the very complex ALU, though the results are delayed by the sum of the propagation delays of the ALU stages. Iterative calculation: a simple ALU that calculates the square root through several steps under the direction of a control unit.

The implementations above transition from fastest and most expensive to slowest and least costly. The square root is calculated in all cases, but processors with simple ALUs will take longer to perform the calculation because multiple ALU operations must be performed.

Implementation[edit] An ALU is usually implemented either as a stand-alone integrated circuit (IC), such as the 74181, or as part of a more complex IC. In the latter case, an ALU is typically instantiated by synthesizing it from a description written in VHDL, Verilog or some other hardware description language. For example, the following VHDL
code describes a very simple 8-bit ALU:

entity alu is port ( -- the alu connections to external circuitry: A : in signed(7 downto 0); -- operand A B : in signed(7 downto 0); -- operand B OP : in unsigned(2 downto 0); -- opcode Y : out signed(7 downto 0)); -- operation result end alu;

architecture behavioral of alu is begin process(A, B, OP) begin case OP is -- decode the opcode and perform the operation: when "000" => Y <= A + B; -- add when "001" => Y <= A - B; -- subtract when "010" => Y <= A - 1; -- decrement when "011" => Y <= A + 1; -- increment when "100" => Y <= not A; -- 1's complement when "101" => Y <= A and B; -- bitwise AND when "110" => Y <= A or B; -- bitwise OR when "111" => Y <= A xor B; -- bitwise XOR when others => NULL; end case; end process; end behavioral;

History[edit] Mathematician John von Neumann
John von Neumann
proposed the ALU concept in 1945 in a report on the foundations for a new computer called the EDVAC.[1] The cost, size, and power consumption of electronic circuitry was relatively high throughout the infancy of the information age. Consequently, all serial computers and many early computers, such as the PDP-8, had a simple ALU that operated on one data bit at a time, although they often presented a wider word size to programmers. One of the earliest computers to have multiple discrete single-bit ALU circuits was the 1948 Whirlwind I, which employed sixteen of such "math units" to enable it to operate on 16-bit words. In 1967, Fairchild introduced the first ALU implemented as an integrated circuit, the Fairchild 3800, consisting of an eight-bit ALU with accumulator.[2] Other integrated-circuit ALUs soon emerged, including four-bit ALUs such as the Am2901 and 74181. These devices were typically "bit slice" capable, meaning they had "carry look ahead" signals that facilitated the use of multiple interconnected ALU chips to create an ALU with a wider word size. These devices quickly became popular and were widely used in bit-slice minicomputers. Microprocessors began to appear in the early 1970s. Even though transistors had become smaller, there was often insufficient die space for a full-word-width ALU and, as a result, some early microprocessors employed a narrow ALU that required multiple cycles per machine language instruction. Examples of this includes the popular Zilog Z80, which performed eight-bit additions with a four-bit ALU.[3] Over time, transistor geometries shrank further, following Moore's law, and it became feasible to build wider ALUs on microprocessors. Modern integrated circuit (IC) transistors are orders of magnitude smaller than those of the early microprocessors, making it possible to fit highly complex ALUs on ICs. Today, many modern ALUs have wide word widths, and architectural enhancements such as barrel shifters and binary multipliers that allow them to perform, in a single clock cycle, operations that would have required multiple operations on earlier ALUs. See also[edit]

Information technology portal

Adder (electronics) Address generation unit
Address generation unit
(AGU) Binary multiplier Execution unit


^ Philip Levis (November 8, 2004). "Jonathan von Neumann and EDVAC" (PDF). cs.berkeley.edu. pp. 1, 3. Retrieved January 20, 2015.  ^ Lee Boysel (2007-10-12). "Making Your First Million (and other tips for aspiring entrepreneurs)". U. Mich. EECS Presentation / ECE Recordings. Archived from the original on 2012-11-15.  ^ Ken Shirriff. "The Z-80 has a 4-bit
ALU. Here's how it works." 2013.

Hwang, Enoch (2006). Digital Logic and Microprocessor
Design with VHDL. Thomson. ISBN 0-534-46593-5.  Stallings, William (2006). Computer
Organization & Architecture: Designing for Performance (7th ed.). Pearson Prentice Hall. ISBN 0-13-185644-8. 

External links[edit]

Wikimedia Commons has media related to Arithmetic
logic units.

ALU and its Micro-operations: Bitwise, Arithmetic
and Shift A Simulator of Complex ALU in MATLAB

v t e

CPU technologies


Turing machine Post–Turing machine Universal Turing machine Quantum Turing machine Belt machine Stack machine Register machine Counter machine Pointer machine Random access machine Random access stored program machine Finite-state machine Queue automaton Von Neumann Harvard (modified) Dataflow TTA Cellular Artificial neural network

Machine learning Deep learning Neural processing unit (NPU)

Convolutional neural network Load/store architecture Register memory architecture Endianness FIFO Zero-copy NUMA HUMA HSA Mobile computing Surface computing Wearable computing Heterogeneous computing Parallel computing Concurrent computing Distributed computing Cloud computing Amorphous computing Ubiquitous computing Fabric computing Cognitive computing Unconventional computing Hypercomputation Quantum computing Adiabatic quantum computing Linear optical quantum computing Reversible computing Reverse computation Reconfigurable computing Optical computing Ternary computer Analogous computing Mechanical computing Hybrid computing Digital computing DNA computing Peptide computing Chemical computing Organic computing Wetware computing Neuromorphic computing Symmetric multiprocessing
Symmetric multiprocessing
(SMP) Asymmetric multiprocessing
Asymmetric multiprocessing
(AMP) Cache hierarchy Memory hierarchy

ISA types



x86 z/Architecture ARM MIPS Power Architecture
Power Architecture
(PowerPC) SPARC Mill Itanium
(IA-64) Alpha Prism SuperH V850 Clipper VAX Unicore PA-RISC MicroBlaze RISC-V

Word size

1-bit 2-bit 4-bit 8-bit 9-bit 10-bit 12-bit 15-bit 16-bit 18-bit 22-bit 24-bit 25-bit 26-bit 27-bit 31-bit 32-bit 33-bit 34-bit 36-bit 39-bit 40-bit 48-bit 50-bit 60-bit 64-bit 128-bit 256-bit 512-bit Variable


Instruction pipelining

Bubble Operand forwarding

Out-of-order execution

Register renaming

Speculative execution

Branch predictor Memory dependence prediction


Parallel level


Bit-serial Word

Instruction Pipelining

Scalar Superscalar


Thread Process





Temporal Simultaneous (SMT) (Hyper-threading) Speculative (SpMT) Preemptive Cooperative Clustered Multi-Thread (CMT) Hardware scout

Flynn's taxonomy



Addressing mode

CPU performance

Instructions per second (IPS) Instructions per clock (IPC) Cycles per instruction (CPI) Floating-point operations per second (FLOPS) Transactions per second (TPS) Synaptic Updates Per Second (SUPS) Performance per watt Orders of magnitude (computing) Cache performance measurement and metric

Core count

Single-core processor Multi-core processor Manycore processor


Central processing unit
Central processing unit
(CPU) GPGPU AI accelerator Vision processing unit (VPU) Vector processor Barrel processor Stream processor Digital signal processor
Digital signal processor
(DSP) I/O processor/DMA controller Network processor Baseband processor Physics processing unit
Physics processing unit
(PPU) Coprocessor Secure cryptoprocessor ASIC FPGA FPOA CPLD Microcontroller Microprocessor Mobile processor Notebook processor Ultra-low-voltage processor Multi-core processor Manycore processor Tile processor Multi-chip module
Multi-chip module
(MCM) Chip stack multi-chip modules System on a chip
System on a chip
(SoC) Multiprocessor system-on-chip (MPSoC) Programmable System-on-Chip
(PSoC) Network on a chip (NoC)


Execution unit (EU) Arithmetic
logic unit (ALU) Address generation unit
Address generation unit
(AGU) Floating-point unit
Floating-point unit
(FPU) Load-store unit (LSU) Branch predictor Unified Reservation Station Barrel shifter Uncore Sum addressed decoder (SAD) Front-side bus Back-side bus Northbridge (computing) Southbridge (computing) Adder (electronics) Binary multiplier Binary decoder Address decoder Multiplexer Demultiplexer Registers Cache Memory management unit
Memory management unit
(MMU) Input–output memory management unit
Input–output memory management unit
(IOMMU) Integrated Memory Controller (IMC) Power Management Unit (PMU) Translation lookaside buffer
Translation lookaside buffer
(TLB) Stack engine Register file Processor register Hardware register Memory buffer register (MBR) Program counter Microcode
ROM Datapath Control unit Instruction unit Re-order buffer Data buffer Write buffer Coprocessor Electronic switch Electronic circuit Integrated circuit Three-dimensional integrated circuit Boolean circuit Digital circuit Analog circuit Mixed-signal integrated circuit Power management integrated circuit Quantum circuit Logic gate

Combinational logic Sequential logic Emitter-coupled logic
Emitter-coupled logic
(ECL) Transistor–transistor logic
Transistor–transistor logic
(TTL) Glue logic

Quantum gate Gate array Counter (digital) Bus (computing) Semiconductor device Clock rate CPU multiplier Vision chip Memristor

Power management

APM ACPI Dynamic frequency scaling Dynamic voltage scaling Clock gating

Hardware security

Non-executable memory (NX bit) Memory Protection Extensions (Intel MPX) Intel Secure Key Hardware restriction (firmware) Software Guard Extensions (Intel SGX) Trusted Execution Technology Trusted Platform Module
Trusted Platform Module
(TPM) Secure cryptoprocessor Hardware security module Hengzhi chip


History of general-purpose CPUs

v t e

Basic computer components

Input devices

Keyboard Image scanner Microphone Pointing device

Graphics tablet Joystick Light pen Mouse


Pointing stick Touchpad Touchscreen Trackball



Refreshable braille display

Output devices

Monitor Refreshable braille display Printer Speakers Plotter

Removable data storage

Optical disc

CD DVD Blu-ray

Disk pack Floppy disk Memory card USB
flash drive


Central processing unit
Central processing unit
(CPU) HDD / SSD / SSHD Motherboard Network interface controller Power supply Random-access memory
Random-access memory
(RAM) Sound card Video card Fax modem Expansion card


Ethernet FireWire (IEEE 1394) Parallel port Serial port PS/2 port USB Thunderbolt HDMI
/ DVI / VG