Barrel processor
   HOME

TheInfoList



OR:

A barrel processor is a CPU that switches between threads of execution on every cycle. This
CPU design Processor design is a subfield of computer science and computer engineering (fabrication) that deals with creating a processor (computing), processor, a key component of computer hardware. The design process involves choosing an instruction set an ...
technique is also known as "interleaved" or "fine-grained" temporal multithreading. Unlike
simultaneous multithreading Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern proces ...
in modern
superscalar A superscalar processor (or multiple-issue processor) is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single in ...
architectures, it generally does not allow execution of multiple instructions in one cycle. Like preemptive multitasking, each thread of execution is assigned its own
program counter The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, ...
and other
hardware register In digital electronics, especially computing, hardware registers are circuits typically composed of flip-flops, often with many characteristics similar to memory, such as: * Using an memory or port address to select a particular register in a ma ...
s (each thread's architectural state). A barrel processor can guarantee that each thread will execute one instruction every ''n'' cycles, unlike a preemptive multitasking machine, that typically runs one thread of execution for tens of millions of cycles, while all other threads wait their turn. A technique called C-slowing can automatically generate a corresponding barrel processor design from a single-tasking processor design. An ''n''-way barrel processor generated this way acts much like ''n'' separate
multiprocessing Multiprocessing (MP) is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. The ...
copies of the original single-tasking processor, each one running at roughly 1/''n'' the original speed.


History

One of the earliest examples of a barrel processor was the I/O processing system in the
CDC 6000 series The CDC 6000 series is a discontinued family of mainframe computers manufactured by Control Data Corporation in the 1960s. It consisted of the CDC 6200, CDC 6300, #Versions, CDC 6400, #Versions, CDC 6500, CDC 6600 and #Versions, CDC 6700 computers, ...
supercomputers. These executed one instruction (or a portion of an instruction) from each of 10 different virtual processors (called peripheral processors or PPs) before returning to the first processor.CDC Cyber 170 Computer Systems; Models 720, 730, 750, and 760; Model 176 (Level B); CPU Instruction Set; PPU Instruction Set
-- See page 2-44 for an illustration of the rotating "barrel".
From
CDC 6000 series The CDC 6000 series is a discontinued family of mainframe computers manufactured by Control Data Corporation in the 1960s. It consisted of the CDC 6200, CDC 6300, #Versions, CDC 6400, #Versions, CDC 6500, CDC 6600 and #Versions, CDC 6700 computers, ...
we read that "The peripheral processors are collectively implemented as a barrel processor. Each executes routines independently of the others. They are a loose predecessor of bus mastering or
direct memory access Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system computer memory, memory independently of the central processing unit (CPU). Without DMA, when the CPU is using programmed i ...
." One motivation for barrel processors was to reduce hardware costs. In the case of the CDC 6x00 PPUs, the digital logic of the processor was much faster than the core memory, so rather than having ten separate processors, there are ten separate core memory units for the PPUs, but they all share the single set of processor logic. Another example is the Honeywell 800, which had 8 groups of registers, allowing up to 8 concurrent programs. After each instruction, the processor would (in most cases) switch to the next active program in sequence. Barrel processors have also been used as large-scale central processors. The Tera MTA (1988) was a large-scale barrel processor design with 128 threads per core. The MTA architecture has seen continued development in successive products, such as the
Cray Urika-GD The Cray Urika-GD is a graph discovery appliance is a computer application that finds and analyzes relationships and patterns in the data collected by a supercomputer. The Cray Urika-GD generates graphs based on large amounts of data, often from ...
, originally introduced in 2012 (as the YarcData uRiKA) and targeted at data-mining applications. Barrel processors are also found in embedded systems, where they are particularly useful for their deterministic real-time thread performance. An early example is the “Dual CPU” version of the four-bit COP400 that was introduced by
National Semiconductor National Semiconductor Corporation was an United States of America, American Semiconductor manufacturing, semiconductor manufacturer, which specialized in analogue electronics, analog devices and subsystems, formerly headquartered in Santa Clara, ...
in 1981. This single-chip
microcontroller A microcontroller (MC, uC, or μC) or microcontroller unit (MCU) is a small computer on a single integrated circuit. A microcontroller contains one or more CPUs (processor cores) along with memory and programmable input/output peripherals. Pro ...
contains two ostensibly independent CPUs that share instructions, memory, and most IO devices. In reality, the dual CPUs are a single two-thread barrel processor. It works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources such as ALU, buses, and memory. Separate architectural states are established with duplicated A (accumulators), B (pointer registers), C (carry flags), N (stack pointers), and PC (program counters). Another example is the XMOS XCore XS1 (2007), a four-stage barrel processor with eight threads per core. (Newer processors from XMOS also have the same type of architecture.) The XS1 is found in Ethernet, USB, audio, and control devices, and other applications where I/O performance is critical. When the XS1 is programmed in the 'XC' language, software controlled
direct memory access Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system computer memory, memory independently of the central processing unit (CPU). Without DMA, when the CPU is using programmed i ...
may be implemented. Barrel processors have also been used in specialized devices such as the eight-thread Ubicom IP3023 network I/O processor (2004). Some 8-bit microcontrollers by Padauk Technology feature barrel processors with up to 8 threads per core.


Comparison with single-threaded processors


Advantages

A single-tasking processor spends a lot of time idle, not doing anything useful whenever a
cache miss In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsew ...
or
pipeline stall In the design of instruction pipeline, pipelined computer processors, a pipeline stall is a delay in execution of an instruction set, instruction in order to resolve a hazard (computer architecture), hazard. Details In a standard classic RISC pip ...
occurs. Advantages to employing barrel processors over single-tasking processors include: * The ability to do useful work on the other threads while the stalled thread is waiting. * Designing an ''n''-way barrel processor with an ''n''-deep
pipeline A pipeline is a system of Pipe (fluid conveyance), pipes for long-distance transportation of a liquid or gas, typically to a market area for consumption. The latest data from 2014 gives a total of slightly less than of pipeline in 120 countries ...
is much simpler than designing a single-tasking processor because a barrel processor never has a
pipeline stall In the design of instruction pipeline, pipelined computer processors, a pipeline stall is a delay in execution of an instruction set, instruction in order to resolve a hazard (computer architecture), hazard. Details In a standard classic RISC pip ...
and doesn't need feed-forward circuits. * For real-time applications, a barrel processor can guarantee that a "real-time" thread can execute with precise timing, no matter what happens to the other threads, even if some other thread locks up in an
infinite loop In computer programming, an infinite loop (or endless loop) is a sequence of instructions that, as written, will continue endlessly, unless an external intervention occurs, such as turning off power via a switch or pulling a plug. It may be inte ...
or is continuously interrupted by
hardware interrupt In digital computers, an interrupt (sometimes referred to as a trap) is a request for the processor to ''interrupt'' currently executing code (when permitted), so that the event can be processed in a timely manner. If the request is accepted ...
s.


Disadvantages

There are a few disadvantages to barrel processors. * The state of each thread must be kept on-chip, typically in registers, to avoid costly off-chip context switches. This requires a large number of registers compared to typical processors. * Either all threads must share the same cache, which slows overall system performance, or there must be one unit of cache for each execution thread, which can significantly increase the
transistor count The transistor count is the number of transistors in an electronic device (typically on a single substrate or silicon die). It is the most common measure of integrated circuit complexity (although the majority of transistors in modern microproc ...
and thus the cost of such a CPU. However, in hard real-time
embedded system An embedded system is a specialized computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is e ...
s where barrel processors are often found, memory access costs are typically calculated assuming worst-case cache behavior, so this is a minor concern. Some barrel processors such as the XMOS XS1 do not have a cache at all.


See also

* Super-threading *
Computer multitasking In computing, multitasking is the concurrent computing, concurrent execution of multiple tasks (also known as Process (computing), processes) over a certain period of time. New tasks can interrupt already started ones before they finish, instea ...
*
Simultaneous multithreading Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern proces ...
(SMT) * Hyper-threading *
Vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ...
* Cray XMT


References


External links


Soft peripherals
Embedded.com article examines Ubicom's IP3023 processor
An Evaluation of the Design of the Gamma 60


(French and English) {{DEFAULTSORT:Barrel Processor Central processing unit Instruction processing Threads (computing)