
Hardware acceleration is the use of
computer hardware designed to perform specific functions more efficiently when compared to
software
Software is a set of computer programs and associated software documentation, documentation and data (computing), data. This is in contrast to Computer hardware, hardware, from which the system is built and which actually performs the work.
...
running on a general-purpose
central processing unit
A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
(CPU). Any
transformation of
data
In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.
To perform computing tasks more quickly (or better in some other way), generally one can invest time and money in improving the software, improving the hardware, or both. There are various approaches with advantages and disadvantages in terms of decreased
latency, increased
throughput and reduced
energy consumption. Typical advantages of focusing on software may include more rapid
development, lower
non-recurring engineering costs, heightened
portability
Portability may refer to:
*Portability (social security), the portability of social security benefits
* Porting, the ability of a computer program to be ported from one system to another in computer science
** Software portability, the portability ...
, and ease of
updating features or
patching
bugs
Bugs may refer to:
* Plural of bug
Arts, entertainment and media Fictional characters
* Bugs Bunny, a character
* Bugs Meany, a character in the ''Encyclopedia Brown'' books
Films
* ''Bugs'' (2003 film), a science-fiction-horror film
* ''Bugs ...
, at the cost of
overhead to compute general operations. Advantages of focusing on hardware may include
speedup, reduced
power consumption, lower latency, increased
parallelism and
bandwidth, and
better utilization of area and
functional components available on an
integrated circuit; at the cost of lower ability to update designs once
etched onto silicon and higher costs of
functional verification
In electronic design automation, functional verification is the task of verifying that the logic design conforms to specification. Functional verification attempts to answer the question "Does this proposed design do what is intended?" This is a ...
, and times to market. In the hierarchy of digital computing systems ranging from general-purpose processors to
fully customized hardware, there is a tradeoff between flexibility and efficiency, with efficiency increasing by
orders of magnitude when any given application is implemented higher up that hierarchy. This hierarchy includes general-purpose processors such as CPUs, more specialized processors such as
GPUs,
fixed-function
Fixed-function is a term canonically used to contrast 3D graphics APIs and earlier GPUs designed prior to the advent of shader-based 3D graphics APIs and GPU architectures.
History
Historically fixed-function APIs consisted of a set of functi ...
implemented on
field-programmable gate array
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturinghence the term '' field-programmable''. The FPGA configuration is generally specified using a hardware ...
s (FPGAs), and fixed-function implemented on
application-specific integrated circuit
An application-specific integrated circuit (ASIC ) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use, such as a chip designed to run in a digital voice recorder or a high-effici ...
s (ASICs).
Hardware acceleration is advantageous for
performance
A performance is an act of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function.
Management science
In the work place ...
, and practical when the functions are fixed so updates are not as needed as in software solutions. With the advent of
reprogrammable logic devices such as FPGAs, the restriction of hardware acceleration to fully fixed algorithms has eased since 2010, allowing hardware acceleration to be applied to problem domains requiring modification to algorithms and processing
control flow
In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an '' ...
.
The disadvantage however, is that in many open source projects, it requires proprietary libraries that not all vendors are keen to distribute or expose, making it difficult to integrate in such projects.
Overview
Integrated circuits can be created to perform arbitrary operations on
analog
Analog or analogue may refer to:
Computing and electronics
* Analog signal, in which information is encoded in a continuous variable
** Analog device, an apparatus that operates on analog signals
*** Analog electronics, circuits which use analo ...
and
digital signals. Most often in computing, signals are digital and can be interpreted as
binary number
A binary number is a number expressed in the base-2 numeral system or binary numeral system, a method of mathematical expression which uses only two symbols: typically "0" ( zero) and "1" (one).
The base-2 numeral system is a positional notati ...
data. Computer hardware and software operate on information in binary representation to perform
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
; this is accomplished by calculating
boolean functions on the
bit
The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented a ...
s of input and outputting the result to some
output device downstream for
storage
Storage may refer to:
Goods Containers
* Dry cask storage, for storing high-level radioactive waste
* Food storage
* Intermodal container, cargo shipping
* Storage tank
Facilities
* Garage (residential), a storage space normally used to store car ...
or
further processing.
Computational equivalence of hardware and software
Because all
Turing machines can run any
computable function, it is always possible to design custom hardware that performs the same function as a given piece of software. Conversely, software can be always used to emulate the function of a given piece of hardware. Custom hardware may offer higher performance per watt for the same functions that can be specified in software.
Hardware description languages (HDLs) such as
Verilog
Verilog, standardized as IEEE 1364, is a hardware description language (HDL) used to model electronic systems. It is most commonly used in the design and verification of digital circuits at the register-transfer level of abstraction. It is a ...
and
VHDL
The VHSIC Hardware Description Language (VHDL) is a hardware description language (HDL) that can model the behavior and structure of digital systems at multiple levels of abstraction, ranging from the system level down to that of logic gat ...
can model the same
semantics
Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and compu ...
as software and
synthesize the design into a
netlist
In electronic design, a netlist is a description of the connectivity of an electronic circuit. In its simplest form, a netlist consists of a list of the electronic components in a circuit and a list of the nodes they are connected to. A netwo ...
that can be programmed to an FPGA or composed into the
logic gate
A logic gate is an idealized or physical device implementing a Boolean function, a logical operation performed on one or more binary inputs that produces a single binary output. Depending on the context, the term may refer to an ideal logic ga ...
s of an ASIC.
Stored-program computers
The vast majority of software-based computing occurs on machines implementing the
von Neumann architecture, collectively known as
stored-program computers.
Computer program
A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components.
A computer progra ...
s are stored as data and
executed
Capital punishment, also known as the death penalty, is the State (polity), state-sanctioned practice of deliberately killing a person as a punishment for an actual or supposed crime, usually following an authorized, rule-governed process to ...
by
processors. Such processors must fetch and decode instructions, as well as
loading data operands from
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
(as part of the
instruction cycle
The instruction cycle (also known as the fetch–decode–execute cycle, or simply the fetch-execute cycle) is the cycle that the central processing unit (CPU) follows from boot-up until the computer has shut down in order to process instructions ...
) to execute the instructions constituting the software program. Relying on a common
cache for code and data leads to the "von Neumann bottleneck", a fundamental limitation on the throughput of software on processors implementing the von Neumann architecture. Even in the
modified Harvard architecture, where instructions and data have separate caches in the
memory hierarchy, there is overhead to decoding instruction
opcode
In computing, an opcode (abbreviated from operation code, also known as instruction machine code, instruction code, instruction syllable, instruction parcel or opstring) is the portion of a machine language instruction that specifies the opera ...
s and
multiplexing
In telecommunications and computer networking, multiplexing (sometimes contracted to muxing) is a method by which multiple analog or digital signals are combined into one signal over a shared medium. The aim is to share a scarce resource - a ...
available
execution unit
In computer engineering, an execution unit (E-unit or EU) is a part of the central processing unit (CPU) that performs the operations and calculations as instructed by the computer program. It may have its own internal control sequence unit (not ...
s on a
microprocessor
A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circu ...
or
microcontroller, leading to low circuit utilization. Modern processors that provide
simultaneous multithreading exploit under-utilization of available processor functional units and
instruction level parallelism between different hardware threads.
Hardware execution units
Hardware execution units do not in general rely on the von Neumann or modified Harvard architectures and do not need to perform the instruction fetch and decode steps of an
instruction cycle
The instruction cycle (also known as the fetch–decode–execute cycle, or simply the fetch-execute cycle) is the cycle that the central processing unit (CPU) follows from boot-up until the computer has shut down in order to process instructions ...
and incur those stages' overhead. If needed calculations are specified in a
register transfer level (RTL) hardware design, the time and circuit area costs that would be incurred by instruction fetch and decoding stages can be reclaimed and put to other uses.
This reclamation saves time, power and circuit area in computation. The reclaimed resources can be used for increased parallel computation, other functions, communication or memory, as well as increased
input/output
In computing, input/output (I/O, or informally io or IO) is the communication between an information processing system, such as a computer, and the outside world, possibly a human or another information processing system. Inputs are the signals ...
capabilities. This comes at the cost of general-purpose utility.
Emerging hardware architectures
Greater RTL customization of hardware designs allows emerging architectures such as
in-memory computing,
transport triggered architecture
In computer architecture, a transport triggered architecture (TTA) is a kind of processor design in which programs directly control the internal transport buses of a processor. Computation happens as a side effect of data transports: writing data ...
s (TTA) and
networks-on-chip (NoC) to further benefit from increased
locality of data to execution context, thereby reducing computing and communication latency between modules and functional units.
Custom hardware is limited in parallel processing capability only by the area and
logic blocks available on the
integrated circuit die. Therefore, hardware is much more free to offer
massive parallelism than software on general-purpose processors, offering a possibility of implementing the
parallel random-access machine (PRAM) model.
It is common to build
multicore and
manycore processing units out of
microprocessor IP core schematics on a single FPGA or ASIC. Similarly, specialized functional units can be composed in parallel as
in digital signal processing without being embedded in a processor
IP core. Therefore, hardware acceleration is often employed for repetitive, fixed tasks involving little
conditional branching, especially on large amounts of data. This is how
Nvidia
Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
's
CUDA line of GPUs are implemented.
Implementation metrics
As device mobility has increased, new metrics have been developed that measure the relative performance of specific acceleration protocols, considering the characteristics such as physical hardware dimensions, power consumption and operations throughput. These can be summarized into three categories: task efficiency, implementation efficiency, and flexibility. Appropriate metrics consider the area of the hardware along with both the corresponding operations throughput and energy consumed.
Applications
Examples of hardware acceleration include
bit blit
Bit blit (also written BITBLT, BIT BLT, BitBLT, Bit BLT, Bit Blt etc., which stands for ''bit block transfer'') is a data operation commonly used in computer graphics in which several bitmaps are combined into one using a '' boolean function''.
T ...
acceleration functionality in graphics processing units (GPUs), use of
memristors for accelerating
neural networks
A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological ...
and
regular expression
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
hardware acceleration for
spam control in the
server industry, intended to prevent
regular expression denial of service (ReDoS) attacks.
The hardware that performs the acceleration may be part of a general-purpose CPU, or a separate unit called a hardware accelerator, though they are usually referred with a more specific term, such as 3D accelerator, or
cryptographic accelerator
In computing, a cryptographic accelerator is a co-processor designed specifically to perform computationally intensive cryptographic operations, doing so far more efficiently than the general-purpose CPU. Because many servers' system load consists ...
.
Traditionally, processors were sequential (instructions are executed one by one), and were designed to run general purpose algorithms controlled by
instruction fetch (for example moving temporary results
to and from a
register file). Hardware accelerators improve the execution of a specific algorithm by allowing greater
concurrency
Concurrent means happening at the same time. Concurrency, concurrent, or concurrence may refer to:
Law
* Concurrence, in jurisprudence, the need to prove both ''actus reus'' and ''mens rea''
* Concurring opinion (also called a "concurrence"), a ...
, having specific
datapaths for their
temporary variable
In computer programming, a temporary variable is a variable with short lifetime, usually to hold data that will soon be discarded, or before it can be placed at a more permanent memory location. Because it is short-lived, it is usually declared as ...
s, and reducing the overhead of instruction control in the fetch-decode-execute cycle.
Modern processors are
multi-core and often feature parallel "single-instruction; multiple data" (
SIMD) units. Even so, hardware acceleration still yields benefits. Hardware acceleration is suitable for any computation-intensive algorithm which is executed frequently in a task or program. Depending upon the granularity, hardware acceleration can vary from a small functional unit, to a large functional block (like
motion estimation
Motion estimation is the process of determining ''motion vectors'' that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions b ...
in
MPEG-2
MPEG-2 (a.k.a. H.222/H.262 as was defined by the ITU) is a standard for "the generic coding of moving pictures and associated audio information". It describes a combination of lossy video compression and lossy audio data compression methods, w ...
).
Hardware acceleration units by application
See also
*
Coprocessor
*
DirectX Video Acceleration (DXVA)
*
Direct memory access (DMA)
*
High-level synthesis
**
C to HDL
**
Flow to HDL
*
Soft microprocessor
*
Flynn's taxonomy
Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in design of modern processors and their functionalities ...
of parallel
computer architecture
In computer engineering, computer architecture is a description of the structure of a computer system made from component parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, the ...
s
**
Single instruction, multiple data (SIMD)
**
Single instruction, multiple threads (SIMT)
**
Multiple instructions, multiple data (MIMD)
*
Computer for operations with functions
Within computer engineering and computer science, a computer for operations with (mathematical) functions (unlike the usual computer) operates with functions at the hardware level (i.e. without programming these operations).see also here http: ...
References
External links
*
{{Digital electronics
Application-specific integrated circuits
Central processing unit
Computer optimization
Gate arrays
Graphics hardware
Articles with example C code