Digital signal processor

A digital signal processor (DSP) is a specialized microprocessor (or a SIP block), with its architecture optimized for the operational needs of digital signal processing.[1][2] The goal of a DSP is usually to measure, filter or compress continuous real-world analog signals. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully, but dedicated DSPs usually have better power efficiency, and are thus more suitable in portable devices such as mobile phones because of power consumption constraints.[3] DSPs often use special memory architectures that are able to fetch multiple data items or instructions at the same time.

Contents

1 Overview
2 Architecture
  2.1 Software architecture
    2.1.1 Instruction sets
    2.1.2 Data instructions
    2.1.3 Program flow
  2.2 Hardware architecture
    2.2.1 Memory architecture
      2.2.1.1 Addressing and virtual memory
3 History
4 Modern DSPs
5 See also
6 References
7 External links

Overview

A typical digital processing system

Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a series of data samples. Signals (perhaps from audio or video sensors) are constantly converted from analog to digital, manipulated digitally, and then converted back to analog form. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable.

Most general-purpose microprocessors and operating systems can execute DSP algorithms successfully, but are not suitable for use in portable devices such as mobile phones and PDAs because of power efficiency constraints.[3] A specialized digital signal processor, however, will tend to provide a lower-cost solution, with better performance, lower latency, and no requirements for specialised cooling or large batteries.[citation needed]

Such performance improvements have led to the introduction of digital signal processing in commercial communications satellites, where hundreds or even thousands of analog filters, switches, frequency converters and so on are required to receive and process the uplinked signals and ready them for downlinking, and where they can be replaced with specialised DSPs, with significant benefits to the satellites' weight, power consumption, complexity/cost of construction, reliability and flexibility of operation. For example, the SES-12 and SES-14 satellites from operator SES, both intended for launch in 2017, are being built by Airbus Defence and Space with 25% of capacity using DSP.[4]

The architecture of a digital signal processor is optimized specifically for digital signal processing. Most also support some of the features of an applications processor or microcontroller, since signal processing is rarely the only task of a system. Some useful features for optimizing DSP algorithms are outlined below.

Architecture

Software architecture

By the standards of general-purpose processors, DSP instruction sets are often highly irregular; while traditional instruction sets are made up of more general instructions that allow them to perform a wider variety of operations, instruction sets optimized for digital signal processing contain instructions for common mathematical operations that occur frequently in DSP calculations. Both traditional and DSP-optimized instruction sets are able to compute any arbitrary operation, but an operation that might require multiple ARM or x86 instructions might require only one instruction in a DSP-optimized instruction set.

One implication for software architecture is that hand-optimized assembly-code routines are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms.[clarification needed] Even with modern compiler optimizations, hand-optimized assembly code is more efficient, and many common algorithms involved in DSP calculations are hand-written in order to take full advantage of the architectural optimizations.

Instruction sets

Multiply–accumulate (MAC, including fused multiply–add, FMA) operations, used extensively in all kinds of matrix operations: convolution for filtering, dot products, and polynomial evaluation. Fundamental DSP algorithms, such as FIR filters and the fast Fourier transform (FFT), depend heavily on multiply–accumulate performance (see the sketch after this list).
Instructions to increase parallelism: SIMD, VLIW, superscalar architecture.
Specialized instructions for modulo addressing in ring buffers and bit-reversed addressing mode for FFT cross-referencing.
Time-stationary encoding, sometimes used to simplify hardware and increase coding efficiency.
Multiple arithmetic units, which may require memory architectures that support several accesses per instruction cycle.
Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing.[clarification needed]
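
To make the central role of the MAC concrete, the following sketch shows the inner loop of a direct-form FIR filter in plain C; the function and variable names are illustrative, not taken from any particular vendor library. On a typical DSP this loop compiles to roughly one multiply–accumulate per filter tap, with hardware loop support absorbing the instruction-fetch and exit-test overhead mentioned above.

    #include <stddef.h>

    /* Direct-form FIR filter: y = sum over k of coeffs[k] * delay_line[k].
       Illustrative sketch only; identifiers are hypothetical. */
    float fir_filter(const float *coeffs, const float *delay_line, size_t taps)
    {
        float acc = 0.0f;                          /* accumulator */
        for (size_t k = 0; k < taps; ++k) {
            acc += coeffs[k] * delay_line[k];      /* one MAC per tap */
        }
        return acc;
    }

A DSP toolchain would typically map such a loop onto a zero-overhead hardware loop with dual data fetches per cycle, which is exactly what the memory and loop-control features below are designed to allow.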

Data instructions

Saturation arithmetic, in which operations that produce overflows accumulate at the maximum (or minimum) value that the register can hold rather than wrapping around (maximum + 1 does not overflow to minimum as in many general-purpose CPUs; instead it stays at maximum). Various sticky-bit operating modes are sometimes available. (A saturating fixed-point sketch follows this list.)
Fixed-point arithmetic, often used to speed up arithmetic processing.
Single-cycle operations, to increase the benefits of pipelining.
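
As a minimal sketch of saturation in Q15 (16-bit) fixed-point arithmetic, the helper below clamps an addition result instead of letting it wrap; the function name is hypothetical, and a DSP with saturation arithmetic performs this clamping in hardware as part of the add instruction itself.

    #include <stdint.h>

    /* Saturating addition of two Q15 (16-bit fixed-point) values.
       Instead of wrapping around on overflow, the result sticks at
       INT16_MAX or INT16_MIN. Hypothetical helper for illustration. */
    static int16_t q15_add_sat(int16_t a, int16_t b)
    {
        int32_t sum = (int32_t)a + (int32_t)b;    /* widen to avoid overflow */
        if (sum > INT16_MAX) sum = INT16_MAX;     /* saturate high */
        if (sum < INT16_MIN) sum = INT16_MIN;     /* saturate low  */
        return (int16_t)sum;
    }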

Program flow

Floating-point unit integrated directly into the datapath.
Pipelined architecture.
Highly parallel multiplier–accumulators (MAC units).
Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations.

Hardware architecture

In engineering, hardware architecture refers to the identification of a system's physical components and their interrelationships. This description, often called a hardware design model, allows hardware designers to understand how their components fit into a system architecture and provides software component designers with the information needed for software development and integration. Clear definition of a hardware architecture allows the various traditional engineering disciplines (e.g., electrical and mechanical engineering) to work more effectively together to develop and manufacture new machines, devices and components.

Hardware is also an expression used within the computer engineering industry to explicitly distinguish the (electronic computer) hardware from the software that runs on it. But hardware, within the automation and software engineering disciplines, need not simply be a computer of some sort. A modern automobile runs vastly more software than the Apollo spacecraft, and modern aircraft cannot function without running tens of millions of computer instructions embedded and distributed throughout the aircraft, resident both in standard computer hardware and in specialized hardware components such as IC wired logic gates, analog and hybrid devices, and other digital components. The need to effectively model how separate physical components combine to form complex systems is important over a wide range of applications, including computers, personal digital assistants (PDAs), cell phones, surgical instrumentation, satellites, and submarines.

Memory architecture

DSPs are usually optimized for streaming data and use special memory architectures that are able to fetch multiple data items or instructions at the same time, such as the Harvard architecture or modified von Neumann architecture, which use separate program and data memories (sometimes even concurrent access on multiple data buses). DSPs can sometimes rely on supporting code to know about cache hierarchies and the associated delays; this is a tradeoff that allows for better performance.[clarification needed] In addition, extensive use of DMA is employed.

Addressing and virtual memory

DSPs frequently use multi-tasking operating systems, but have no support for virtual memory or memory protection. Operating systems that use virtual memory require more time for context switching among processes, which increases latency.

Hardware modulo addressing, which allows circular buffers to be implemented without having to test for wrapping (see the sketch after this list).
Bit-reversed addressing, a special addressing mode useful for calculating FFTs.
Exclusion of a memory management unit.
A memory-address calculation unit.
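
The following plain-C sketch illustrates what these two addressing modes compute; the function names are illustrative only. Hardware modulo addressing performs the wrap of ring_advance implicitly, with no compare-and-branch in the inner loop, and bit-reversed addressing generates the bit_reverse index sequence directly while stepping through FFT data.

    #include <stddef.h>
    #include <stdint.h>

    /* Advance a circular-buffer index; hardware modulo addressing does this
       wrap implicitly, so the explicit test below disappears on such DSPs. */
    static size_t ring_advance(size_t index, size_t length)
    {
        ++index;
        if (index == length)   /* wrap test that modulo addressing removes */
            index = 0;
        return index;
    }

    /* Reverse the lowest 'bits' bits of an index, as needed to reorder the
       input (or output) of a radix-2 FFT; bit-reversed addressing modes
       produce this sequence directly in hardware. */
    static uint32_t bit_reverse(uint32_t index, unsigned bits)
    {
        uint32_t reversed = 0;
        for (unsigned i = 0; i < bits; ++i) {
            reversed = (reversed << 1) | (index & 1u);
            index >>= 1;
        }
        return reversed;
    }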

History

Prior to the advent of stand-alone DSP chips, most DSP applications were implemented using bit-slice processors. The AMD 2901 bit-slice chip with its family of components was a very popular choice. There were reference designs from AMD, but very often the specifics of a particular design were application-specific. These bit-slice architectures would sometimes include a peripheral multiplier chip. Examples of these multipliers were a series from TRW including the TDC1008 and TDC1010, some of which included an accumulator, providing the requisite multiply–accumulate (MAC) function.

In 1976, Richard Wiggins proposed the Speak & Spell concept to Paul Breedlove, Larry Brantingham, and Gene Frantz at Texas Instruments' Dallas research facility. Two years later, in 1978, they produced the first Speak & Spell, with the technological centerpiece being the TMS5100,[5] the industry's first digital signal processor. It also set other milestones, being the first chip to use linear predictive coding to perform speech synthesis.[6]

In 1978, Intel released the 2920 as an "analog signal processor". It had an on-chip ADC/DAC with an internal signal processor, but it didn't have a hardware multiplier and was not successful in the market. In 1979, AMI released the S2811. It was designed as a microprocessor peripheral, and it had to be initialized by the host. The S2811 was likewise not successful in the market.

In 1980 the first stand-alone, complete DSPs – the NEC µPD7720 and AT&T DSP1 – were presented at the International Solid-State Circuits Conference '80. Both processors were inspired by research in PSTN telecommunications. The Altamira DX-1 was another early DSP, utilizing quad integer pipelines with delayed branches and branch prediction.[citation needed]

Another DSP produced by Texas Instruments (TI), the TMS32010 presented in 1983, proved to be an even bigger success. It was based on the Harvard architecture, and so had separate instruction and data memory. It already had a special instruction set, with instructions like load-and-accumulate or multiply-and-accumulate. It could work on 16-bit numbers and needed 390 ns for a multiply–add operation. TI is now the market leader in general-purpose DSPs.

About five years later, the second generation of DSPs began to spread. They had 3 memories for storing two operands simultaneously and included hardware to accelerate tight loops; they also had an addressing unit capable of loop-addressing. Some of them operated on 24-bit variables and a typical model only required about 21 ns for a MAC. Members of this generation were, for example, the AT&T DSP16A and the Motorola 56000.

The main improvement in the third generation was the appearance of application-specific units and instructions in the data path, or sometimes as coprocessors. These units allowed direct hardware acceleration of very specific but complex mathematical problems, like the Fourier transform or matrix operations. Some chips, like the Motorola MC68356, even included more than one processor core to work in parallel. Other DSPs from 1995 are the TI TMS320C541 and the TMS320C80.

The fourth generation is best characterized by the changes in the instruction set and the instruction encoding/decoding. SIMD extensions were added, and VLIW and the superscalar architecture appeared. As always, clock speeds increased; a 3 ns MAC became possible.

Modern DSPs

Modern signal processors yield greater performance; this is due in part to both technological and architectural advancements, such as smaller design rules, fast-access two-level cache, (E)DMA circuitry and a wider bus system. Not all DSPs provide the same speed, and many kinds of signal processors exist, each better suited to a specific task and ranging in price from about US$1.50 to US$300.

Texas Instruments produces the C6000 series DSPs, which have clock speeds of 1.2 GHz and implement separate instruction and data caches. They also have an 8 MiB second-level cache and 64 EDMA channels. The top models are capable of as many as 8000 MIPS (million instructions per second), use VLIW (very long instruction word), perform eight operations per clock cycle and are compatible with a broad range of external peripherals and various buses (PCI, serial, etc.). TMS320C6474 chips each contain three such DSPs, and the newest generation C6000 chips support floating-point as well as fixed-point processing.

Freescale produces a multi-core DSP family, the MSC81xx. The MSC81xx is based on StarCore architecture processors, and the latest MSC8144 DSP combines four programmable SC3400 StarCore DSP cores. Each SC3400 StarCore DSP core has a clock speed of 1 GHz.

XMOS produces a multi-core, multi-threaded line of processors well suited to DSP operations. They come in various speeds ranging from 400 to 1600 MIPS. The processors have a multi-threaded architecture that allows up to 8 real-time threads per core, meaning that a 4-core device would support up to 32 real-time threads. Threads communicate with each other through buffered channels capable of up to 80 Mbit/s. The devices are easily programmable in C and aim to bridge the gap between conventional microcontrollers and FPGAs.

CEVA, Inc. produces and licenses three distinct families of DSPs. Perhaps the best known and most widely deployed is the CEVA-TeakLite DSP family, a classic memory-based architecture with 16-bit or 32-bit word widths and single or dual MACs. The CEVA-X DSP family offers a combination of VLIW and SIMD architectures, with different members of the family offering dual or quad 16-bit MACs. The CEVA-XC DSP family targets software-defined radio (SDR) modem designs and leverages a unique combination of VLIW and vector architectures with 32 16-bit MACs.

Analog Devices produces the SHARC-based DSPs, which range in performance from 66 MHz/198 MFLOPS (million floating-point operations per second) to 400 MHz/2400 MFLOPS. Some models support multiple multipliers and ALUs, SIMD instructions, and audio processing-specific components and peripherals.

The Blackfin family of embedded digital signal processors combines the features of a DSP with those of a general-purpose processor. As a result, these processors can run simple operating systems like μClinux, velOSity and Nucleus RTOS while operating on real-time data.

NXP Semiconductors produces DSPs based on TriMedia VLIW technology, optimized for audio and video processing. In some products the DSP core is hidden as a fixed-function block inside an SoC, but NXP also provides a range of flexible single-core media processors. The TriMedia media processors support both fixed-point and floating-point arithmetic, and have specific instructions to deal with complex filters and entropy coding.

CSR produces the Quatro family of SoCs that contain one or more custom Imaging DSPs optimized for processing document image data for scanner and copier applications.

Microchip Technology produces the PIC24-based dsPIC line of DSPs. Introduced in 2004,[7] the dsPIC is designed for applications needing a true DSP as well as a true microcontroller, such as motor control and power supplies. The dsPIC runs at up to 40 MIPS and has support for 16-bit fixed-point MACs, bit-reversed and modulo addressing, and DMA.

Most DSPs use fixed-point arithmetic, because in real-world signal processing the additional range provided by floating point is not needed, and there is a large speed and cost benefit due to reduced hardware complexity. Floating-point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating-point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point.

Generally, DSPs are dedicated integrated circuits; however, DSP functionality can also be produced by using field-programmable gate array chips (FPGAs). Embedded general-purpose RISC processors are becoming increasingly DSP-like in functionality. For example, the OMAP3 processors include an ARM Cortex-A8 and a C6000 DSP. In communications, a new breed of DSPs offering a fusion of DSP functions and hardware acceleration is making its way into the mainstream. Such modem processors include ASOCS ModemX and CEVA's XC4000.
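
As a rough illustration of the fixed- versus floating-point trade-off, the hypothetical helpers below convert a sample in the range [-1.0, 1.0) to Q15 format and back: values within that bounded range are represented with only small quantization error, while anything outside it must be saturated, which is exactly the situation where a floating-point DSP's wider dynamic range pays off. This is a sketch, not any vendor's API.

    #include <stdint.h>

    /* Convert a sample in [-1.0, 1.0) to Q15 and back (illustrative helpers). */
    static int16_t float_to_q15(float x)
    {
        float scaled = x * 32768.0f;
        if (scaled >  32767.0f) scaled =  32767.0f;   /* saturate out-of-range input */
        if (scaled < -32768.0f) scaled = -32768.0f;
        return (int16_t)scaled;
    }

    static float q15_to_float(int16_t q)
    {
        return (float)q / 32768.0f;
    }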

See also

Digital signal controller
Graphics processing unit
Video processing unit
Vision processing unit
MDSP – a multiprocessor DSP
OpenCL

References

1. Dyer, S. A.; Harms, B. K. (1993). "Digital Signal Processing". In Yovits, M. C. Advances in Computers. 37. Academic Press. pp. 104–107. doi:10.1016/S0065-2458(08)60403-9. ISBN 9780120121373.
2. Liptak, B. G. (2006). Process Control and Optimization. Instrument Engineers' Handbook. 2 (4th ed.). CRC Press. pp. 11–12. ISBN 9780849310812.
3. Verbauwhede, Ingrid; Schaumont, Patrick; Piguet, Christian; Kienhuis, Bart (2005-12-24). "Architectures and Design techniques for energy efficient embedded DSP and multimedia processing" (PDF). rijndael.ece.vt.edu. Retrieved 2017-06-13.
4. Beyond Frontiers. Broadgate Publications (September 2016). p. 22.
5. "Speak & Spell, the First Use of a Digital Signal Processing IC for Speech Generation, 1978". IEEE Milestones. IEEE. Retrieved 2012-03-02.
6. Bogdanowicz, A. (2009-10-06). "IEEE Milestones Honor Three". The Institute. IEEE. Retrieved 2012-03-02.
7. "PIC microcontroller". 2018-02-04.

External links

DSP Online Book
Pocket Guide to Processors for DSP - Berkeley Design Technology, Inc.
