In digital signal processing (DSP), parallel processing is a technique that duplicates function units so that different tasks (signals) can be processed simultaneously. The same processing can then be applied to different signals on the corresponding duplicated function units. Because several units work at once, a parallel DSP design typically produces multiple outputs per clock period, which gives it higher throughput than a non-parallel design.

Conceptual example

Consider a function unit ($F_0$) and three tasks ($T_0$, $T_1$, and $T_2$). The time required for the function unit $F_0$ to process those tasks is $t_0$, $t_1$, and $t_2$, respectively. If these three tasks are executed in sequential order, the time required to complete them is $t_0 + t_1 + t_2$. However, if the function unit is duplicated into two additional copies ($F$), the aggregate time is reduced to $\max(t_0, t_1, t_2)$, which is smaller than in sequential order.
[Figure: Parallel tasks]
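The timing argument above can be checked with a small sketch. The task durations are made-up values in arbitrary time units, and the model assumes one dedicated function unit per task with no scheduling or communication overhead.

```python
def sequential_time(durations):
    """One function unit processes the tasks one after another."""
    return sum(durations)


def parallel_time(durations):
    """One function unit per task, all started at the same time."""
    return max(durations)


# Hypothetical values for t_0, t_1, t_2 (arbitrary time units).
durations = [3.0, 5.0, 2.0]

print(sequential_time(durations))  # 10.0
print(parallel_time(durations))    # 5.0
```

The parallel completion time is set by the slowest task, so the gain is largest when the task durations are similar.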

Versus pipelining

Mechanism:
* Parallel: duplicated function units working in parallel
** Each task is processed entirely by a different function unit.
* Pipelining: different function units working in parallel
** Each task is split into a sequence of sub-tasks, which are handled by different, specialized function units.

Objective:
* Pipelining reduces the critical path, which can either increase the sample speed or reduce power consumption at the same speed, yielding higher performance per watt.
* Parallel processing produces multiple outputs, which are computed in parallel within one clock period. Therefore, the effective sample speed is increased by the level of parallelism.

When both parallel processing and pipelining can be applied, it is better to choose parallel processing for the following reasons:
* Pipelining usually causes I/O bottlenecks.
* Parallel processing can also be used to reduce power consumption by running on slower clocks.
* A hybrid of pipelining and parallel processing further increases the speed of the architecture.
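As a rough illustration of the difference, the sketch below models the clock period and sample period of each approach. It is an idealized model, not a design rule: it assumes a pipeline whose stages split the critical path evenly, ignores latch and routing overhead, and uses hypothetical function names and numbers.

```python
def pipelined(t_critical, stages):
    """Ideal pipeline: the critical path is cut into `stages` equal parts.

    Returns (clock period, sample period); one sample is produced per clock.
    """
    t_clock = t_critical / stages
    return t_clock, t_clock


def parallel(t_critical, copies):
    """Parallel design: the critical path is unchanged, but `copies`
    samples are produced in every clock period."""
    t_clock = t_critical
    return t_clock, t_clock / copies


def hybrid(t_critical, stages, copies):
    """Pipelining and parallel processing combined."""
    t_clock = t_critical / stages
    return t_clock, t_clock / copies


# Hypothetical critical path of 12 time units.
print(pipelined(12.0, stages=3))         # (4.0, 4.0)
print(parallel(12.0, copies=3))          # (12.0, 4.0)
print(hybrid(12.0, stages=3, copies=2))  # (4.0, 2.0)
```

In this model a 3-stage pipeline and a 3-parallel design reach the same sample period, but only the parallel design does so with the original, slower clock.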

Parallel FIR filters

Consider a 3-tap FIR filter (see the slides for ''VLSI Digital Signal Processing Systems: Design and Implementation'', John Wiley & Sons, 1999, ISBN 0-471-24186-5: http://people.ece.umn.edu/~parhi/publications/books/) :

y(n)=ax(n)+bx(n-1)+cx(n-2)

which is shown in the following figure. Assume the calculation time is $T_m$ for a multiplication unit and $T_a$ for an add unit. The sample period is then bounded by :

T_\text{sample} \ge T_m + 2T_a

By parallelizing it, the resultant architecture is shown as follows. The sample period now becomes :

T_\text{sample} = \frac{T_\text{clock}}{N} \ge \frac{T_m + 2T_a}{N}

where ''N'' represents the number of copies. Note that, in a parallel system,

T_\text{sample} \ne T_\text{clock}

while

T_\text{sample} = T_\text{clock}

holds in a pipelined system.

[Figure: Parallel FIR]
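A minimal software sketch of the block (parallel) form of the 3-tap filter follows. It assumes zero initial conditions (x(n) = 0 for n < 0) and an input length that is a multiple of the block size; the function names and coefficient values are illustrative. In hardware the outputs of one block would be produced by separate copies of the filter in the same clock period, whereas the loop here only mimics that grouping.

```python
def fir_sequential(x, a, b, c):
    """y(n) = a*x(n) + b*x(n-1) + c*x(n-2), with x(n) = 0 for n < 0."""
    def at(n):
        return x[n] if n >= 0 else 0.0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]


def fir_block_parallel(x, a, b, c, n_copies=3):
    """Same filter, computed n_copies outputs at a time (one block per clock)."""
    assert len(x) % n_copies == 0, "input length must be a multiple of the block size"

    def at(n):
        return x[n] if n >= 0 else 0.0

    y = []
    for k in range(len(x) // n_copies):
        base = n_copies * k
        # Each entry corresponds to one duplicated filter copy:
        # y(Nk + i) = a*x(Nk + i) + b*x(Nk + i - 1) + c*x(Nk + i - 2)
        block = [
            a * at(base + i) + b * at(base + i - 1) + c * at(base + i - 2)
            for i in range(n_copies)
        ]
        y.extend(block)
    return y


def sample_period_bound(t_m, t_a, n_copies=1):
    """Lower bound on the sample period: (T_m + 2*T_a) / N."""
    return (t_m + 2 * t_a) / n_copies


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
assert fir_block_parallel(x, 0.5, 0.25, 0.125) == fir_sequential(x, 0.5, 0.25, 0.125)
print(sample_period_bound(10.0, 2.0))              # 14.0
print(sample_period_bound(10.0, 2.0, n_copies=2))  # 7.0
```

The block form computes exactly the same outputs; only the number of samples produced per clock period changes.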

Parallel 1st-order IIR filters

Consider the transfer function of a 1st-order IIR filter formulated as :

H(z)=\frac{z^{-1}}{1-az^{-1}}

where |''a''| < 1 for stability. The filter has a single pole, located at ''z'' = ''a''. The corresponding recursive representation is :

y(n+1) = ay(n) + u(n)

Consider the design of a 4-parallel architecture (''N'' = 4). In such a parallel system, each delay element represents a block delay, and the clock period is four times the sample period. Therefore, iterating the recursion four times and setting ''n'' = 4''k'' gives :

y(n+4) = a^4 y(n) + a^3 u(n) + a^2 u(n+1) + au(n+2) + u(n+3)

\rightarrow y(4k+4) = a^4 y(4k) + a^3 u(4k) + a^2 u(4k+1) + au(4k+2) + u(4k+3)

The corresponding architecture is shown as follows.

[Figure: Parallel IIR]
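The four-step recursion can be checked numerically against the original one-step recursion. The sketch below assumes an initial state y(0) = 0 and an input length that is a multiple of 4; the function names and test values are illustrative.

```python
def iir_sequential(u, a, y0=0.0):
    """Return [y(0), y(1), ..., y(len(u))] for y(n+1) = a*y(n) + u(n)."""
    y = [y0]
    for sample in u:
        y.append(a * y[-1] + sample)
    return y


def iir_block_lookahead(u, a, y0=0.0):
    """Return [y(0), y(4), y(8), ...] using the 4-step recursion
    y(4k+4) = a^4*y(4k) + a^3*u(4k) + a^2*u(4k+1) + a*u(4k+2) + u(4k+3)."""
    assert len(u) % 4 == 0, "input length must be a multiple of 4"
    y_blocks = [y0]
    for k in range(len(u) // 4):
        u0, u1, u2, u3 = u[4 * k: 4 * k + 4]
        y_blocks.append(
            a**4 * y_blocks[-1] + a**3 * u0 + a**2 * u1 + a * u2 + u3
        )
    return y_blocks


u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
reference = iir_sequential(u, a=0.5)[::4]   # y(0), y(4), y(8)
blocked = iir_block_lookahead(u, a=0.5)
assert all(abs(r - b) < 1e-12 for r, b in zip(reference, blocked))
```

Each loop iteration corresponds to one clock period of the 4-parallel system, in which the state advances by four samples.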

The resultant parallel design has the following properties.
* The pole of the original filter is at ''z'' = ''a'', while the pole of the parallel system is at ''z'' = ''a''⁴, which is closer to the origin.
* The pole movement improves the robustness of the system to round-off noise.
* Hardware complexity of this architecture: ''N''×''N'' multiply-add operations.
The quadratic increase in hardware complexity can be reduced by exploiting concurrency and incremental computation to avoid repeated computation.
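One way to read the incremental-computation remark is sketched below for ''N'' = 4: only y(4k+4) is obtained from the look-ahead recursion, while the intermediate outputs are built one step at a time from y(4k). This is an illustrative arrangement under that assumption, with hypothetical names, and is not necessarily the exact architecture the article has in mind.

```python
def iir_block_incremental(u, a, y0=0.0):
    """Return [y(0), ..., y(len(u))], processing four samples per block."""
    assert len(u) % 4 == 0, "input length must be a multiple of 4"
    # Powers of a are constants of the design, computed once.
    a2, a3, a4 = a**2, a**3, a**4
    y = [y0]
    for k in range(len(u) // 4):
        u0, u1, u2, u3 = u[4 * k: 4 * k + 4]
        y4k = y[4 * k]
        # Incremental outputs: one multiply-add each, reusing the previous result.
        y1 = a * y4k + u0
        y2 = a * y1 + u1
        y3 = a * y2 + u2
        # Look-ahead output: the only value fed back to the next block.
        y4 = a4 * y4k + a3 * u0 + a2 * u1 + a * u2 + u3
        y.extend([y1, y2, y3, y4])
    return y


u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(iir_block_incremental(u, a=0.5))
```

In this sketch each block uses seven multiplications (three incremental, four in the look-ahead term) instead of expanding every output separately in terms of y(4k) and the inputs.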

Parallel processing for low power

Another advantage of parallel processing is that it can reduce the power consumption of a system by allowing a lower supply voltage. Consider the power consumption of a normal (sequential) CMOS circuit :

P_\text{seq}=C_\text{total} \cdot V_0^2 \cdot f

where ''C''_total represents the total capacitance of the CMOS circuit, ''V''₀ is the supply voltage, and ''f'' is the clock frequency. For an ''N''-parallel version, the charging capacitance remains the same but the total capacitance increases ''N'' times. To maintain the same sample rate, the clock period of the ''N''-parallel circuit is increased to ''N'' times the propagation delay of the original circuit, so the time available for charging is ''N'' times longer. The supply voltage can therefore be reduced to ''βV''₀ (with ''β'' < 1), and the clock frequency becomes ''f''/''N''. The power consumption of the ''N''-parallel system can then be formulated as :

P_\text{para}=(NC_\text{total})\cdot(\beta V_0)^2\cdot\frac{f}{N}=\beta^2 P_\text{seq}

where ''β'' can be computed from the following relation, in which ''V''_t is the threshold voltage :

N(\beta V_0-V_t)^2 = \beta(V_0-V_t)^2
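The relation for ''β'' is a quadratic equation, so it can be solved in closed form. The sketch below expands it and takes the larger root, which is the one that keeps the scaled supply ''βV''₀ above the threshold voltage. The numeric values (a 3-parallel system, 5 V supply, 0.6 V threshold) are hypothetical and chosen only for illustration.

```python
import math


def solve_beta(n_copies, v0, vt):
    """Solve N*(beta*V0 - Vt)^2 = beta*(V0 - Vt)^2 for beta.

    Expanding gives  N*V0^2 * beta^2 - (2*N*V0*Vt + (V0 - Vt)^2) * beta + N*Vt^2 = 0.
    """
    qa = n_copies * v0**2
    qb = -(2 * n_copies * v0 * vt + (v0 - vt) ** 2)
    qc = n_copies * vt**2
    disc = qb * qb - 4 * qa * qc
    # The larger root satisfies beta*V0 > Vt; the smaller one does not.
    return (-qb + math.sqrt(disc)) / (2 * qa)


def power_ratio(beta):
    """P_para / P_seq = beta^2."""
    return beta**2


# Hypothetical example values.
beta = solve_beta(n_copies=3, v0=5.0, vt=0.6)
print(beta, power_ratio(beta))
```

For these assumed values ''β'' comes out at about 0.467, so the 3-parallel system would dissipate roughly 22% of the power of the sequential one at the same sample rate.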
