Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algorithms, such as residual-excited linear prediction (RELP) and linear predictive coding (LPC) vocoders (e.g., FS-1015). Along with its variants, such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, it is currently the most widely used speech coding algorithm. It is also used in MPEG-4 Audio speech coding. CELP is commonly used as a generic term for a class of algorithms and not for a particular codec.

Background

The CELP algorithm is based on four main ideas:
* Using the source-filter model of speech production through linear prediction (LP) (see the textbook "speech coding algorithm");
* Using an adaptive and a fixed codebook as the input (excitation) of the LP model;
* Performing a search in closed-loop in a "perceptually weighted domain";
* Applying vector quantization (VQ).

The original algorithm as simulated in 1983 by Schroeder and Atal required 150 seconds to encode 1 second of speech when run on a Cray-1 supercomputer. Since then, more efficient ways of implementing the codebooks and improvements in computing capabilities have made it possible to run the algorithm in embedded devices, such as mobile phones.
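As an illustration of the linear prediction step listed among the four ideas above, the following is a minimal Python sketch of LP analysis using the autocorrelation method and the Levinson–Durbin recursion. It is not taken from any particular codec. The function names and the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p are assumptions made for this example; real codecs also apply windowing and lag windowing, which are omitted here.

```python
def autocorrelation(x, order):
    """Return r[0..order], the (biased) autocorrelation of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Assumed convention: A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    Returns (a, err), where a[0] == 1.0 and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of an idealized first-order process with r[k] = 0.5**k.
a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
# a is approximately [1.0, -0.5, 0.0] and err is approximately 0.75
```

In this toy case a single predictor tap explains the signal, so the second coefficient comes out as zero.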

CELP decoder

Before exploring the complex encoding process of CELP, we introduce the decoder here. Figure 1 describes a generic CELP decoder. The excitation is produced by summing the contributions from fixed (a.k.a. stochastic or innovation) and adaptive (a.k.a. pitch) codebooks:

e[n] = e_f[n] + e_a[n],

where e_f[n] is the fixed (a.k.a. stochastic or innovation) codebook contribution and e_a[n] is the adaptive (pitch) codebook contribution. The fixed codebook is a dictionary that is (implicitly or explicitly) hard-coded into the codec. This codebook can be algebraic (ACELP) or be stored explicitly (e.g. Speex). The entries in the adaptive codebook consist of delayed versions of the excitation. This makes it possible to efficiently code periodic signals, such as voiced sounds.

The filter that shapes the excitation has an all-pole model of the form 1/A(z), where A(z) is called the prediction filter and is obtained using linear prediction (Levinson–Durbin algorithm). An all-pole filter is used because it is a good representation of the human vocal tract and because it is easy to compute.
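The decoder described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the decoder of any specific codec: the function names, the explicit gains `gain_a` and `gain_f` (treated here as part of each codebook contribution), and the convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p are all choices made for the example.

```python
def adaptive_codebook_vector(past_excitation, lag, length):
    """Delayed copy of the excitation (requires 1 <= lag <= len(past_excitation)).

    When lag < length, the vector repeats itself periodically.
    """
    history = list(past_excitation)
    out = []
    for _ in range(length):
        sample = history[-lag]
        out.append(sample)
        history.append(sample)
    return out


def synthesis_filter(excitation, a, memory=None):
    """All-pole filter 1/A(z); memory[0] holds the previous output sample."""
    order = len(a) - 1
    mem = list(memory) if memory is not None else [0.0] * order
    out = []
    for x in excitation:
        y = x - sum(a[k] * mem[k - 1] for k in range(1, order + 1))
        out.append(y)
        mem = ([y] + mem)[:order]
    return out, mem


def decode_subframe(past_excitation, lag, gain_a, fixed_vector, gain_f, a,
                    memory=None):
    """Form e[n] = e_f[n] + e_a[n] and shape it with 1/A(z)."""
    e_a = adaptive_codebook_vector(past_excitation, lag, len(fixed_vector))
    e = [gain_a * va + gain_f * vf for va, vf in zip(e_a, fixed_vector)]
    speech, memory = synthesis_filter(e, a, memory)
    return e, speech, memory
```

In a full decoder, the returned excitation `e` would be appended to `past_excitation` and `memory` carried over, so that the next subframe continues from the current state.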

CELP encoder

The main principle behind CELP is called analysis-by-synthesis (AbS) and means that the encoding (analysis) is performed by perceptually optimizing the decoded (synthesis) signal in a closed loop. In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that produces the best-sounding decoded signal. This is obviously not possible in practice for two reasons: the required complexity is beyond any currently available hardware and the “best sounding” selection criterion implies a human listener.

In order to achieve real-time encoding using limited computing resources, the CELP search is broken down into smaller, more manageable, sequential searches using a simple perceptual weighting function. Typically, the encoding is performed in the following order:
* Linear prediction coefficients (LPC) are computed and quantized, usually as line spectral pairs (LSPs).
* The adaptive (pitch) codebook is searched and its contribution removed.
* The fixed (innovation) codebook is searched.
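The last two steps of the encoding order listed above can be sketched as a sequential analysis-by-synthesis search. The Python below is a simplified illustration, not the search used by any specific codec: all function names are hypothetical, the coefficients `a` are assumed to describe the (already weighted) synthesis filter as A(z) = 1 + a[1] z^-1 + ..., the target is assumed to have the filter's ringing already removed, and gains are left unquantized and unconstrained.

```python
def adaptive_vector(past_excitation, lag, length):
    """Delayed copy of the past excitation, repeated when lag < length."""
    history = list(past_excitation)
    out = []
    for _ in range(length):
        out.append(history[-lag])
        history.append(out[-1])
    return out


def zero_state_synthesis(excitation, a):
    """Filter the excitation through 1/A(z), starting from zero state."""
    out = []
    for n, x in enumerate(excitation):
        y = x - sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out.append(y)
    return out


def best_candidate(target, candidates, a):
    """Return (index, gain, synthesized) for the lowest squared error."""
    best = (None, 0.0, [0.0] * len(target))
    best_score = 0.0
    for index, vector in enumerate(candidates):
        y = zero_state_synthesis(vector, a)
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy  # error reduction at the optimal gain
        if score > best_score:
            best_score = score
            best = (index, corr / energy, y)
    return best


def search_subframe(target, past_excitation, lags, fixed_codebook, a):
    """Search the adaptive codebook, remove its contribution, then the fixed one."""
    n = len(target)
    adaptive = [adaptive_vector(past_excitation, lag, n) for lag in lags]
    i_a, g_a, y_a = best_candidate(target, adaptive, a)
    residual = [t - g_a * v for t, v in zip(target, y_a)]
    i_f, g_f, _ = best_candidate(residual, fixed_codebook, a)
    lag = lags[i_a] if i_a is not None else None
    return lag, g_a, i_f, g_f
```

Each candidate is scored by how much squared error it removes when scaled by its optimal gain, which is why the two searches can run one after the other instead of jointly over every combination.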

Noise weighting

Most (if not all) modern audio codecs attempt to shape the coding noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. For this reason, instead of minimizing the simple quadratic error, CELP minimizes the error in the ''perceptually weighted'' domain. The weighting filter W(z) is typically derived from the LPC filter by the use of bandwidth expansion:

W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}

where \gamma_1 > \gamma_2.
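A minimal Python sketch of this weighting filter follows. Bandwidth expansion replaces z by z/\gamma, which scales the k-th coefficient of A(z) by \gamma^k. The function names, the convention A(z) = 1 + a[1] z^-1 + ... with a[0] = 1, and the default values 0.9 and 0.6 are assumptions chosen for illustration; actual codecs choose their own values.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient becomes a[k] * gamma**k."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


def weighting_filter(signal, a, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to signal, from zero state.

    Assumes a[0] == 1.0, so the denominator's leading coefficient is 1.
    """
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    out = []
    for n in range(len(signal)):
        # Feed-forward part: A(z/gamma1) applied to the input.
        acc = sum(num[k] * signal[n - k] for k in range(len(num)) if n - k >= 0)
        # Feedback part: 1 / A(z/gamma2) applied to previous outputs.
        acc -= sum(den[k] * out[n - k] for k in range(1, len(den)) if n - k >= 0)
        out.append(acc)
    return out
```

When both factors are equal, the numerator and denominator cancel and the filter passes the signal through unchanged, which is a convenient sanity check.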

References

* B. S. Atal, "The History of Linear Prediction," ''IEEE Signal Processing Magazine'', vol. 23, no. 2, March 2006, pp. 154–161.
* M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in ''Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing'' (ICASSP), vol. 10, pp. 937–940, 1985.

External links

* This article is based on a paper presented at Linux.Conf.Au
* Some parts based on the code manual of CELP 1016A (CELP 3.2a) and LPC 10e.

Selected readings

Speech Processing: Theory of LPC Analysis and Synthesis