The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used

lossy compression In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data si ...

technique in

audio data compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...

. It is employed in most modern audio coding standards, including

MP3 MP3 (formally MPEG-1 Audio Layer III or MPEG-2 Audio Layer III) is a coding format for digital audio developed largely by the Fraunhofer Society in Germany, with support from other digital scientists in the United States and elsewhere. Origin ...

Dolby Digital Dolby Digital, originally synonymous with Dolby AC-3, is the name for what has now become a family of audio compression technologies developed by Dolby Laboratories. Formerly named Dolby Stereo Digital until 1995, the audio compression is loss ...

(AC-3),

Vorbis Vorbis is a free and open-source software project headed by the Xiph.Org Foundation. The project produces an audio coding format and software reference encoder/decoder ( codec) for lossy audio compression. Vorbis is most commonly used in con ...

(Ogg),

Windows Media Audio Windows Media Audio (WMA) is a series of audio codecs and their corresponding audio coding formats developed by Microsoft. It is a proprietary technology that forms part of the Windows Media framework. WMA consists of four distinct codecs. The ...

(WMA),

ATRAC Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC in 1992. ATRAC allowed a relatively small disc like MiniDisc to h ...

Cook Cook or The Cook may refer to: Food preparation * Cooking, the preparation of food * Cook (domestic worker), a household staff member who prepares food * Cook (professional), an individual who prepares food for consumption in the food industry * ...

Advanced Audio Coding Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. Designed to be the successor of the MP3 format, AAC generally achieves higher sound quality than MP3 encoders at the same bit rate. AAC has been stan ...

(AAC), High-Definition Coding (HDC), LDAC,

Dolby AC-4 Dolby AC-4 is an audio compression technology developed by Dolby Laboratories. Dolby AC-4 bitstreams can contain audio channels and/or audio objects. Dolby AC-4 has been adopted by the DVB project and standardized by the ETSI. History Its develop ...

, and

MPEG-H 3D Audio MPEG-H 3D Audio, specified as ISO/IEC 23008-3 (MPEG-H Part 3), is an audio coding standard developed by the ISO/IEC Moving Picture Experts Group (MPEG) to support coding audio as audio channels, audio objects, or higher order ambisonics (HOA). MP ...

, as well as

speech coding Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic d ...

standards such as AAC-LD (LD-MDCT), G.722.1,

G.729.1 G.729.1 is an 8-32 kbit/s embedded speech and audio codec providing bitstream interoperability with G.729, G.729 Annex A and G.729 Annex B. Its official name is ''G.729-based embedded variable bit rate codec: An 8-32 kbit/s scalable wideband code ...

CELT The Celts (, see pronunciation for different usages) or Celtic peoples () are. "CELTS location: Greater Europe time period: Second millennium B.C.E. to present ancestry: Celtic a collection of Indo-European peoples. "The Celts, an ancient ...

,Presentation of the CELT codec
by Timothy B. Terriberry (65 minutes of video, see als
presentation slides
in PDF) and

Opus ''Opus'' (pl. ''opera'') is a Latin word meaning "work". Italian equivalents are ''opera'' (singular) and ''opere'' (pl.). Opus or OPUS may refer to: Arts and entertainment Music * Opus number, (abbr. Op.) specifying order of (usually) publicatio ...

. The discrete cosine transform (DCT) was first proposed by Nasir Ahmed in 1972, and demonstrated by Ahmed with T. Natarajan and

K. R. Rao Kamisetty Ramamohan Rao was an Indian-American electrical engineer. He was a professor of Electrical Engineering at the University of Texas at Arlington (UT Arlington). Academically known as K. R. Rao, he is credited with the co-invention of di ...

in 1974. The MDCT was later proposed by John P. Princen, A.W. Johnson and Alan B. Bradley at the

University of Surrey The University of Surrey is a public research university in Guildford, Surrey, England. The university received its royal charter in 1966, along with a number of other institutions following recommendations in the Robbins Report. The institu ...

in 1987, following earlier work by Princen and Bradley (1986) to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC), described below. (There also exists an analogous transform, the MDST, based on the

discrete sine transform In mathematics, the discrete sine transform (DST) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using a purely real matrix. It is equivalent to the imaginary parts of a DFT of roughly twice the length, operati ...

, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.) In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter (PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF filter bank. Such a combination of a filter bank with an MDCT is called a ''hybrid'' filter bank or a ''subband'' MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used)

MPEG-4 AAC-SSR MPEG-4 Part 3 or MPEG-4 Audio (formally ISO/IEC 14496-3) is the third part of the ISO/IEC MPEG-4 international standard developed by Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published ...

variant (by

Sony , commonly stylized as SONY, is a Japanese multinational conglomerate corporation headquartered in Minato, Tokyo, Japan. As a major technology company, it operates as one of the world's largest manufacturers of consumer and professional ...

) uses a four-band PQF bank followed by an MDCT. Similar to MP3,

uses stacked quadrature mirror filters (QMF) followed by an MDCT.

Definition

As a lapped transform, the MDCT is a bit unusual compared to other Fourier-related transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a

linear function In mathematics, the term linear function refers to two distinct but related notions: * In calculus and related areas, a linear function is a function whose graph is a straight line, that is, a polynomial function of degree zero or one. For dist ...

F\colon \mathbf^ \to \mathbf^N

(where R denotes the set of

real number In mathematics, a real number is a number that can be used to measure a ''continuous'' one-dimensional quantity such as a distance, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small variations. Every ...

s). The 2''N'' real numbers ''x''₀, ..., ''x''_2''N''-1 are transformed into the ''N'' real numbers ''X''₀, ..., ''X''_''N''-1 according to the formula: :

X_k = \sum_^ x_n \cos \left frac \left(n+\frac+\frac\right) \left(k+\frac\right) \right /math>

(The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments.  Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.)

Inverse transform

The inverse MDCT is known as the IMDCT. Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible. However, perfect invertibility is achieved by ''adding'' the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to ''cancel'' and the original data to be retrieved; this technique is known as ''time-domain aliasing cancellation'' (TDAC). The IMDCT transforms ''N'' real numbers ''X''₀, ..., ''X''_''N''-1 into 2''N'' real numbers ''y''₀, ..., ''y''_2''N''-1 according to the formula: :

y_n = \frac \sum_^ X_k \cos \left frac \left(n+\frac+\frac\right) \left(k+\frac\right) \right /math>

(Like for the DCT-IV, an orthogonal transform, the inverse has the same form as the forward transform.)

In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/''N'').

Computation

Although the direct application of the MDCT formula would require O(''N''²) operations, it is possible to compute the same thing with only O(''N'' log ''N'') complexity by recursively factorizing the computation, as in the

fast Fourier transform A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in ...

(FFT). One can also compute MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(''N'') pre- and post-processing steps. Also, as described below, any algorithm for the DCT-IV immediately provides a method to compute the MDCT and IMDCT of even size.

Window functions

upright=1.8, MDCT window functions:blue: Cosine, red: Sine-Cosine, green: modified Kaiser-Bessel In typical signal-compression applications, the transform properties are further improved by using a

window function In signal processing and statistics, a window function (also known as an apodization function or tapering function) is a mathematical function that is zero-valued outside of some chosen interval, normally symmetric around the middle of the int ...

''w''_''n'' (''n'' = 0, ..., 2''N''−1) that is multiplied with ''x''_''n'' and ''y''_''n'' in the MDCT and IMDCT formulas, above, in order to avoid discontinuities at the ''n'' = 0 and 2''N'' boundaries by making the function go smoothly to zero at those points. (That is, we window the data ''before'' the MDCT and ''after'' the IMDCT.) In principle, ''x'' and ''y'' could have different window functions, and the window function could also change from one block to the next (especially for the case where data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for equal-sized blocks. The transform remains invertible (that is, TDAC works), for a symmetric window ''w''_''n'' = ''w''_{2''N''−1−''n''}, as long as ''w'' satisfies the Princen-Bradley condition: :

w_n^2 + w_^2 = 1

. Various window functions are used. A window that produces a form known as a modulated lapped transform (MLT)H. S. Malvar, "Modulated QMF Filter Banks with Perfect Reconstruction", ''Electronics Letters'', vol. 26, no. 13, pp. 906–907 (Equation 13), June 1990. is given by :

w_n = \sin \left frac \left(n+\frac\right) \right /math>

and is used for MP3 and MPEG-2 AAC, and 

: w_n = \sin \left( \frac \sin^2 \left frac \left(n+\frac\right) \right \right) for Vorbis.  AC-3 uses a Kaiser-Bessel derived (KBD) window, and MPEG-4 AAC can also use a KBD window.

Note that windows applied to the MDCT are different from windows used for some other types of signal analysis, since they must fulfill the Princen-Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis).

Relationship to DCT-IV and Origin of TDAC

As can be seen by inspection of the definitions, for even ''N'' the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by ''N''/2 and two ''N''-blocks of data are transformed at once. By examining this equivalence more carefully, important properties like TDAC can be easily derived. In order to define the precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around ''n'' = −1/2), odd at its right boundary (around ''n'' = ''N'' − 1/2), and so on (instead of periodic boundaries as for a DFT). This follows from the identities

\cos\left frac \left(-n-1+\frac\right) \left(k+\frac\right)\right = \cos\left frac \left(n+\frac\right) \left(k+\frac\right)\right /math> and \cos\left frac \left(2N-n-1+\frac\right) \left(k+\frac\right)\right = -\cos\left frac \left(n+\frac\right) \left(k+\frac\right)\right /math>.  Thus, if its inputs are an array ''x'' of length ''N'', we can imagine extending this array to (''x'', −''x''

_''R'', −''x'', ''x''_''R'', ...) and so on, where ''x''_''R'' denotes ''x'' in reverse order. Consider an MDCT with 2''N'' inputs and ''N'' outputs, where we divide the inputs into four blocks (''a'', ''b'', ''c'', ''d'') each of size ''N''/2. If we shift these to the right by ''N''/2 (from the +''N''/2 term in the MDCT definition), then (''b'', ''c'', ''d'') extend past the end of the ''N'' DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above. :Thus, the MDCT of 2''N'' inputs (''a'', ''b'', ''c'', ''d'') is ''exactly'' equivalent to a DCT-IV of the ''N'' inputs: (−''c''_''R''−''d'', ''a''−''b''_''R''), where ''R'' denotes reversal as above. (In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.) Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is extended (via the boundary conditions) to a length 2''N'' and shifted back to the left by ''N''/2. The inverse DCT-IV would simply give back the inputs (−''c''_''R''−''d'', ''a''−''b''_''R'') from above. When this is extended via the boundary conditions and shifted, one obtains: :IMDCT (MDCT (''a'', ''b'', ''c'', ''d'')) = (''a''−''b''_''R'', ''b''−''a''_''R'', ''c''+''d''_''R'', ''d''+''c''_''R'') / 2. Half of the IMDCT outputs are thus redundant, as ''b''−''a''_''R'' = −(''a''−''b''_''R'')_''R'', and likewise for the last two terms. If we group the input into bigger blocks ''A'',''B'' of size ''N'', where ''A'' = (''a'', ''b'') and ''B'' = (''c'', ''d''), we can write this result in a simpler way: :IMDCT (MDCT (''A'', ''B'')) = (''A''−''A''_''R'', ''B''+''B''_''R'') / 2 One can now understand how TDAC works. Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2''N'' block (''B'', ''C''). The IMDCT will then yield, analogous to the above: (''B''−''B''_''R'', ''C''+''C''_''R'') / 2. When this is added with the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply ''B'', recovering the original data.

Origin of TDAC

The origin of the term "time-domain aliasing cancellation" is now clear. The use of input data that extend beyond the boundaries of the logical DCT-IV causes the data to be ''aliased'' in the same way that frequencies beyond the

Nyquist frequency In signal processing, the Nyquist frequency (or folding frequency), named after Harry Nyquist, is a characteristic of a sampler, which converts a continuous function or signal into a discrete sequence. In units of cycles per second ( Hz), it ...

are aliased to lower frequencies, except that this aliasing occurs in the time domain instead of the frequency domain: we cannot distinguish the contributions of ''a'' and of ''b''_''R'' to the MDCT of (''a'', ''b'', ''c'', ''d''), or equivalently, to the result of :IMDCT (MDCT (''a'', ''b'', ''c'', ''d'')) = (''a''−''b''_''R'', ''b''−''a''_''R'', ''c''+''d''_''R'', ''d''+''c''_''R'') / 2. The combinations ''c''−''d''_''R'' and so on, have precisely the right signs for the combinations to cancel when they are added. For odd ''N'' (which are rarely used in practice), ''N''/2 is not an integer so the MDCT is not simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above.

Smoothness and discontinuities

We have seen above that the MDCT of 2''N'' inputs (''a'', ''b'', ''c'', ''d'') is equivalent to a DCT-IV of the ''N'' inputs (−''c''_''R''−''d'', ''a''−''b''_''R''). The DCT-IV is designed for the case where the function at the right boundary is odd, and therefore the values near the right boundary are close to 0. If the input signal is smooth, this is the case: the rightmost components of ''a'' and ''b''_''R'' are consecutive in the input sequence (''a'', ''b'', ''c'', ''d''), and therefore their difference is small. Let us look at the middle of the interval: if we rewrite the above expression as (−''c''_''R''−''d'', ''a''−''b''_''R'') = (−''d'', ''a'')−(''b'',''c'')_''R'', the second term, (''b'',''c'')_''R'', gives a smooth transition in the middle. However, in the first term, (−''d'', ''a''), there is a potential discontinuity where the right end of −''d'' meets the left end of ''a''. This is the reason for using a window function that reduces the components near the boundaries of the input sequence (''a'', ''b'', ''c'', ''d'') towards 0.

TDAC for the windowed MDCT

Above, the TDAC property was proved for the ordinary MDCT, showing that adding IMDCTs of subsequent blocks in their overlapping half recovers the original data. The derivation of this inverse property for the windowed MDCT is only slightly more complicated. Consider to overlapping consecutive sets of 2''N'' inputs (''A'',''B'') and (''B'',''C''), for blocks ''A'',''B'',''C'' of size ''N''. Recall from above that when

(A,B)

and

(B,C)

are MDCTed, IMDCTed, and added in their overlapping half, we obtain

(B+B_R) / 2 + (B-B_R) / 2 = B

, the original data. Now we suppose that we multiply ''both'' the MDCT inputs ''and'' the IMDCT outputs by a window function of length 2''N''. As above, we assume a symmetric window function, which is therefore of the form

(W,W_R)

where ''W'' is a length-''N'' vector and ''R'' denotes reversal as before. Then the Princen-Bradley condition can be written as

W^2 + W_R^2 = (1,1,\ldots)

, with the squares and additions performed elementwise. Therefore, instead of MDCTing

(A,B)

, we now MDCT

(WA,W_R B)

(with all multiplications performed elementwise). When this is IMDCTed and multiplied again (elementwise) by the window function, the last-''N'' half becomes: :

W_R \cdot (W_R B+(W_R B)_R) =W_R \cdot (W_R B+W B_R) = W_R^2 B+WW_R B_R

. (Note that we no longer have the multiplication by 1/2, because the IMDCT normalization differs by a factor of 2 in the windowed case.) Similarly, the windowed MDCT and IMDCT of

(B,C)

yields, in its first-''N'' half: :

W \cdot (WB - W_R B_R) = W^2 B - W W_R B_R

. When we add these two halves together, we obtain: :

(W_R^2 B+WW_R B_R) + (W^2 B - W W_R B_R)= \left(W_R^2 + W^2\right)B = B,

recovering the original data.

References

Bibliography

* Henrique S. Malvar, ''Signal Processing with Lapped Transforms'' (Artech House: Norwood MA, 1992). * A. W. Johnson and A. B. Bradley, "Adaptive transform coding incorporating time domain aliasing cancellation," ''Speech Comm.'' 6, 299-308 (1987). * For algorithms, see examples: ** Chi-Min Liu and Wen-Chieh Lee,
A unified fast algorithm for cosine modulated filterbanks in current audio standards
, ''J. Audio Engineering'' 47 (12), 1061-1075 (1999). ** V. Britanak and K. R. Rao, "A new fast algorithm for the unified forward and inverse MDCT/MDST computation," ''Signal Processing'' 82, 433-459 (2002) ** Vladimir Nikolajevic and Gerhard Fettweis, "Computation of forward and inverse MDCT using Clenshaw's recurrence formula," ''IEEE Trans. Sig. Proc.'' 51 (5), 1439-1444 (2003) ** Che-Hong Chen, Bin-Da Liu, and Jar-Ferr Yang, "Recursive architectures for realizing modified discrete cosine transform and its inverse," ''IEEE Trans. Circuits Syst. II: Analog Dig. Sig. Proc.'' 50 (1), 38-45 (2003) ** J.S. Wu, H.Z. Shu, L. Senhadji, and L.M. Luo, "Mixed-radix algorithm for the computation of forward and inverse MDCTs," ''IEEE Trans. Circuits Syst. I: Reg. Papers'' 56 (4), 784-794 (2009) ** V. Britanak, "A survey of efficient MDCT implementations in MP3 audio coding standard: retrospective and state-of-the-art," ''Signal. Process.'' 91 (4), 624-672(2011) {{Compression Methods Fourier analysis Discrete transforms Data compression