The modified discrete cosine transform (MDCT) is a transform based on the type-IV
discrete cosine transform
A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequency, frequencies. The DCT, first proposed by Nasir Ahmed (engineer), Nasir Ahmed in 1972, is a widely ...
(DCT-IV), with the additional property of being
lapped: it is designed to be performed on consecutive blocks of a larger
dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid
artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used
lossy compression
In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size ...
technique in
audio data compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...
. It is employed in most modern
audio coding standards, including
MP3
MP3 (formally MPEG-1 Audio Layer III or MPEG-2 Audio Layer III) is a coding format for digital audio developed largely by the Fraunhofer Society in Germany under the lead of Karlheinz Brandenburg. It was designed to greatly reduce the amount ...
,
Dolby Digital
Dolby Digital, originally synonymous with Dolby AC-3 (see below), is the name for a family of audio compression technologies developed by Dolby Laboratories. Called Dolby Stereo Digital until 1995, it is lossy compression (except for Dolby Tr ...
(AC-3),
Vorbis
Vorbis is a free and open-source software project headed by the Xiph.Org Foundation. The project produces an audio coding format and software reference encoder/decoder ( codec) for lossy audio compression, libvorbis. Vorbis is most comm ...
(Ogg),
Windows Media Audio
Windows Media Audio (WMA) is a series of audio codecs and their corresponding audio coding formats developed by Microsoft. It is a proprietary technology that forms part of the Windows Media framework. Audio encoded in WMA is stored in a digi ...
(WMA),
ATRAC
Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC, in 1992. ATRAC allowed a relatively small disc like MiniDisc t ...
,
Cook
Cook or The Cook may refer to:
Food preparation
* Cooking, the preparation of food
* Cook (domestic worker), a household staff member who prepares food
* Cook (profession), an individual who prepares food for consumption in the food industry
* C ...
,
Advanced Audio Coding
Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was developed by Dolby, AT&T, Fraunhofer and Sony, originally as part of the MPEG-2 specification but later improved under MPEG-4.ISO (2006ISO/ ...
(AAC),
High-Definition Coding (HDC),
LDAC,
Dolby AC-4
Dolby AC-4 is an audio compression technology developed by Dolby Laboratories. Dolby AC-4 bitstreams can contain audio channels and/or audio objects. Dolby AC-4 has been adopted by the DVB project and standardized by the ETSI.
History
Its develop ...
, and
MPEG-H 3D Audio, as well as
speech coding
Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic da ...
standards such as
AAC-LD
The MPEG-4 Low Delay Audio Coder (a.k.a. AAC Low Delay, or AAC-LD) is audio compression standard designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. It is closely derived from the M ...
(LD-MDCT),
G.722.1,
G.729.1,
CELT
The Celts ( , see Names of the Celts#Pronunciation, pronunciation for different usages) or Celtic peoples ( ) were a collection of Indo-European languages, Indo-European peoples. "The Celts, an ancient Indo-European people, reached the apoge ...
,
[Presentation of the CELT codec](_blank)
by Timothy B. Terriberry (65 minutes of video, see als
presentation slides
in PDF) and
Opus.
The
discrete cosine transform
A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequency, frequencies. The DCT, first proposed by Nasir Ahmed (engineer), Nasir Ahmed in 1972, is a widely ...
(DCT) was first proposed by
Nasir Ahmed in 1972,
and demonstrated by Ahmed with T. Natarajan and
K. R. Rao in 1974.
The MDCT was later proposed by John P. Princen, A.W. Johnson and Alan B. Bradley at the
University of Surrey
The University of Surrey is a public research university in Guildford, Surrey, England. The university received its Royal Charter, royal charter in 1966, along with a Plate glass university, number of other institutions following recommendations ...
in 1987, following earlier work by Princen and Bradley (1986) to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC), described below. (There also exists an analogous transform, the MDST, based on the
discrete sine transform, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.)
In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band
polyphase quadrature filter Polyphase may refer to:
* Polyphase matrix, in signal processing
* Polyphase system, in electrical engineering
* Polyphase quadrature filter
* Polyphasic sleep
{{Disambig ...
(PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF filter bank. Such a combination of a filter bank with an MDCT is called a ''hybrid'' filter bank or a ''subband'' MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used)
MPEG-4 AAC-SSR variant (by
Sony
is a Japanese multinational conglomerate (company), conglomerate headquartered at Sony City in Minato, Tokyo, Japan. The Sony Group encompasses various businesses, including Sony Corporation (electronics), Sony Semiconductor Solutions (i ...
) uses a four-band PQF bank followed by an MDCT. Similar to MP3,
ATRAC
Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC, in 1992. ATRAC allowed a relatively small disc like MiniDisc t ...
uses stacked
quadrature mirror filters (QMF) followed by an MDCT.
Definition
As a lapped transform, the MDCT is somewhat unusual compared to other Fourier-related transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a
linear function
In mathematics, the term linear function refers to two distinct but related notions:
* In calculus and related areas, a linear function is a function whose graph is a straight line, that is, a polynomial function of degree zero or one. For di ...
(where R denotes the set of
real number
In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...
s). The 2''N'' real numbers ''x''
0, ..., ''x''
2''N''−1 are transformed into the ''N'' real numbers ''X''
0, ..., ''X''
''N''−1 according to the formula
:
The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.
Inverse transform
The inverse MDCT is known as the IMDCT. Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible. However, perfect invertibility is achieved by ''adding'' the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to ''cancel'' and the original data to be retrieved; this technique is known as ''time-domain aliasing cancellation'' (TDAC).
The IMDCT transforms ''N'' real numbers ''X''
0, ..., ''X''
''N''−1 into 2''N'' real numbers ''y''
0, ..., ''y''
2''N''−1 according to the formula
:
Like for the
DCT-IV, an orthogonal transform, the inverse has the same form as the forward transform.
In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/''N'').
Computation
Although the direct application of the MDCT formula would require O(''N''
2) operations, it is possible to compute the same thing with only O(''N'' log ''N'') complexity by recursively factorizing the computation, as in the
fast Fourier transform
A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). A Fourier transform converts a signal from its original domain (often time or space) to a representation in ...
(FFT). One can also compute MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(''N'') pre- and post-processing steps. Also, as described below, any algorithm for the DCT-IV immediately provides a method to compute the MDCT and IMDCT of even size.
Window functions
upright=1.8, MDCT window functions: blue cosine, red sine–cosine, green modified Kaiser–Bessel
In typical signal-compression applications, the transform properties are further improved by using a
window function
In signal processing and statistics, a window function (also known as an apodization function or tapering function) is a mathematical function that is zero-valued outside of some chosen interval. Typically, window functions are symmetric around ...
''w''
''n'' (''n'' = 0, ..., 2''N'' − 1) that is multiplied with ''x''
''n'' in the MDCT and with ''y''
''n'' in the IMDCT formulas above, in order to avoid discontinuities at the ''n'' = 0 and 2''N'' boundaries by making the function go smoothly to zero at those points. (That is, the window function is applied to the data ''before'' the MDCT or ''after'' the IMDCT.) In principle, ''x'' and ''y'' could have different window functions, and the window function could also change from one block to the next (especially for the case where data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for equal-sized blocks.
The transform remains invertible (that is, TDAC works), for a symmetric window ''w''
''n'' = ''w''
2''N''−1−''n'', as long as ''w'' satisfies the Princen–Bradley condition:
:
Various window functions are used. A window that produces a form known as a modulated lapped transform (MLT)
[H. S. Malvar, "Modulated QMF Filter Banks with Perfect Reconstruction", ''Electronics Letters'', vol. 26, no. 13, pp. 906–907 (Equation 13), June 1990.] is given by
: