MP3 (formally MPEG-1 Audio Layer III or MPEG-2 Audio Layer III)
is a
coding format for
digital audio
developed largely by the
Fraunhofer Society
in Germany under the lead of
Karlheinz Brandenburg
. It was designed to greatly reduce the amount of data required to represent audio, yet still sound like a faithful reproduction of the original
uncompressed
audio to most listeners; for example, compared to
CD-quality digital audio, MP3 compression can commonly achieve a 75–95% reduction in size, depending on the
bit rate
. In popular usage, ''MP3'' often refers to
files of sound or music recordings stored in the MP3
file format
(
.mp3
) on consumer electronic devices.
Originally defined in 1991 as one of the three audio codecs of the
MPEG-1
standard (along with
MP2 and
MP1), it was retained and further extended—defining additional bit rates and support for more
audio channels—as the third audio format of the subsequent
MPEG-2
standard. MP3 as a
file format
commonly designates files containing an
elementary stream
of MPEG-1 Audio or MPEG-2 Audio encoded data, without other complexities of the MP3 standard. Concerning
audio compression, which is its most apparent element to end-users, MP3 uses
lossy compression
to encode data using inexact approximations and the partial discarding of data, allowing for a large reduction in
file size
s when compared to uncompressed audio.
The combination of small size and acceptable fidelity led to a boom in the distribution of music over the
Internet
in the late 1990s, with MP3 serving as an enabling technology at a time when
bandwidth
and storage were still at a premium. The MP3 format soon became associated with controversies surrounding
copyright infringement
,
music piracy
, and the file-
ripping
and
sharing
services
MP3.com and
Napster
, among others. With the advent of
portable media player
s (including "MP3 players"), a product category also including
smartphones
, MP3 support became near-universal and it remains a
''de facto'' standard for digital audio despite the creation of newer coding formats such as
AAC.
History
The
Moving Picture Experts Group
(MPEG) designed MP3 as part of its
MPEG-1
, and later
MPEG-2
, standards. MPEG-1 Audio (MPEG-1 Part 3), which included MPEG-1 Audio Layer I, II, and III, was approved as a committee draft for an
ISO
/
IEC
standard in 1991,
finalized in 1992,
and published in 1993 as ISO/IEC 11172-3:1993.
An MPEG-2 Audio (MPEG-2 Part 3) extension with lower sample and bit rates was published in 1995 as ISO/IEC 13818-3:1995.
It requires only minimal modifications to existing MPEG-1 decoders (recognition of the MPEG-2 bit in the header and addition of the new lower sample and bit rates).
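The header check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an excerpt from the standard: the function name `parse_header_version` is hypothetical, and the field positions (a 12-bit sync word followed by a one-bit ID, with a two-bit sampling-frequency index in the third header byte) and the two sample-rate tables are assumptions about the frame header layout that this article does not itself spell out. Layer, bit-rate, and channel fields are ignored.

```python
# Sample-rate tables (Hz), indexed by the 2-bit sampling-frequency field.
# Assumed values: the MPEG-2 extension halves each MPEG-1 rate.
MPEG1_SAMPLE_RATES = (44100, 48000, 32000)
MPEG2_SAMPLE_RATES = (22050, 24000, 16000)


def parse_header_version(header: bytes):
    """Return (version name, sample rate in Hz) for a 4-byte frame header."""
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    # Twelve sync bits: all of byte 0 plus the top four bits of byte 1.
    if header[0] != 0xFF or (header[1] & 0xF0) != 0xF0:
        raise ValueError("sync word not found")
    # ID bit: assumed 1 for MPEG-1, 0 for the MPEG-2 lower-rate extension.
    id_bit = (header[1] >> 3) & 0x1
    # Sampling-frequency index: bits 3..2 of byte 2; the value 3 is reserved.
    rate_index = (header[2] >> 2) & 0x3
    if rate_index == 3:
        raise ValueError("reserved sampling-frequency index")
    if id_bit:
        return "MPEG-1", MPEG1_SAMPLE_RATES[rate_index]
    return "MPEG-2", MPEG2_SAMPLE_RATES[rate_index]
```

Under these assumptions, a decoder that already handles the MPEG-1 table needs only the extra branch on the ID bit and the second table, which is the sense in which the modification is minimal.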
Background
The MP3
lossy compression
algorithm takes advantage of a perceptual limitation of human hearing called
auditory masking
. In 1894, the American physicist
Alfred M. Mayer reported that a tone could be rendered inaudible by another tone of lower frequency.
In 1959, Richard Ehmer described a complete set of auditory curves regarding this phenomenon.
Between 1967 and 1974,
Eberhard Zwicker worked on the tuning and masking of critical frequency bands,
which in turn built on the fundamental research in the area from
Harvey Fletcher
and his collaborators at
Bell Labs
.
Perceptual coding was first used for
speech coding
compression with
linear predictive coding
(LPC),
which has origins in the work of
Fumitada Itakura
(
Nagoya University
) and Shuzo Saito (
Nippon Telegraph and Telephone
) in 1966. In 1978,
Bishnu S. Atal and
Manfred R. Schroeder at Bell Labs proposed an LPC speech
codec
, called
adaptive predictive coding, that used a
psychoacoustic
coding-algorithm exploiting the masking properties of the human ear.
Further optimization by Schroeder and Atal with J.L. Hall was later reported in a 1979 paper.
That same year, a psychoacoustic masking codec was also proposed by M. A. Krasner,
who published his results and built hardware for speech coding (not usable for music compression), but the publication of his results in a relatively obscure
Lincoln Laboratory
Technical Report did not immediately influence the mainstream of psychoacoustic codec-development.
The
discrete cosine transform
(DCT), a type of
transform coding
for lossy compression, proposed by
Nasir Ahmed in 1972, was developed by Ahmed with T. Natarajan and
K. R. Rao in 1973; they published their results in 1974. This led to the development of the
modified discrete cosine transform
(MDCT), proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987, following earlier work by Princen and Bradley in 1986. The MDCT later became a core part of the MP3 algorithm.
Ernst Terhardt and other collaborators constructed an algorithm describing auditory masking with high accuracy in 1982.
This work added to a variety of reports from authors dating back to Fletcher, and to the work that initially determined critical ratios and critical bandwidths.
In 1985, Atal and Schroeder presented
code-excited linear prediction (CELP), an LPC-based perceptual speech-coding algorithm with auditory masking that achieved a significant
data compression ratio
for its time.
IEEE
's refereed ''Journal on Selected Areas in Communications'' reported on a wide variety of (mostly perceptual) audio compression algorithms in 1988.
The "Voice Coding for Communications" edition published in February 1988 reported on a wide range of established, working audio bit compression technologies,
some of them using auditory masking as part of their fundamental design, and several showing real-time hardware implementations.
Development
The genesis of the MP3 technology is described in detail in a paper by Professor Hans Musmann, who chaired the ISO MPEG Audio group for several years ("Genesis of the MP3 Audio Coding Standard", IEEE Transactions on Consumer Electronics, Vol. 52, No. 3, pp. 1043–1049, August 2006). In December 1988, MPEG called for an audio coding standard. In June 1989, 14 audio coding algorithms were submitted. Because of certain similarities between these coding proposals, they were clustered into four development groups. The first group was ASPEC, by
Fraunhofer Gesellschaft,
AT&T
,
France Telecom, Deutsche and
Thomson-Brandt. The second group was
MUSICAM, by
Matsushita,
CCETT, ITT and
Philips
. The third group was ATAC (ATRAC Coding), by
Fujitsu,
JVC,
NEC
and
Sony
. The fourth group was
SB-ADPCM, by
NTT and BTRL.
The immediate predecessors of MP3 were "Optimum Coding in the Frequency Domain" (OCF),
and Perceptual Transform Coding (PXFM).
These two codecs, along with block-switching contributions from Thomson-Brandt, were merged into a codec called ASPEC, which was submitted to MPEG and won the quality competition but was mistakenly rejected as too complex to implement. The first practical hardware implementation of an audio perceptual coder (OCF), Krasner's hardware having been too cumbersome and slow for practical use, was a psychoacoustic transform coder based on
Motorola 56000 DSP chips.
Another predecessor of the MP3 format and technology is the perceptual codec MUSICAM, based on a 32-sub-band filter bank implemented in integer arithmetic and driven by a psychoacoustic model. It was primarily designed for Digital Audio Broadcasting (digital radio) and digital TV, and its basic principles were disclosed to the scientific community by CCETT (France) and IRT (Germany) in Atlanta during an IEEE ICASSP conference in 1991, after they had worked on MUSICAM with Matsushita and Philips since 1989.
This codec, incorporated into a broadcasting system using COFDM modulation, was demonstrated on air and in the field with Radio Canada and CRC Canada during the NAB show (Las Vegas) in 1991. The audio part of this broadcasting system was implemented with a two-chip encoder (one chip for the subband transform, one for the psychoacoustic model designed by the team of G. Stoll (IRT Germany), later known as psychoacoustic model I) and a real-time decoder using one Motorola 56001 DSP chip running integer-arithmetic software designed by Y.F. Dehery's team (CCETT, France). The simplicity of the corresponding decoder, together with the high audio quality of this codec, which used for the first time a 48 kHz
sampling rate
and a 20-bit-per-sample input format (the highest available sampling standard in 1991, compatible with the AES/EBU professional digital input studio standard), were the main reasons the characteristics of MUSICAM were later adopted as the basic features of an advanced digital music compression codec.
During the development of the MUSICAM encoding software, Stoll and Dehery's team made thorough use of a set of high-quality audio assessment material selected by a group of audio professionals from the European Broadcasting Union, and later used as a reference for the assessment of music compression codecs. The subband coding technique was found to be efficient, not only for the perceptual coding of high-quality sound materials but especially for the encoding of critical percussive sound materials (drums,
triangle
, etc.), owing to the specific temporal masking effect of the MUSICAM sub-band filter bank (an advantage specific to short-transform coding techniques).
As a doctoral student at Germany's
University of Erlangen-Nuremberg
,
Karlheinz Brandenburg
began working on digital music compression in the early 1980s, focusing on how people perceive music. He completed his doctoral work in 1989.
MP3 is directly descended from OCF and PXFM, representing the outcome of the collaboration of Brandenburg — working as a postdoctoral researcher at AT&T-Bell Labs with James D. Johnston ("JJ") of AT&T-Bell Labs — with the
Fraunhofer Institute for Integrated Circuits, Erlangen (where he worked with
Bernhard Grill and four other researchers – "The Original Six"), with relatively minor contributions from the MP2 branch of psychoacoustic sub-band coders. In 1990, Brandenburg became an assistant professor at Erlangen-Nuremberg. While there, he continued to work on music compression with scientists at the
Fraunhofer Society
's
Heinrich Hertz Institute. In 1993, he joined the staff of Fraunhofer HHI.
An a cappella version of the song "
Tom's Diner" by
Suzanne Vega
was the first song used by Brandenburg to develop the MP3 format. It was used as a benchmark to see how well MP3's compression algorithm handled the human voice. Brandenburg adopted the song for testing purposes, listening to it again and again each time he refined the compression algorithm, making sure it did not adversely affect the reproduction of Vega's voice.
Accordingly, he dubbed Vega the "Mother of MP3".
Instrumental music had been easier to compress, but Vega's voice sounded unnatural in early versions of the format. Brandenburg eventually met Vega and heard "Tom's Diner" performed live.
Standardization
In 1991, two available proposals were assessed for an MPEG audio standard:
MUSICAM (Masking pattern adapted Universal Subband Integrated Coding And Multiplexing) and ASPEC (Adaptive Spectral Perceptual Entropy Coding). The MUSICAM technique, proposed by
Philips
(Netherlands),
CCETT (France), the
Institute for Broadcast Technology (Germany), and Matsushita (Japan), was chosen due to its simplicity and error robustness, as well as for its high level of computational efficiency.
The MUSICAM format, based on
sub-band coding, became the basis for the MPEG Audio compression format, incorporating, for example, its frame structure, header format, sample rates, etc.
While much of the MUSICAM technology and many of its ideas were incorporated into the definition of MPEG Audio Layer I and Layer II, only the filter bank and the data structure based on 1152-sample framing (file format and byte-oriented stream) of MUSICAM remained in the Layer III (MP3) format, as part of the computationally inefficient hybrid
filter bank. Under the chairmanship of Professor Musmann of the
Leibniz University Hannover
, the editing of the standard was delegated to Leon van de Kerkhof (Netherlands), Gerhard Stoll (Germany), and Yves-François Dehery (France), who worked on Layer I and Layer II. ASPEC was the joint proposal of AT&T Bell Laboratories, Thomson Consumer Electronics, Fraunhofer Society, and
CNET.
It provided the highest coding efficiency.
A
working group
consisting of van de Kerkhof, Stoll,
Leonardo Chiariglione
(
CSELT VP for Media), Yves-François Dehery, Karlheinz Brandenburg (Germany) and James D. Johnston (United States) took ideas from ASPEC, integrated the filter bank from Layer II, added some of their own ideas, such as the joint stereo coding of MUSICAM, and created the MP3 format, which was designed to achieve the same quality as
MP2 at a lower bit rate.
The algorithms for MPEG-1 Audio Layer I, II and III were approved in 1991
and finalized in 1992
as part of
MPEG-1
, the first standard suite by
MPEG
, which resulted in the international standard ISO/IEC 11172-3 (a.k.a. ''MPEG-1 Audio'' or ''MPEG-1 Part 3''), published in 1993.
Files or data streams conforming to this standard must handle sample rates of 48 kHz, 44.1 kHz, and 32 kHz, and they continue to be supported by current
MP3 player
s and decoders. Thus the first generation of MP3 defined interpretations of MP3 frame data structures and size layouts.
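The relationship between sample rate, bit rate, and frame layout can be made concrete with a short calculation. The sketch below assumes the 1152-sample framing mentioned earlier for Layer III and a frame length of 144 × bit rate ÷ sample rate bytes, plus an optional padding byte; that formula, the function names, and the 128 kbit/s figure used in the usage note are illustrative assumptions rather than values given in this article.

```python
SAMPLES_PER_FRAME = 1152  # samples carried by one MPEG-1 Layer III frame


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playback time covered by one frame, in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_size_bytes(bit_rate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Assumed frame length in bytes: (1152 / 8) * bit_rate / sample_rate + padding."""
    return (SAMPLES_PER_FRAME // 8) * bit_rate_bps // sample_rate_hz + padding


if __name__ == "__main__":
    # Print duration and size for each mandatory MPEG-1 sample rate.
    for rate in (32000, 44100, 48000):
        print(rate, round(frame_duration_ms(rate), 2), frame_size_bytes(128000, rate))
```

At 44.1 kHz the exact quotient is not an integer, which is why an occasional padded frame (one extra byte) is needed to hold the average bit rate.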
The compression efficiency of encoders is typically defined by the bit rate because the compression ratio depends on the
bit depth and
sampling rate
of the input signal. Nevertheless, compression ratios are often published. They may use the
compact disc
(CD) parameters as references (44.1
kHz
, 2 channels at 16 bits per channel or 2×16 bit), or sometimes the
Digital Audio Tape
(DAT) SP parameters (48 kHz, 2×16 bit). Compression ratios with this latter reference are higher, which demonstrates the problem with the use of the term ''compression ratio'' for lossy encoders.
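The dependence of the quoted ratio on the reference signal can be checked with simple arithmetic. The sketch below uses the CD and DAT parameters given above; the 128 kbit/s coded bit rate and the function names are illustrative assumptions, not figures from this article.

```python
def pcm_bit_rate(sample_rate_hz: int, channels: int = 2, bits_per_sample: int = 16) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * channels * bits_per_sample


def compression_ratio(pcm_bps: int, coded_bps: int) -> float:
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return pcm_bps / coded_bps


CD_BPS = pcm_bit_rate(44100)    # 1,411,200 bit/s
DAT_BPS = pcm_bit_rate(48000)   # 1,536,000 bit/s
CODED_BPS = 128_000             # assumed example MP3 bit rate

if __name__ == "__main__":
    print("vs CD :", compression_ratio(CD_BPS, CODED_BPS))
    print("vs DAT:", compression_ratio(DAT_BPS, CODED_BPS))
```

The same coded stream yields a ratio of about 11:1 against the CD reference and 12:1 against the DAT reference, so the ratio alone says little about the encoder unless the reference format is stated.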
Karlheinz Brandenburg used a CD recording of
Suzanne Vega
's song "
Tom's Diner" to assess and refine the MP3
compression algorithm
. This song was chosen because of its nearly
monophonic nature and wide spectral content, making it easier to hear imperfections in the compression format during playbacks. This particular track has an interesting property in that the two channels are almost, but not completely, the same, leading to a case where Binaural Masking Level Depression causes spatial unmasking of noise artifacts unless the encoder properly recognizes the situation and applies corrections similar to those detailed in the MPEG-2 AAC psychoacoustic model. Some more critical audio excerpts (
glockenspiel
, triangle,
accordion
, etc.) were taken from the
EBU V3/SQAM reference compact disc and have been used by professional sound engineers to assess the subjective quality of the MPEG Audio formats.
Going public
A reference simulation software implementation, written in the C language and later known as ''ISO 11172-5'', was developed (in 1991–1996) by the members of the ISO MPEG Audio committee to produce bit-compliant MPEG Audio files (Layer 1, Layer 2, Layer 3). It was approved as a committee draft of the ISO/IEC technical report in March 1994 and printed as document CD 11172-5 in April 1994. It was approved as a draft technical report (DTR/DIS) in November 1994, finalized in 1996 and published as international standard ISO/IEC TR 11172-5:1998 in 1998. The reference software in C language was later published as a freely available ISO standard. Working in non-real time on several operating systems, it was able to demonstrate the first real-time hardware decoding (DSP-based) of compressed audio. Some other real-time implementations of MPEG Audio encoders and decoders were available for digital broadcasting (radio DAB, television DVB) towards consumer receivers and set-top boxes.
On 7 July 1994, the Fraunhofer Society released the first software MP3 encoder, called l3enc. The filename extension ''.mp3'' was chosen by the Fraunhofer team on 14 July 1995 (previously, the files had been named ''.bit''). With the first real-time software MP3 player, WinPlay3 (released 9 September 1995), many people were able to encode and play back MP3 files on their PCs. Because of the relatively small hard drives of the era (≈500–1000 MB), lossy compression was essential to store multiple albums' worth of music on a home computer as full recordings (as opposed to MIDI notation, or tracker files, which combined notation with short recordings of instruments playing single notes).
Fraunhofer example implementation
A hacker named SoloH discovered the source code of the "dist10" MPEG reference implementation shortly after its release on the servers of the University of Erlangen. He developed a higher-quality version and spread it on the internet. This code started the widespread practice of CD ripping and of distributing digital music as MP3 over the internet.
Further versions
Further work on MPEG audio was finalized in 1994 as part of the second suite of MPEG standards, MPEG-2, more formally known as international standard ISO/IEC 13818-3 (a.k.a. ''MPEG-2 Part 3'' or backward compatible ''MPEG-2 Audio'' or ''MPEG-2 Audio BC''), originally published in 1995.
MPEG-2 Part 3 (ISO/IEC 13818-3) defined 42 additional bit rates and sample rates for MPEG-1 Audio Layer I, II and III. The new sampling rates are exactly half of those originally defined in MPEG-1 Audio. This reduction in sampling rates cuts the available frequency fidelity in half while likewise cutting the bit rate by 50%. MPEG-2 Part 3 also enhanced MPEG-1's audio by allowing the coding of audio programs with more than two channels, up to 5.1 multichannel.
An MP3 coded with MPEG-2 reproduces half the bandwidth of MPEG-1, which is adequate for piano and singing.
A third generation of "MP3"-style data streams (files) extended the ''MPEG-2'' ideas and implementation but was named ''MPEG-2.5'' audio, since MPEG-3 already had a different meaning. This extension was developed at Fraunhofer IIS, the registered patent holder of MP3, by reducing the frame sync field in the MP3 header from 12 to 11 bits. As in the transition from MPEG-1 to MPEG-2, MPEG-2.5 adds sampling rates exactly half of those available in MPEG-2. It thus widens the scope of MP3 to include human speech and other applications, yet reproduces only 25% of the bandwidth (frequency range) possible with MPEG-1 sampling rates. While not an ISO-recognized standard, MPEG-2.5 is widely supported by both inexpensive Chinese and brand-name digital audio players, as well as by software MP3 encoders (LAME), decoders (FFmpeg) and players (MPC), adding additional MP3 frame types. Each generation of MP3 thus supports 3 sampling rates exactly half those of the previous generation, for a total of 9 varieties of MP3 format files. The sample rate comparison table between MPEG-1, 2, and 2.5 is given later in the article.
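The halving relationship between the three generations can be sketched in a few lines of Python. This is an illustrative calculation only: the function names are hypothetical, the MPEG-1 rates (32, 44.1 and 48 kHz) are taken from the bit rate section of this article, and the bandwidth figure is the theoretical Nyquist bound rather than what a real encoder delivers after filtering.

```python
# Illustrative sketch: derive the 9 MP3 sampling rates by repeated halving.
# MPEG-1 Layer III sampling rates in Hz, as listed in this article.
MPEG1_RATES_HZ = (32_000, 44_100, 48_000)


def generation_rates(base_rates=MPEG1_RATES_HZ):
    """Return a dict mapping each generation to its three sampling rates."""
    rates = {}
    current = tuple(base_rates)
    for name in ("MPEG-1", "MPEG-2", "MPEG-2.5"):
        rates[name] = current
        # Each later generation uses exactly half the previous rates.
        current = tuple(rate // 2 for rate in current)
    return rates


def nyquist_limit_hz(sample_rate_hz):
    """Theoretical upper bound on reproducible frequency (half the rate)."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for name, rates in generation_rates().items():
        limits = [nyquist_limit_hz(rate) for rate in rates]
        print(name, rates, "bandwidth bound (Hz):", limits)
```

The 11,025 Hz rate of MPEG-2.5 gives a bound of 5,512.5 Hz, which matches the roughly 5.5 kHz speech bandwidth mentioned elsewhere in the article.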
MPEG-2.5 is supported by LAME (since 2000), Media Player Classic (MPC), iTunes, and FFmpeg.
MPEG-2.5 was not developed by MPEG (see above) and was never approved as an international standard. MPEG-2.5 is thus an unofficial or proprietary extension to the MP3 format. It is nonetheless ubiquitous and especially advantageous for low-bit-rate human speech applications.
The ISO standard ISO/IEC 11172-3 (a.k.a. MPEG-1 Audio) defined three formats: the MPEG-1 Audio Layer I, Layer II and Layer III. The ISO standard ISO/IEC 13818-3 (a.k.a. MPEG-2 Audio) defined an extended version of MPEG-1 Audio: MPEG-2 Audio Layer I, Layer II, and Layer III. MPEG-2 Audio (MPEG-2 Part 3) should not be confused with MPEG-2 AAC (MPEG-2 Part 7 – ISO/IEC 13818-7).
LAME is widely regarded as the most advanced MP3 encoder. LAME includes a variable bit rate (VBR) encoding mode that uses a quality parameter rather than a bit rate goal. Later versions (2008+) support an ''n.nnn'' quality goal which automatically selects MPEG-2 or MPEG-2.5 sampling rates as appropriate for human speech recordings that need only 5512 Hz bandwidth resolution.
Internet distribution
In the second half of the 1990s, MP3 files began to spread on the Internet, often via underground pirated song networks. The first known experiment in Internet distribution was organized in the early 1990s by the Internet Underground Music Archive, better known by the acronym IUMA. After some experiments using uncompressed audio files, this archive started to deliver compressed MPEG Audio files over the low-speed Internet of the time, first in the MP2 (Layer II) format and later as MP3 files once the standard was fully completed. The popularity of MP3s began to rise rapidly with the advent of Nullsoft's audio player Winamp, released in 1997, which still had a community of 80 million active users in 2023. In 1998, support for the MP3 format was added in Windows Media Player 5.2 and later versions. Also in 1998, the first portable solid-state digital audio player, MPMan, developed by SaeHan Information Systems of Seoul, South Korea, was released, and the Rio PMP300 went on sale later that year despite legal suppression efforts by the RIAA.
In November 1997, the website mp3.com was offering thousands of MP3s created by independent artists for free.
The small size of MP3 files enabled widespread peer-to-peer file sharing of music ripped from CDs, which would previously have been nearly impossible. The first large peer-to-peer filesharing network, Napster, was launched in 1999. The ease of creating and sharing MP3s resulted in widespread copyright infringement. Major record companies argued that this free sharing of music reduced sales, and called it "music piracy". They reacted by pursuing lawsuits against Napster, which was eventually shut down and later sold, and against individual users who engaged in file sharing.
Unauthorized MP3 file sharing continues on next-generation peer-to-peer networks. Some authorized services, such as Beatport, Bleep, Juno Records, eMusic, Zune Marketplace, Walmart.com, Rhapsody, the recording-industry-approved reincarnation of Napster, and Amazon.com sell unrestricted music in the MP3 format.
Design
File structure
An MP3 file is made up of MP3 frames, which consist of a header and a data block. This sequence of frames is called an elementary stream. Due to the "bit reservoir", frames are not independent items and cannot usually be extracted on arbitrary frame boundaries. The MP3 data blocks contain the (compressed) audio information in terms of frequencies and amplitudes. The diagram shows that the MP3 header consists of a sync word, which is used to identify the beginning of a valid frame. This is followed by a bit indicating that this is the MPEG standard and two bits that indicate that layer 3 is used; hence MPEG-1 Audio Layer 3 or MP3. After this, the values will differ, depending on the MP3 file. ''ISO/IEC 11172-3'' defines the range of values for each section of the header along with the specification of the header. Most MP3 files today contain ID3 metadata, which precedes or follows the MP3 frames, as noted in the diagram. The data stream can contain an optional checksum.
Joint stereo is done only on a frame-to-frame basis.
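As a rough illustration of the header fields described above, the following Python sketch decodes the first 16 bits of a 4-byte frame header. It assumes the commonly documented layout that also accommodates MPEG-2.5: an 11-bit sync word, a 2-bit version field, a 2-bit layer field and a 1-bit protection flag (0 meaning a checksum follows the header). The function name and the returned dictionary keys are hypothetical, and the remaining header fields (bit rate index, sampling rate index, padding, channel mode and so on) are deliberately left undecoded.

```python
# Illustrative sketch: decode sync, version, layer and CRC flag of a frame header.
VERSIONS = {0b00: "MPEG-2.5", 0b10: "MPEG-2", 0b11: "MPEG-1"}  # 0b01 is reserved
LAYERS = {0b01: "Layer III", 0b10: "Layer II", 0b11: "Layer I"}  # 0b00 is reserved


def parse_frame_header(header: bytes) -> dict:
    """Decode the leading fields of a 4-byte MPEG audio frame header."""
    if len(header) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(header[:4], "big")

    # The sync word is the top 11 bits, all set to 1.
    if (word >> 21) & 0x7FF != 0x7FF:
        raise ValueError("sync word not found")

    version_bits = (word >> 19) & 0b11
    layer_bits = (word >> 17) & 0b11
    protection_bit = (word >> 16) & 0b1

    if version_bits not in VERSIONS or layer_bits not in LAYERS:
        raise ValueError("reserved version or layer value")

    return {
        "version": VERSIONS[version_bits],
        "layer": LAYERS[layer_bits],
        "has_checksum": protection_bit == 0,
    }


if __name__ == "__main__":
    # 0xFF 0xFB is a typical start of an MPEG-1 Layer III frame without CRC.
    print(parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```

A real parser would also have to skip any ID3 metadata before the first frame and confirm that a second header follows at the expected distance, since the 11-bit pattern can occur by chance inside audio data.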
Encoding and decoding
In short, MP3 compression works by reducing the accuracy of certain components of sound that are considered (by psychoacoustic analysis) to be beyond the hearing capabilities of most humans. This method is commonly referred to as perceptual coding or psychoacoustic modeling. The remaining audio information is then recorded in a space-efficient manner using MDCT and FFT algorithms.
The MP3 encoding algorithm is generally split into four parts. Part 1 divides the audio signal into smaller pieces, called frames, and an MDCT filter is then applied to the output. Part 2 passes the sample into a 1024-point fast Fourier transform (FFT), then the psychoacoustic model is applied and another MDCT filter is applied to the output. Part 3 quantizes and encodes each sample, a step known as noise allocation, which adjusts itself to meet the bit rate and sound masking requirements. Part 4 formats the bitstream, called an audio frame, which is made up of 4 parts: the header, error check, audio data, and ancillary data.
The MPEG-1 standard does not include a precise specification for an MP3 encoder but does provide examples of psychoacoustic models, rate loops, and the like in the non-normative part of the original standard. MPEG-2 doubles the number of sampling rates that are supported and MPEG-2.5 adds 3 more. When this was written, the suggested implementations were quite dated. Implementers of the standard were supposed to devise algorithms suitable for removing parts of the information from the audio input. As a result, many different MP3 encoders became available, each producing files of differing quality. Comparisons were widely available, so it was easy for a prospective user of an encoder to research the best choice. Some encoders that were proficient at encoding at higher bit rates (such as LAME) were not necessarily as good at lower bit rates. Over time, LAME evolved on the SourceForge website until it became the de facto CBR MP3 encoder. Later an ABR mode was added. Work progressed on true variable bit rate using a quality goal between 0 and 10. Eventually, fractional settings (such as -V 9.600) could generate excellent-quality, low-bit-rate voice encoding using the MPEG-2.5 extensions.
MP3 uses an overlapping MDCT structure. Each MPEG-1 MP3 frame is 1152 samples, divided into two granules of 576 samples. These samples, initially in the time domain, are transformed in one block to 576 frequency-domain samples by MDCT. MP3 also allows the use of shorter blocks in a granule, down to a size of 192 samples; this feature is used when a transient is detected. Doing so limits the temporal spread of quantization noise accompanying the transient (see psychoacoustics). Frequency resolution is limited by the small long block window size, which decreases coding efficiency.
Time resolution can be too low for highly transient signals and may cause smearing of percussive sounds.
Due to the tree structure of the filter bank, pre-echo problems are made worse, as the combined impulse response of the two filter banks does not, and cannot, provide an optimum solution in time/frequency resolution.
Additionally, the combining of the two filter banks' outputs creates aliasing problems that must be handled partially by the "aliasing compensation" stage; however, that creates excess energy to be coded in the frequency domain, thereby decreasing coding efficiency.
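The frame and granule sizes above translate directly into time spans, which helps explain why short blocks matter for transients. The following Python sketch is a simple calculation under the figures given in the text for MPEG-1 frames (1152 samples per frame, 576 per granule, 192 per short block); the constant and function names are hypothetical.

```python
# Illustrative sketch: time covered by MPEG-1 Layer III frames and blocks.
SAMPLES_PER_FRAME = 1152    # one MPEG-1 MP3 frame
SAMPLES_PER_GRANULE = 576   # two granules per frame (long block)
SAMPLES_PER_SHORT_BLOCK = 192


def duration_ms(num_samples: int, sample_rate_hz: int) -> float:
    """Time span in milliseconds covered by num_samples at the given rate."""
    return 1000 * num_samples / sample_rate_hz


def short_blocks_per_granule() -> int:
    """Number of short blocks that replace one long block in a granule."""
    return SAMPLES_PER_GRANULE // SAMPLES_PER_SHORT_BLOCK


if __name__ == "__main__":
    for rate in (32_000, 44_100, 48_000):
        print(
            f"{rate} Hz: frame {duration_ms(SAMPLES_PER_FRAME, rate):.2f} ms, "
            f"long block {duration_ms(SAMPLES_PER_GRANULE, rate):.2f} ms, "
            f"short block {duration_ms(SAMPLES_PER_SHORT_BLOCK, rate):.2f} ms"
        )
```

At 44.1 kHz a long block spans about 13 ms while a short block spans about 4.4 ms, so switching to three short blocks confines the quantization noise around a transient to a window roughly one third as long.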
Decoding, on the other hand, is carefully defined in the standard. Most decoders are "bitstream compliant", which means that the decompressed output that they produce from a given MP3 file will be the same, within a specified degree of rounding tolerance, as the output specified mathematically in the ISO/IEC standard document (ISO/IEC 11172-3). Therefore, the comparison of decoders is usually based on how computationally efficient they are (i.e., how much memory or CPU time they use in the decoding process). Over time this concern has become less of an issue as CPU clock rates transitioned from MHz to GHz. Encoder/decoder overall delay is not defined, which means there is no official provision for gapless playback. However, some encoders such as LAME can attach additional metadata that will allow players that can handle it to deliver seamless playback.
Quality
When performing lossy audio encoding, such as creating an MP3 data stream, there is a trade-off between the amount of data generated and the sound quality of the results. The person generating an MP3 selects a bit rate, which specifies how many kilobits per second of audio are desired. The higher the bit rate, the larger the MP3 data stream will be, and, generally, the closer it will sound to the original recording. With too low a bit rate, compression artifacts (i.e., sounds that were not present in the original recording) may be audible in the reproduction. Some audio is hard to compress because of its randomness and sharp attacks. When this type of audio is compressed, artifacts such as ringing or pre-echo are usually heard. A sample of applause or a triangle instrument with a relatively low bit rate provides good examples of compression artifacts. Most subjective tests of perceptual codecs tend to avoid using these types of sound materials; however, the artifacts generated by percussive sounds are barely perceptible due to the specific temporal masking feature of the 32 sub-band filterbank of Layer II on which the format is based.
Besides the bit rate of an encoded piece of audio, the quality of MP3-encoded sound also depends on the quality of the encoder algorithm as well as the complexity of the signal being encoded. As the MP3 standard allows quite a bit of freedom with encoding algorithms, different encoders do feature quite different quality, even with identical bit rates. As an example, in a public listening test featuring two early MP3 encoders set to about the same bit rate, one scored 3.66 on a 1–5 scale, while the other scored only 2.22. Quality is dependent on the choice of encoder and encoding parameters.
This observation caused a revolution in audio encoding. Early on, bit rate was the prime and only consideration. At the time, MP3 files were of the very simplest type: they used the same bit rate for the entire file, a process known as constant bit rate (CBR) encoding. Using a constant bit rate makes encoding simpler and less CPU-intensive. However, it is also possible to optimize the size of the file by letting the bit rate change throughout the file. This is known as variable bit rate (VBR) encoding. The bit reservoir and VBR encoding were part of the original MPEG-1 standard. The concept behind them is that, in any piece of audio, some sections are easier to compress, such as silence or music containing only a few tones, while others are more difficult to compress. So, the overall quality of the file may be increased by using a lower bit rate for the less complex passages and a higher one for the more complex parts. With some advanced MP3 encoders, it is possible to specify a given quality, and the encoder will adjust the bit rate accordingly. Users who desire a particular "quality setting" that is transparent to their ears can use this value when encoding all of their music, and generally need not worry about performing personal listening tests on each piece of music to determine the correct bit rate.
Perceived quality can be influenced by the listening environment (ambient noise), listener attention, listener training, and in most cases by listener audio equipment (such as sound cards, speakers, and headphones). Furthermore, a lower quality setting may be sufficient for lectures and human speech applications, and it reduces encoding time and complexity. A test given to new students by Stanford University music professor Jonathan Berger showed that student preference for MP3-quality music has risen each year. Berger said the students seem to prefer the 'sizzle' sounds that MP3s bring to music.
In an in-depth study of MP3 audio quality, sound artist and composer Ryan Maguire's project "The Ghost in the MP3" isolates the sounds lost during MP3 compression. In 2015, he released the track "moDernisT" (an anagram of "Tom's Diner"), composed exclusively from the sounds deleted during MP3 compression of the song "Tom's Diner", the track originally used in the formulation of the MP3 standard. A detailed account of the techniques used to isolate the sounds deleted during MP3 compression, along with the conceptual motivation for the project, was published in the 2014 Proceedings of the International Computer Music Conference.
Bit rate
Bit rate is the product of the sample rate and the number of bits per sample used to encode the music. CD audio is 44,100 samples per second. The number of bits per sample also depends on the number of audio channels. The CD is stereo, with 16 bits per channel, so multiplying 44,100 by 32 gives 1,411,200 bit/s, the bit rate of uncompressed CD digital audio. MP3 was designed to encode this data at 320 kbit/s or less. If less complex passages are detected by the MP3 algorithms, lower bit rates may be employed. When using MPEG-2 instead of MPEG-1, MP3 supports only lower sampling rates (16,000, 22,050, or 24,000 samples per second) and offers choices of bit rate as low as 8 kbit/s but no higher than 160 kbit/s. By lowering the sampling rate, MPEG-2 Layer III removes all frequencies above half the new sampling rate that may have been present in the source audio.
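The arithmetic for uncompressed audio can be written out as a short Python sketch. The function name and parameters are hypothetical; the CD figures (44,100 samples per second, 16 bits per sample, 2 channels) are the ones given in the text.

```python
# Illustrative sketch: bit rate of uncompressed PCM audio.
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the uncompressed bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    cd_rate = pcm_bit_rate(44_100, 16, 2)
    print(cd_rate, "bit/s =", cd_rate / 1000, "kbit/s")
```

For CD audio this gives 1,411,200 bit/s, the same value obtained by multiplying 44,100 by 32 bits per stereo sample pair.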
As shown in these two tables, 14 selected bit rates are allowed in the MPEG-1 Audio Layer III standard: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, along with the 3 highest available sampling rates of 32, 44.1 and 48 kHz.
MPEG-2 Audio Layer III also allows 14 somewhat different (and mostly lower) bit rates of 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160 kbit/s, with sampling rates of 16, 22.05 and 24 kHz, which are exactly half those of MPEG-1.
MPEG-2.5 Audio Layer III frames are limited to only 8 bit rates of 8, 16, 24, 32, 40, 48, 56 and 64 kbit/s, with 3 even lower sampling rates of 8, 11.025, and 12 kHz. On earlier systems that only support the MPEG-1 Audio Layer III standard, MP3 files with a bit rate below 32 kbit/s might be played back sped-up and pitched-up.
Earlier systems also lack fast-forward and rewind playback controls for MP3.
MPEG-1 frames contain the most detail in 320 kbit/s mode, the highest allowable bit rate setting, with silence and simple tones still requiring 32 kbit/s. MPEG-2 frames can capture up to 12 kHz sound reproduction at bit rates up to 160 kbit/s. MP3 files made with MPEG-2 do not have 20 kHz bandwidth because of the Nyquist–Shannon sampling theorem. Frequency reproduction is always strictly less than half of the sampling rate, and imperfect filters require a larger margin for error (noise level versus sharpness of filter), so an 8 kHz sampling rate limits the maximum frequency to 4 kHz, while a 48 kHz sampling rate limits an MP3 to a maximum 24 kHz sound reproduction. MPEG-2 uses half and MPEG-2.5 only a quarter of MPEG-1 sample rates.
For the general field of human speech reproduction, a bandwidth of 5,512 Hz is sufficient to produce excellent results (for voice) using a sampling rate of 11,025 Hz and VBR encoding from a 44,100 Hz (standard) WAV file. English speakers average around 41 kbit/s with the -V 9.6 setting, but this may vary with the amount of silence recorded or the rate of delivery (wpm). Resampling to 12,000 Hz (6 kHz bandwidth) is selected by the LAME parameter -V 9.4. Likewise, -V 9.2 selects a 16,000 Hz sample rate and a resultant 8 kHz lowpass filtering. Older versions of LAME and FFmpeg only support integer arguments for the variable bit rate quality selection parameter. The n.nnn quality parameter (-V) is documented at lame.sourceforge.net but is only supported in LAME with the new-style VBR quality selector, not with average bit rate (ABR).
A sample rate of 44.1 kHz is commonly used for music reproduction because this is also used for CD audio, the main source used for creating MP3 files. A great variety of bit rates are used on the Internet. A bit rate of 128 kbit/s is commonly used, at a compression ratio of 11:1, offering adequate audio quality in a relatively small space. As Internet bandwidth availability and hard drive sizes have increased, higher bit rates up to are widespread. Uncompressed audio as stored on an audio CD has a bit rate of 1,411.2 kbit/s (16 bit/sample × 44,100 samples/second × 2 channels / 1,000 bits/kilobit), so the bit rates 128, 160, and 192 kbit/s represent compression ratios of approximately 11:1, 9:1 and 7:1 respectively.
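The arithmetic behind these ratios can be checked with a short sketch. The function names are illustrative, and 192 kbit/s is assumed here as the bit rate corresponding to the 7:1 case.

```python
def pcm_bit_rate_kbps(bits_per_sample=16, sample_rate_hz=44100, channels=2):
    """Bit rate of uncompressed PCM audio in kbit/s (defaults: CD audio)."""
    return bits_per_sample * sample_rate_hz * channels / 1000


def compression_ratio(mp3_kbps, pcm_kbps=None):
    """Ratio of the uncompressed bit rate to the MP3 bit rate."""
    if pcm_kbps is None:
        pcm_kbps = pcm_bit_rate_kbps()
    return pcm_kbps / mp3_kbps


if __name__ == "__main__":
    print(f"CD audio: {pcm_bit_rate_kbps():.1f} kbit/s")
    for rate in (128, 160, 192):
        print(f"{rate} kbit/s -> about {compression_ratio(rate):.2f}:1")
```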
Non-standard bit rates up to can be achieved with the
LAME
encoder and the free format option, although few MP3 players can play those files. According to the ISO standard, decoders are only required to be able to decode streams up to .
Early MPEG Layer III encoders used what is now called constant bit rate (CBR). The software was only able to use a uniform bit rate on all frames in an MP3 file. Later, more sophisticated MP3 encoders were able to use the bit reservoir to target an average bit rate, selecting the encoding rate for each frame based on the complexity of the sound in that portion of the recording.
A more sophisticated MP3 encoder can produce variable bit rate audio. MPEG audio may use bit rate switching on a per-frame basis, but only layer III decoders must support it.
VBR is used when the goal is to achieve a fixed level of quality. The final file size of a VBR encoding is less predictable than with constant bit rate. Average bit rate is a type of VBR implemented as a compromise between the two: the bit rate is allowed to vary for more consistent quality, but is controlled to remain near an average value chosen by the user, for predictable file sizes. Although an MP3 decoder must support VBR to be standards compliant, some decoders have historically had bugs with VBR decoding, particularly before VBR encoders became widespread. The LAME MP3 encoder supports the generation of VBR, ABR, and the older CBR MP3 formats.
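The difference in size predictability can be illustrated with a small sketch. It assumes 1,152 samples per frame (the MPEG-1 Layer III frame length), ignores tags and padding, and uses made-up function names; the per-frame bit rates in the usage example are invented values, not measurements.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III (assumed; MPEG-2/2.5 frames are shorter)


def cbr_size_bytes(bit_rate_kbps, duration_s):
    """With CBR the audio size follows directly from bit rate and duration."""
    return bit_rate_kbps * 1000 * duration_s / 8


def vbr_stats(frame_bit_rates_kbps, sample_rate_hz=44100):
    """Return (size in bytes, average kbit/s) for a list of per-frame bit rates."""
    if not frame_bit_rates_kbps:
        raise ValueError("at least one frame is required")
    frame_duration_s = SAMPLES_PER_FRAME / sample_rate_hz
    total_bits = sum(rate * 1000 * frame_duration_s for rate in frame_bit_rates_kbps)
    duration_s = frame_duration_s * len(frame_bit_rates_kbps)
    return total_bits / 8, total_bits / duration_s / 1000


if __name__ == "__main__":
    print(cbr_size_bytes(128, 180))    # three minutes at 128 kbit/s
    print(vbr_stats([64, 192, 128]))   # size is known only after encoding
```

With CBR the size is known before encoding; with VBR it depends on the bit rate the encoder picks for each frame.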
Layer III audio can also use a "bit reservoir", a partially full frame's ability to hold part of the next frame's audio data, allowing temporary changes in effective bit rate, even in a constant bit rate stream.
Internal handling of the bit reservoir increases encoding delay. There is no scale factor band 21 (sfb21) for frequencies above approx 16
kHz
, forcing the encoder to choose between less accurate representation in band 21 or less efficient storage in all bands below band 21, the latter resulting in wasted bit rate in VBR encoding.
Ancillary data
The ancillary data field can be used to store user-defined data. The ancillary data is optional and the number of bits available is not explicitly given. The ancillary data is located after the Huffman code bits and extends to where the next frame's main_data_begin points. The encoder mp3PRO used ancillary data to encode extra information which could improve audio quality when decoded with its algorithm.
Metadata
A "tag" in an audio file is a section of the file that contains
metadata
such as the title, artist, album, track number, or other information about the file's contents. The MP3 standards do not define tag formats for MP3 files, nor is there a standard
container format
that would support metadata and obviate the need for tags. However, several ''de facto'' standards for tag formats exist. As of 2010, the most widespread are
ID3v1 and ID3v2, and the more recently introduced
APEv2. These tags are normally embedded at the beginning or end of MP3 files, separate from the actual MP3 frame data. MP3 decoders either extract information from the tags or just treat them as ignorable, non-MP3 junk data.
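A minimal sketch of how software might locate these tags is shown below. It relies on the commonly documented ID3 layouts (an ID3v2 header at the start of the file beginning with "ID3", and a 128-byte ID3v1 block at the end beginning with "TAG"); APEv2 is not handled, the function name is invented, and real tag libraries cover many more cases.

```python
def find_id3_tags(data: bytes) -> dict:
    """Detect ID3v2 (start of file) and ID3v1 (last 128 bytes) tags in raw MP3 bytes."""
    info = {"id3v2": None, "id3v1": None}

    # ID3v2 header: "ID3", major version, revision, flags, 4-byte synchsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        size_bytes = data[6:10]
        if all(b < 0x80 for b in size_bytes):  # synchsafe: top bit always clear
            size = 0
            for b in size_bytes:
                size = (size << 7) | b
            # The stored size excludes the 10-byte header itself.
            info["id3v2"] = {"version": (data[3], data[4]), "tag_size": size + 10}

    # ID3v1: fixed-width fields in the final 128 bytes of the file.
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        tail = data[-128:]

        def text(raw):
            return raw.split(b"\x00")[0].decode("latin-1").strip()

        info["id3v1"] = {
            "title": text(tail[3:33]),
            "artist": text(tail[33:63]),
            "album": text(tail[63:93]),
            "year": text(tail[93:97]),
        }
    return info
```

Everything between the two tags is then treated as MP3 frame data.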
Playing and editing software often contains tag editing functionality, but there are also tag editor applications dedicated to the purpose. Aside from metadata about the audio content, tags may also be used for DRM.
ReplayGain is a standard for measuring and storing the loudness of an MP3 file (audio normalization) in its metadata tag, enabling a ReplayGain-compliant player to automatically adjust the overall playback volume for each file.
MP3Gain may be used to reversibly modify files based on ReplayGain measurements so that adjusted playback can be achieved on players without ReplayGain capability.
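On the player side, applying a stored gain amounts to scaling the decoded samples. The sketch below assumes the gain is expressed in decibels and that samples are floating-point values in the range -1.0 to 1.0; the function names are illustrative and do not reflect any particular player's or MP3Gain's implementation.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Scale decoded samples by the gain, clamping to [-1.0, 1.0] to avoid overflow."""
    factor = db_to_linear(gain_db)
    return [max(-1.0, min(1.0, sample * factor)) for sample in samples]


if __name__ == "__main__":
    # A hypothetical track tagged with a -6 dB adjustment plays at about half amplitude.
    print(apply_gain([0.8, -0.4, 0.1], -6.0))
```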
Licensing, ownership, and legislation
The basic MP3 decoding and encoding technology is patent-free in the European Union, all patents having expired there by 2012 at the latest. In the United States, the technology became substantially patent-free on 16 April 2017 (see below). MP3 patents expired in the US between 2007 and 2017. In the past, many organizations have claimed ownership of patents related to MP3 decoding or encoding. These claims led to several legal threats and actions from a variety of sources. As a result, in countries that allow software patents, uncertainty about which patents must have been licensed to create MP3 products without committing patent infringement was common in the early stages of the technology's adoption.
The initial near-complete MPEG-1 standard (parts 1, 2, and 3) was publicly available on 6 December 1991 as ISO CD 11172.
In most countries, patents cannot be filed after prior art has been made public, and patents expire 20 years after the initial filing date, which can be up to 12 months later for filings in other countries. As a result, patents required to implement MP3 expired in most countries by December 2012, 21 years after the publication of ISO CD 11172.
An exception is the United States, where patents in force but filed before 8 June 1995 expire after the later of 17 years from the issue date or 20 years from the priority date. A lengthy patent prosecution process may result in a patent issued much later than normally expected (see submarine patents). The various MP3-related patents expired on dates ranging from 2007 to 2017 in the United States.
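The two expiry rules described above reduce to simple date arithmetic, sketched below. This is a simplification that ignores term adjustments and other legal details; the function names are invented, and the issue and priority dates in the usage example are hypothetical rather than those of any real patent.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def latest_expiry_general(publication):
    """20-year term counted from a filing made up to 12 months after publication."""
    return add_years(add_years(publication, 1), 20)


def us_pre_1995_expiry(issue, priority):
    """Later of 17 years from issue or 20 years from priority (pre-8 June 1995 filings)."""
    return max(add_years(issue, 17), add_years(priority, 20))


if __name__ == "__main__":
    print(latest_expiry_general(date(1991, 12, 6)))  # ISO CD 11172 publication date
    # Hypothetical patent: priority in 1992 but not issued until 2000.
    print(us_pre_1995_expiry(issue=date(2000, 1, 1), priority=date(1992, 10, 1)))
```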
Patents for anything disclosed in ISO CD 11172 filed a year or more after its publication are questionable. If only the known MP3 patents filed by December 1992 are considered, then MP3 decoding has been patent-free in the US since 22 September 2015, when , which had a PCT filing in October 1992, expired.
If the longest-running patent mentioned in the aforementioned references is taken as a measure, then the MP3 technology became patent-free in the United States on 16 April 2017, when , held and administered by
Technicolor
, expired. As a result, many
free and open-source software
projects, such as the
Fedora operating system, have decided to start shipping MP3 support by default, and users will no longer have to resort to installing unofficial packages maintained by third party software repositories for MP3 playback or encoding.
Technicolor (formerly called Thomson Consumer Electronics) claimed to control MP3 licensing of the Layer 3 patents in many countries, including the United States, Japan, Canada, and EU countries.
Technicolor had been actively enforcing these patents.
MP3 license revenues from Technicolor's administration generated about €100 million for the Fraunhofer Society in 2005.
In September 1998, the Fraunhofer Institute sent a letter to several developers of MP3 software stating that a license was required to "distribute and/or sell decoders and/or encoders". The letter claimed that unlicensed products "infringe the patent rights of Fraunhofer and Thomson. To make, sell or distribute products using the MPEG Layer-3 standard and thus our patents, you need to obtain a license under these patents from us."
This led to the situation where the LAME MP3 encoder project could not offer its users official binaries that could run on their computer. The project's position was that as source code, LAME was simply a description of how an MP3 encoder ''could'' be implemented. Unofficially, compiled binaries were available from other sources.
Sisvel S.p.A., a Luxembourg-based company, administers licenses for patents applying to MPEG Audio. Sisvel, along with its United States subsidiary Audio MPEG, Inc., previously sued Thomson for patent infringement on MP3 technology,
but those disputes were resolved in November 2005 with Sisvel granting Thomson a license to their patents. Motorola followed soon after and signed with Sisvel to license MP3-related patents in December 2005.
Except for three patents, the US patents administered by Sisvel had all expired in 2015. The three exceptions are: , expired February 2017; , expired February 2017; and , expired 9 April 2017. As of around the first quarter of 2023, Sisvel's licensing program has become a legacy.
In September 2006, German officials seized MP3 players from
SanDisk's booth at the
IFA show in Berlin after an Italian patents firm won an injunction on behalf of Sisvel against SanDisk in a dispute over licensing rights. The injunction was later reversed by a Berlin judge,
but that reversal was in turn blocked the same day by another judge from the same court, "bringing the Patent Wild West to Germany" in the words of one commentator.
In February 2007, Texas MP3 Technologies sued Apple, Samsung Electronics and SanDisk in eastern Texas federal court, claiming infringement of a portable MP3 player patent that Texas MP3 said it had been assigned. Apple, Samsung, and SanDisk all settled the claims against them in January 2009.
Alcatel-Lucent has asserted several MP3 coding and compression patents, allegedly inherited from AT&T-Bell Labs, in litigation of its own. In November 2006, before the companies' merger, Alcatel sued Microsoft for allegedly infringing seven patents. On 23 February 2007, a San Diego jury awarded Alcatel-Lucent US$1.52 billion in damages for infringement of two of them. The court subsequently revoked the award, however, finding that one patent had not been infringed and that the other was not owned by Alcatel-Lucent; it was co-owned by AT&T and Fraunhofer, who had licensed it to Microsoft, the judge ruled.
That defense judgment was upheld on appeal in 2008.
Alternative technologies
Other lossy formats exist. Among these, Advanced Audio Coding (AAC) is the most widely used, and was designed to be the successor to MP3. There also exist other lossy formats such as mp3PRO and MP2. They are members of the same technological family as MP3 and depend on roughly similar psychoacoustic models and MDCT algorithms. Whereas MP3 uses a hybrid coding approach that is part MDCT and part FFT, AAC is purely MDCT, significantly improving compression efficiency.
Many of the basic patents underlying these formats are held by Fraunhofer Society, Alcatel-Lucent, Thomson Consumer Electronics, Bell, Dolby, LG Electronics, NEC, NTT Docomo, Panasonic, Sony Corporation, ETRI, JVC Kenwood, Philips, Microsoft, and NTT.
Microsoft created and promoted its own competing standard, Windows Media Audio (WMA), with the claim that it is better than MP3. When the digital audio player market was taking off, MP3 was widely adopted as the standard, hence the popular name "MP3 player". Sony was an exception and used its own ATRAC codec taken from its MiniDisc format, which Sony claimed was better. Following criticism and lower than expected Walkman sales, in 2004 Sony for the first time introduced native MP3 support to its Walkman players.
There are also open compression formats like Opus and Vorbis (OGG) that are available free of charge and without any known patent restrictions. Some of the newer audio compression formats, such as AAC, WMA Pro, Vorbis, and Opus, are free of some limitations inherent to the MP3 format that cannot be overcome by any MP3 encoder.
Besides lossy compression methods, lossless formats are a significant alternative to MP3 because they provide unaltered audio content, though with an increased file size compared to lossy compression. Lossless formats include FLAC (Free Lossless Audio Codec), Apple Lossless and many others.
See also
* MP3 Surround
* Windows Media Audio (WMA)
* Comparison of audio coding formats
External links
MP3-history.com, The Story of MP3: How MP3 was invented, by Fraunhofer IIS.
– over 1000 articles from 1999 to 2011 focused on MP3 and digital audio.
MPEG.chiariglione.org – MPEG official website