MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s (26:1 and 6:1 compression ratios respectively) without excessive quality loss, making video CDs, digital cable and satellite TV, and digital audio broadcasting (DAB) practical.

Today, MPEG-1 has become the most widely compatible lossy audio/video format in the world, and is used in a large number of products and technologies. Perhaps the best-known part of the MPEG-1 standard is the first version of the MP3 audio format it introduced.

The MPEG-1 standard is published as ISO/IEC 11172 – Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s. The standard consists of the following five ''Parts'':
# Systems (storage and synchronization of video, audio, and other data together)
# Video (compressed video content)
# Audio (compressed audio content)
# Conformance testing (testing the correctness of implementations of the standard)
# Reference software (example software showing how to encode and decode according to the standard)
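The compression ratios quoted in the opening paragraph can be sanity-checked with a little arithmetic. The sketch below is only illustrative: it assumes that "VHS-quality" raw video means SIF resolution (352×240) at 30 frames per second with 8-bit samples and 4:2:0 chroma subsampling, and that CD audio is 44.1 kHz, 16-bit stereo. The article does not state these source parameters at this point, so the video figures in particular are assumptions.

```python
# Assumed source parameters (not stated in the text at this point).
WIDTH, HEIGHT, FPS = 352, 240, 30   # SIF resolution, assumed frame rate
BITS_PER_PIXEL = 8 * 1.5            # 8-bit luma plus two quarter-size chroma planes (4:2:0)

raw_video_bps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL   # uncompressed video bit rate
raw_audio_bps = 44_100 * 16 * 2                         # CD audio: 44.1 kHz, 16-bit, stereo

# Apply the ratios quoted in the text: 26:1 for video, 6:1 for audio.
video_bps = raw_video_bps / 26
audio_bps = raw_audio_bps / 6
total_bps = video_bps + audio_bps

print(f"raw video : {raw_video_bps / 1e6:.2f} Mbit/s")
print(f"raw audio : {raw_audio_bps / 1e6:.2f} Mbit/s")
print(f"compressed: {total_bps / 1e6:.2f} Mbit/s")
```

Under these assumptions the combined compressed rate lands a little below 1.5 Mbit/s, which is consistent with the "about 1.5 Mbit/s" target once some room is left for multiplexing overhead.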

History

The predecessor of MPEG-1 for video coding was the H.261 standard produced by the CCITT (now known as the ITU-T). The basic architecture established in H.261 was the motion-compensated DCT hybrid video coding structure. It uses macroblocks of size 16×16 with block-based motion estimation in the encoder and motion compensation using encoder-selected motion vectors in the decoder, with residual difference coding using a discrete cosine transform (DCT) of size 8×8, scalar quantization, and variable-length codes (like Huffman codes) for entropy coding. H.261 was the first practical video coding standard, and all of its described design elements were also used in MPEG-1.

Modeled on the successful collaborative approach and the compression technologies developed by the Joint Photographic Experts Group and CCITT's Experts Group on Telephony (creators of the JPEG image compression standard and the H.261 standard for video conferencing respectively), the Moving Picture Experts Group (MPEG) working group was established in January 1988, by the initiative of Hiroshi Yasuda (Nippon Telegraph and Telephone) and Leonardo Chiariglione (CSELT). MPEG was formed to address the need for video and audio formats, and to build on H.261 to get better quality through the use of somewhat more complex encoding methods (e.g., supporting higher precision for motion vectors).

Development of the MPEG-1 standard began in May 1988. Fourteen video and fourteen audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human-perceived) quality, at data rates of 1.5 Mbit/s. This specific bitrate was chosen for transmission over T-1/E-1 lines and as the approximate data rate of audio CDs. The codecs that excelled in this testing were used as the basis for the standard and refined further, with additional features and other improvements being incorporated in the process.

After 20 meetings of the full group in various cities around the world, and 4½ years of development and testing, the final standard (for parts 1–3) was approved in early November 1992 and published a few months later. The reported completion date of the MPEG-1 standard varies greatly: a largely complete draft standard was produced in September 1990, and from that point on, only minor changes were introduced. The draft standard was publicly available for purchase. The standard was finished with the 6 November 1992 meeting. The Berkeley Plateau Multimedia Research Group developed an MPEG-1 decoder in November 1992.

In July 1990, before the first draft of the MPEG-1 standard had even been written, work began on a second standard, MPEG-2, intended to extend MPEG-1 technology to provide full broadcast-quality video (as per CCIR 601) at high bitrates (3–15 Mbit/s) and support for interlaced video. Due in part to the similarity between the two codecs, the MPEG-2 standard includes full backwards compatibility with MPEG-1 video, so any MPEG-2 decoder can play MPEG-1 videos.

Notably, the MPEG-1 standard very strictly defines the bitstream and decoder function, but does not define how MPEG-1 encoding is to be performed, although a reference implementation is provided in ISO/IEC-11172-5. This means that MPEG-1 coding efficiency can vary drastically depending on the encoder used, and generally means that newer encoders perform significantly better than their predecessors. The first three parts (Systems, Video and Audio) of ISO/IEC 11172 were published in August 1993.

Patents

Due to its age, MPEG-1 is no longer covered by any essential patents and can thus be used without obtaining a licence or paying any fees. The ISO patent database lists one patent for ISO 11172, US 4,472,747, which expired in 2003. The near-complete draft of the MPEG-1 standard was publicly available as ISO CD 11172 by December 6, 1991. Reference 3 in the paper is to Committee Draft of Standard ISO/IEC 11172, December 6, 1991. Neither the July 2008 Kuro5hin article "Patent Status of MPEG-1, H.261 and MPEG-2" nor an August 2008 thread on the gstreamer-devel mailing list was able to list a single unexpired MPEG-1 Video or MPEG-1 Audio Layer I/II patent. A May 2009 discussion on the whatwg mailing list mentioned US patent 5,214,678 as possibly covering MPEG-1 Audio Layer II. Filed in 1990 and published in 1993, this patent has now expired.

A full MPEG-1 decoder and encoder with Layer III audio could not be implemented royalty-free, since some companies required patent fees for implementations of MPEG-1 Audio Layer III, as discussed in the MP3 article. All patents in the world connected to MP3 expired on 30 December 2017, which makes this format totally free for use. On 23 April 2017, Fraunhofer IIS stopped charging for Technicolor's MP3 licensing program for certain MP3-related patents and software.

Former patent holders

The following corporations filed declarations with ISO saying they held patents for the MPEG-1 Video (ISO/IEC-11172-2) format, although all such patents have since expired.
* BBC
* Daimler Benz AG
* Fujitsu
* IBM
* Matsushita Electric Industrial Co., Ltd.
* Mitsubishi Electric
* NEC
* NHK
* Philips
* Pioneer Corporation
* Qualcomm
* Ricoh
* Sony
* Texas Instruments
* Thomson Multimedia
* Toppan Printing
* Toshiba
* Victor Company of Japan

Applications

* Most popular software for video playback includes MPEG-1 decoding, in addition to any other supported formats.
* The popularity of MP3 audio has established a massive installed base of hardware that can play back MPEG-1 Audio (all three layers).
* "Virtually all digital audio devices" can play back MPEG-1 Audio. Many millions have been sold to date.
* Before MPEG-2 became widespread, many digital satellite/cable TV services used MPEG-1 exclusively.
* The widespread popularity of MPEG-2 with broadcasters means MPEG-1 is playable by most digital cable and satellite set-top boxes, and digital disc and tape players, due to backwards compatibility.
* MPEG-1 was used for full-screen video on Green Book CD-i, and on Video CD (VCD).
* The Super Video CD standard, based on VCD, uses MPEG-1 audio exclusively, as well as MPEG-2 video.
* The DVD-Video format uses MPEG-2 video primarily, but MPEG-1 support is explicitly defined in the standard.
* The DVD-Video standard originally required MPEG-1 Audio Layer II for PAL countries, but was changed to allow AC-3/Dolby Digital-only discs. MPEG-1 Audio Layer II is still allowed on DVDs, although newer extensions to the format, like MPEG Multichannel, are rarely supported.
* Most DVD players also support Video CD and MP3 CD playback, which use MPEG-1.
* The international Digital Video Broadcasting (DVB) standard primarily uses MPEG-1 Audio Layer II, and MPEG-2 video.
* The international Digital Audio Broadcasting (DAB) standard uses MPEG-1 Audio Layer II exclusively, due to its especially high quality, modest decoder performance requirements, and tolerance of errors.
* The Digital Compact Cassette uses PASC (Precision Adaptive Sub-band Coding) to encode its audio. PASC is an early version of MPEG-1 Audio Layer I with a fixed bit rate of 384 kilobits per second.

Part 1: Systems

Part 1 of the MPEG-1 standard covers ''systems'', and is defined in ISO/IEC-11172-1. MPEG-1 Systems specifies the logical layout and methods used to store the encoded audio, video, and other data into a standard bitstream, and to maintain synchronization between the different contents. This file format is specifically designed for storage on media, and transmission over communication channels, that are considered relatively reliable. Only limited error protection is defined by the standard, and small errors in the bitstream may cause noticeable defects.

This structure was later named an MPEG program stream: "The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure." This terminology is more popular and more precise (it differentiates the format from an MPEG transport stream), and will be used here.

Elementary streams, packets, and clock references

* Elementary Streams (ES) are the raw bitstreams of MPEG-1 audio and video encoded data (output from an encoder). These files can be distributed on their own, as is the case with MP3 files.
* Packetized Elementary Streams (PES) are elementary streams packetized into packets of variable length, i.e., the ES is divided into independent chunks, with a cyclic redundancy check (CRC) checksum added to each packet for error detection.
* System Clock Reference (SCR) is a 33-bit timing value stored in the header of each PES, at a frequency/precision of 90 kHz, with an extra 9-bit extension that stores additional timing data with a precision of 27 MHz. These values are inserted by the encoder and derived from the system time clock (STC). Simultaneously encoded audio and video streams will not have identical SCR values, however, due to buffering, encoding, jitter, and other delays.
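The SCR figures in the list above imply two consequences that are easy to check: a 33-bit counter running at 90 kHz wraps around after roughly a day, and a 9-bit extension is wide enough to count the 27 MHz ticks that fall inside one 90 kHz tick. The following is a minimal sketch of that arithmetic; the helper `seconds_to_scr` is a hypothetical name used for illustration and is not taken from the standard.

```python
BASE_CLOCK_HZ = 90_000        # precision of the 33-bit value, from the text
EXT_CLOCK_HZ = 27_000_000     # precision of the 9-bit extension, from the text
BASE_BITS = 33
EXT_BITS = 9

def seconds_to_scr(seconds: float) -> int:
    """Hypothetical helper: convert a time in seconds to a wrapped 33-bit tick count."""
    return round(seconds * BASE_CLOCK_HZ) % (1 << BASE_BITS)

# How long until the 33-bit counter wraps back to zero?
wrap_seconds = (1 << BASE_BITS) / BASE_CLOCK_HZ
wrap_hours = wrap_seconds / 3600

# How many 27 MHz ticks fit into one 90 kHz tick?
ext_ticks_per_base_tick = EXT_CLOCK_HZ // BASE_CLOCK_HZ

print(f"counter wraps after {wrap_hours:.1f} hours")
print(f"extension must count 0..{ext_ticks_per_base_tick - 1}")
print(f"one second is {seconds_to_scr(1.0)} ticks")
```

The extension only needs 300 distinct values, which fits comfortably in 9 bits (512 values).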

Program streams

Program Streams (PS) are concerned with combining multiple packetized elementary streams (usually just one audio and one video PES) into a single stream, ensuring simultaneous delivery, and maintaining synchronization. The PS structure is known as a multiplex, or a container format.

Presentation time stamps (PTS) exist in PS to correct the inevitable disparity between audio and video SCR values (time-base correction). 90 kHz PTS values in the PS header tell the decoder which video SCR values match which audio SCR values. PTS determines when to display a portion of an MPEG program, and is also used by the decoder to determine when data can be discarded from the buffer. Either video or audio will be delayed by the decoder until the corresponding segment of the other arrives and can be decoded.

PTS handling can be problematic. Decoders must accept multiple ''program streams'' that have been concatenated (joined sequentially). This causes PTS values in the middle of the video to reset to zero and then begin incrementing again. Such PTS wraparound disparities can cause timing issues that must be specially handled by the decoder.

Decoding Time Stamps (DTS), additionally, are required because of B-frames. With B-frames in the video stream, adjacent frames have to be encoded and decoded out of order (re-ordered frames). DTS is quite similar to PTS, but instead of just handling sequential frames, it contains the proper time stamps to tell the decoder when to decode and display the next B-frame (types of frames explained below), ahead of its anchor (P- or I-) frame. Without B-frames in the video, PTS and DTS values are identical.
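The frame re-ordering that makes DTS necessary can be illustrated with a short sketch. It assumes a simplified model in which every B-frame depends on the next anchor (I- or P-) frame in display order, so that anchor must be decoded first. The function name `decode_order` and the use of a frame's display index as a stand-in for its PTS are illustrative choices, not part of the standard.

```python
def decode_order(display_types: str) -> list[tuple[int, str]]:
    """Return (display_index, frame_type) pairs in the order they must be decoded.

    display_types is a string such as "IBBPBBP", given in display order.
    Each B-frame is held back until the following anchor frame has been emitted.
    """
    ordered: list[tuple[int, str]] = []
    pending_b: list[tuple[int, str]] = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            # An anchor frame goes first, then the B-frames that precede it on screen.
            ordered.append((index, frame_type))
            ordered.extend(pending_b)
            pending_b.clear()
    ordered.extend(pending_b)  # trailing B-frames with no later anchor
    return ordered

with_b = decode_order("IBBPBBP")
without_b = decode_order("IPPPP")

print([index for index, _ in with_b])     # display indices in decode order
print([index for index, _ in without_b])
```

In the first sequence the decode order differs from the display order, so each frame needs two time stamps. In the second there are no B-frames, the two orders coincide, and a single time stamp per frame would be enough, which matches the statement that PTS and DTS are then identical.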

Multiplexing

To generate the PS, the multiplexer will interleave the (two or more) packetized elementary streams. This is done so the packets of the simultaneous streams can be transferred over the same channel and are guaranteed to both arrive at the decoder at precisely the same time. This is a case of time-division multiplexing.

Determining how much data from each stream should be in each interleaved segment (the size of the interleave) is complicated, yet an important requirement. Improper interleaving will result in buffer underflows or overflows, as the receiver gets more of one stream than it can store (e.g. audio) before it gets enough data to decode the other simultaneous stream (e.g. video). The MPEG Video Buffering Verifier (VBV) assists in determining if a multiplexed PS can be decoded by a device with a specified data throughput rate and buffer size. This offers feedback to the multiplexer and the encoder, so that they can change the multiplex size or adjust bitrates as needed for compliance.
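The kind of check described above can be sketched as a simple buffer simulation. This is not the normative VBV algorithm: it is a simplified model in which data arrives at a constant bit rate and the decoder removes one whole frame per frame interval. The function name and all numeric values (bit rate, buffer size, frame sizes) are assumptions chosen for illustration.

```python
def simulate_buffer(frame_bits, bitrate, fps, buffer_size, initial_fill):
    """Report whether a sequence of coded frame sizes keeps a decoder buffer in bounds.

    Simplified model: the decoder removes one frame at each frame interval,
    then the channel delivers bitrate / fps new bits before the next removal.
    """
    fullness = initial_fill
    bits_per_interval = bitrate / fps
    for index, bits in enumerate(frame_bits):
        if bits > fullness:
            return f"underflow at frame {index}"   # frame not fully received in time
        fullness -= bits
        fullness += bits_per_interval
        if fullness > buffer_size:
            return f"overflow at frame {index}"    # more data arrived than can be stored
    return "ok"

# Illustrative values only.
BITRATE, FPS, BUFFER, START = 1_150_000, 25, 327_680, 200_000

balanced = simulate_buffer([150_000] + [36_000] * 11, BITRATE, FPS, BUFFER, START)
too_big = simulate_buffer([150_000] * 4, BITRATE, FPS, BUFFER, START)
too_small = simulate_buffer([1_000] * 20, BITRATE, FPS, BUFFER, START)

print(balanced, too_big, too_small, sep="\n")
```

An encoder or multiplexer could use feedback of this kind to shrink oversized frames (avoiding underflow) or to pad or raise quality when frames are too small for the channel rate (avoiding overflow).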

Part 2: Video

Part 2 of the MPEG-1 standard covers video and is defined in ISO/IEC-11172-2. The design was heavily influenced by H.261. MPEG-1 Video exploits perceptual compression methods to significantly reduce the data rate required by a video stream. It reduces or completely discards information in certain frequencies and areas of the picture that the human eye has limited ability to fully perceive. It also exploits temporal (over time) and spatial (across a picture) redundancy common in video to achieve better data compression than would be possible otherwise. (See: Video compression)
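How information "in certain frequencies" gets reduced or discarded can be illustrated with the 8×8 DCT and scalar quantization mentioned in the History section. The sketch below is a deliberately naive pure-Python transform applied to a smooth test block. It uses a single uniform quantization step, which is a simplification assumed here for brevity: real encoders weight each frequency separately, and the step size of 16 is an arbitrary illustrative value.

```python
import math

N = 8  # block size of the DCT described in the History section

def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Uniform scalar quantization: small coefficients round to zero."""
    return [[round(c / step) for c in row] for row in coeffs]

# A smooth gradient, typical of low-detail picture areas.
block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
levels = quantize(coeffs, step=16)
nonzero = sum(1 for row in levels for value in row if value != 0)

print(f"DC coefficient: {coeffs[0][0]:.1f}")
print(f"non-zero quantized coefficients: {nonzero} of {N * N}")
```

For smooth content almost all of the 64 quantized coefficients become zero, and long runs of zeros are what the variable-length entropy coding stage represents cheaply.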

Color space

Before encoding video to MPEG-1, the color-space is transformed to

Y′CbCr (Y′=Luma, Cb=Chroma Blue, Cr=Chroma Red). Luma (brightness, resolution) is stored separately from chroma (color, hue, phase), and the chroma is further separated into red and blue components. The chroma is also subsampled to 4:2:0, meaning it is reduced to half resolution vertically and half resolution horizontally, i.e., to just one quarter the number of samples used for the luma component of the video. This use of higher resolution for some color components is similar in concept to the Bayer pattern filter that is commonly used for the image capturing sensor in digital color cameras. Because the human eye is much more sensitive to small changes in brightness (the Y component) than in color (the Cr and Cb components), chroma subsampling is a very effective way to reduce the amount of video data that needs to be compressed. However, on videos with fine detail (high spatial complexity) this can manifest as chroma aliasing artifacts. Compared to other digital compression artifacts, this issue seems to very rarely be a source of annoyance. Because of the subsampling, Y′CbCr 4:2:0 video is ordinarily stored using even dimensions (divisible by 2 horizontally and vertically).

Y′CbCr color is often informally called YUV to simplify the notation, although that term more properly applies to a somewhat different color format. Similarly, the terms luminance and chrominance are often used instead of the (more accurate) terms luma and chroma.
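The 4:2:0 reduction described above can be sketched in Python. This is only an illustration: the function names are hypothetical, and averaging each 2×2 neighbourhood is an assumed filter, since encoders are free to use other filters and sample positions.

```python
def subsample_420(plane):
    """Halve a chroma plane horizontally and vertically by averaging 2x2 blocks.

    `plane` is a list of equal-length rows of integer samples; both
    dimensions are assumed to be even, as the text describes.
    """
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("4:2:0 subsampling expects even dimensions")
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)  # rounded integer average
        out.append(row)
    return out


def samples_per_frame(width, height):
    """Return (luma, chroma) sample counts for one 4:2:0 frame."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # Cb plane plus Cr plane
    return luma, chroma
```

For a 352×240 frame, `samples_per_frame` gives 84,480 luma samples and 42,240 chroma samples in total, so each of the two chroma planes holds one quarter as many samples as the luma plane.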

Resolution/bitrate

MPEG-1 supports resolutions up to 4095×4095 (12 bits), and bit rates up to 100 Mbit/s. MPEG-1 videos are most commonly seen using Source Input Format (SIF) resolution: 352×240, 352×288, or 320×240. These relatively low resolutions, combined with a bitrate less than 1.5 Mbit/s, make up what is known as a constrained parameters bitstream (CPB), later renamed the "Low Level" (LL) profile in MPEG-2. These are the minimum video specifications any decoder should be able to handle to be considered MPEG-1 compliant. They were selected to provide a good balance between quality and performance, allowing the use of reasonably inexpensive hardware of the time.

Frame/picture/block types

MPEG-1 has several frame/picture types that serve different purposes. The most important, yet simplest, is the I-frame.

I-frames

"I-frame" is an abbreviation for "

Intra-frame", so-called because they can be decoded independently of any other frames. They may also be known as I-pictures, or keyframes due to their somewhat similar function to the key frames used in animation. I-frames can be considered effectively identical to baseline JPEG images.

High-speed seeking through an MPEG-1 video is only possible to the nearest I-frame. When cutting a video it is not possible to start playback of a segment of video before the first I-frame in the segment (at least not without computationally intensive re-encoding). For this reason, I-frame-only MPEG videos are used in editing applications.

I-frame only compression is very fast, but produces very large file sizes: a factor of 3× (or more) larger than normally encoded MPEG-1 video, depending on how temporally complex a specific video is. I-frame only MPEG-1 video is very similar to MJPEG video, so much so that very high-speed and theoretically lossless (in reality, there are rounding errors) conversion can be made from one format to the other, provided a couple of restrictions (color space and quantization matrix) are followed in the creation of the bitstream.

The length between I-frames is known as the group of pictures (GOP) size. MPEG-1 most commonly uses a GOP size of 15–18, i.e. 1 I-frame for every 14–17 non-I-frames (some combination of P- and B-frames). With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit. Limits are placed on the maximum number of frames between I-frames due to decoding complexity, decoder buffer size, recovery time after data errors, seeking ability, and accumulation of IDCT errors in low-precision implementations most common in hardware decoders (see: IEEE-1180).

P-frames

"P-frame" is an abbreviation for "Predicted-frame". They may also be called forward-predicted frames or inter-frames (B-frames are also inter-frames). P-frames exist to improve compression by exploiting the temporal (over time) redundancy in a video. P-frames store only the ''difference'' in image from the frame (either an I-frame or P-frame) immediately preceding it (this reference frame is also called the ''

anchor frame''). The difference between a P-frame and its anchor frame is calculated using ''motion vectors'' on each ''macroblock'' of the frame (see below). Such motion vector data will be embedded in the P-frame for use by the decoder. A P-frame can contain any number of intra-coded blocks, in addition to any forward-predicted blocks. If a video drastically changes from one frame to the next (such as a cut), it is more efficient to encode it as an I-frame.

B-frames

"B-frame" stands for "bidirectional-frame" or "bipredictive frame". They may also be known as backwards-predicted frames or B-pictures. B-frames are quite similar to P-frames, except they can make predictions using both the previous and future frames (i.e. two anchor frames). It is therefore necessary for the player to first decode the next I- or P- anchor frame sequentially after the B-frame, before the B-frame can be decoded and displayed. This means decoding B-frames requires larger

data buffers and causes an increased delay on both decoding and during encoding. This also necessitates the decoding time stamps (DTS) feature in the container/system stream (see above). As such, B-frames have long been the subject of much controversy; they are often avoided in videos, and are sometimes not fully supported by hardware decoders.

No other frames are predicted from a B-frame. Because of this, a very low bitrate B-frame can be inserted, where needed, to help control the bitrate. If this were done with a P-frame, future P-frames would be predicted from it and would lower the quality of the entire sequence. However, similarly, the future P-frame must still encode all the changes between it and the previous I- or P- anchor frame. B-frames can also be beneficial in videos where the background behind an object is being revealed over several frames, or in fading transitions, such as scene changes.

A B-frame can contain any number of intra-coded blocks and forward-predicted blocks, in addition to backwards-predicted, or bidirectionally predicted blocks.
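Because a B-frame needs its future anchor decoded first, the order in which frames are coded differs from the order in which they are displayed. The following Python sketch illustrates that reordering. The function name and the string representation of frame types are illustrative assumptions, and details such as time stamps or GOP boundaries are not modelled.

```python
def display_to_coded_order(frame_types):
    """Reorder frames from display order to a decodable (coded) order.

    `frame_types` is a string such as "IBBPBBP" in display order.
    Returns a list of (display_index, type) tuples in which every I- or
    P-frame (anchor) is placed ahead of the B-frames that are displayed
    just before it.
    """
    coded = []
    pending_b = []  # B-frames waiting for their future anchor
    for index, ftype in enumerate(frame_types):
        if ftype == "B":
            pending_b.append((index, ftype))
        else:
            coded.append((index, ftype))  # the anchor goes first
            coded.extend(pending_b)       # then the B-frames that depend on it
            pending_b = []
    coded.extend(pending_b)  # trailing B-frames with no later anchor in this list
    return coded
```

For the display sequence `"IBBPBBP"` the coded order becomes I0, P3, B1, B2, P6, B4, B5, which is why the decoder has to buffer an anchor frame before it can show the B-frames that precede it.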

D-frames

MPEG-1 has a unique frame type not found in later video standards. "D-frames" or DC-pictures are independently coded images (intra-frames) that have been encoded using DC transform coefficients only (AC coefficients are removed when encoding D-frames—see DCT below) and hence are very low quality. D-frames are never referenced by I-, P- or B- frames. D-frames are only used for fast previews of video, for instance when seeking through a video at high speed. Given moderately higher-performance decoding equipment, fast preview can be accomplished by decoding I-frames instead of D-frames. This provides higher quality previews, since I-frames contain AC coefficients as well as DC coefficients. If the encoder can assume that rapid I-frame decoding capability is available in decoders, it can save bits by not sending D-frames (thus improving compression of the video content). For this reason, D-frames are seldom actually used in MPEG-1 video encoding, and the D-frame feature has not been included in any later video coding standards.

Macroblocks

MPEG-1 operates on video in a series of 8×8 blocks for quantization. However, to reduce the bit rate needed for motion vectors and because chroma (color) is subsampled by a factor of 4, each pair of (red and blue) chroma blocks corresponds to 4 different luma blocks. This set of 6 blocks, with a resolution of 16×16, is processed together and called a ''macroblock''. A macroblock is the smallest independent unit of (color) video. Motion vectors (see below) operate solely at the macroblock level. If the height or width of the video are not exact multiples of 16, full rows and full columns of macroblocks must still be encoded and decoded to fill out the picture (though the extra decoded pixels are not displayed).
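As a rough illustration of the macroblock arithmetic above, the following Python sketch rounds picture dimensions up to whole 16×16 macroblocks. The function names are hypothetical.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows, coded_width, coded_height) for a picture.

    Dimensions are rounded up to whole macroblocks, because partial
    macroblocks must still be coded in full.
    """
    cols = (width + mb_size - 1) // mb_size   # ceiling division
    rows = (height + mb_size - 1) // mb_size
    return cols, rows, cols * mb_size, rows * mb_size


def blocks_per_macroblock():
    """A 4:2:0 macroblock holds four 8x8 luma blocks plus one Cb and one Cr block."""
    return 4 + 1 + 1
```

A 352×240 picture divides exactly into 22×15 = 330 macroblocks, while a hypothetical 350×250 picture would be coded as 22×16 macroblocks (352×256 pixels), with the extra pixels not displayed.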

Motion vectors

To decrease the amount of temporal redundancy in a video, only blocks that change are updated (up to the maximum GOP size). This is known as conditional replenishment. However, this is not very effective by itself. Movement of the objects and/or the camera may result in large portions of the frame needing to be updated, even though only the position of the previously encoded objects has changed. Through motion estimation, the encoder can compensate for this movement and remove a large amount of redundant information.

The encoder compares the current frame with adjacent parts of the video from the anchor frame (previous I- or P-frame) in a diamond pattern, up to an (encoder-specific) predefined radius limit from the area of the current macroblock. If a match is found, only the direction and distance (i.e. the ''vector'' of the ''motion'') from the previous video area to the current macroblock need to be encoded into the inter-frame (P- or B-frame). The reverse of this process, performed by the decoder to reconstruct the picture, is called motion compensation.

A predicted macroblock rarely matches the current picture perfectly, however. The difference between the estimated matching area and the real frame/macroblock is called the prediction error. The larger the amount of prediction error, the more data must be additionally encoded in the frame. For efficient video compression, it is very important that the encoder is capable of effectively and precisely performing motion estimation.

Motion vectors record the ''distance'' between two areas on screen based on the number of pixels (also called pels). MPEG-1 video uses a motion vector (MV) precision of one half of one pixel, or half-pel. The finer the precision of the MVs, the more accurate the match is likely to be, and the more efficient the compression. There are trade-offs to higher precision, however. Finer MV precision requires a larger amount of data to represent the MV, as larger numbers must be stored in the frame for every single MV; it increases coding complexity, as increasing levels of interpolation on the macroblock are required for both the encoder and decoder; and it yields diminishing returns (minimal gains) with higher precision MVs. Half-pel precision was chosen as the ideal trade-off for that point in time. (See: qpel)

Because neighboring macroblocks are likely to have very similar motion vectors, this redundant information can be compressed quite effectively by being stored DPCM-encoded. Only the (smaller) amount of difference between the MVs for each macroblock needs to be stored in the final bitstream.

P-frames have one motion vector per macroblock, relative to the previous anchor frame. B-frames, however, can use two motion vectors: one from the previous anchor frame, and one from the future anchor frame.

Partial macroblocks, and black borders/bars encoded into the video that do not fall exactly on a macroblock boundary, cause havoc with motion prediction. The block padding/border information prevents the macroblock from closely matching with any other area of the video, and so, significantly larger prediction error information must be encoded for every one of the several dozen partial macroblocks along the screen border. DCT encoding and quantization (see below) also isn't nearly as effective when there is large/sharp picture contrast in a block.

An even more serious problem exists with macroblocks that contain significant, random, ''edge noise'', where the picture transitions to (typically) black. All the above problems also apply to edge noise. In addition, the added randomness is simply impossible to compress significantly. All of these effects will lower the quality (or increase the bitrate) of the video substantially.
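Motion estimation as described above can be sketched with a simple block-matching search in Python. Note the assumptions: this sketch uses an exhaustive search over a square window rather than the diamond pattern mentioned in the text, it works at whole-pixel precision only (no half-pel interpolation), and it uses the sum of absolute differences (SAD) as the matching criterion. All names are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def find_motion_vector(cur, ref, cx, cy, size=16, radius=4):
    """Exhaustively search `ref` within `radius` pixels of (cx, cy).

    Returns ((mvx, mvy), error) for the candidate with the lowest SAD.
    The vector points from the current block to its best match in the
    reference (anchor) frame; `error` is the remaining prediction error.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_err = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for mvy in range(-radius, radius + 1):
        for mvx in range(-radius, radius + 1):
            rx, ry = cx + mvx, cy + mvy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would fall outside the frame
            err = sad(cur, ref, cx, cy, rx, ry, size)
            if err < best_err:
                best_mv, best_err = (mvx, mvy), err
    return best_mv, best_err
```

When the picture content has simply shifted between frames, the search returns the shift as the vector and a prediction error of zero; otherwise the residual error is what would still have to be coded.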

DCT

Each 8×8 block is encoded by first applying a ''forward'' discrete cosine transform (FDCT) and then a quantization process. The FDCT process (by itself) is theoretically lossless, and can be reversed by applying an ''Inverse'' DCT (IDCT) to reproduce the original values (in the absence of any quantization and rounding errors). In reality, there are some (sometimes large) rounding errors introduced both by quantization in the encoder (as described in the next section) and by IDCT approximation error in the decoder. The minimum allowed accuracy of a decoder IDCT approximation is defined by ISO/IEC 23002-1. (Prior to 2006, it was specified by IEEE 1180-1990.)

The FDCT process converts the 8×8 block of uncompressed pixel values (brightness or color difference values) into an 8×8 indexed array of ''frequency coefficient'' values. One of these is the (statistically high in variance) "DC coefficient", which represents the average value of the entire 8×8 block. The other 63 coefficients are the statistically smaller "AC coefficients", which have positive or negative values each representing sinusoidal deviations from the flat block value represented by the DC coefficient.

An example of an encoded 8×8 FDCT block:

\begin{bmatrix}
 -415 & -30 & -61 & 27 & 56 & -20 & -2 & 0 \\
 4 & -22 & -61 & 10 & 13 & -7 & -9 & 5 \\
 -47 & 7 & 77 & -25 & -29 & 10 & 5 & -6 \\
 -49 & 12 & 34 & -15 & -10 & 6 & 2 & 2 \\
 12 & -7 & -13 & -4 & -2 & 2 & -3 & 3 \\
 -8 & 3 & 2 & -6 & -2 & 1 & 4 & 2 \\
 -1 & 0 & 0 & -2 & -1 & -3 & 4 & -1 \\
 0 & 0 & -1 & -4 & -1 & 0 & 1 & 2
\end{bmatrix}
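The forward and inverse transforms can be sketched directly from the textbook definition of the 2-D DCT. This Python version is a slow floating-point reference written for clarity, not the fast integer approximation a real codec would use; the function names are hypothetical and the orthonormal scaling is an assumption of this sketch.

```python
import math

N = 8


def _alpha(k):
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def fdct_8x8(block):
    """Forward 2-D DCT (type II, orthonormal) of an 8x8 block of samples."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):        # vertical frequency
        for v in range(N):    # horizontal frequency
            acc = 0.0
            for y in range(N):
                for x in range(N):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = _alpha(u) * _alpha(v) * acc
    return out


def idct_8x8(coeffs):
    """Inverse 2-D DCT, undoing fdct_8x8 up to floating-point rounding."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            acc = 0.0
            for u in range(N):
                for v in range(N):
                    acc += (_alpha(u) * _alpha(v) * coeffs[u][v]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[y][x] = acc
    return out
```

With this scaling, a completely flat block produces a single non-zero value, the DC coefficient, equal to eight times the block average, and every AC coefficient is zero. Applying `idct_8x8` to the output of `fdct_8x8` recovers the input apart from floating-point rounding.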

Since the DC coefficient value is statistically correlated from one block to the next, it is compressed using DPCM encoding. Only the (smaller) amount of difference between each DC value and the value of the DC coefficient in the block to its left needs to be represented in the final bitstream. Additionally, the frequency conversion performed by applying the DCT provides a statistical decorrelation function to efficiently concentrate the signal into fewer high-amplitude values prior to applying quantization (see below).
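The differential idea can be sketched in a few lines of Python. This is a generic illustration of DPCM on a list of values; the function names and the initial predictor of 0 are assumptions, and the predictor reset rules of the actual bitstream are not modelled.

```python
def dpcm_encode(values, predictor=0):
    """Replace each value by its difference from the previous value."""
    diffs = []
    prev = predictor
    for value in values:
        diffs.append(value - prev)
        prev = value
    return diffs


def dpcm_decode(diffs, predictor=0):
    """Rebuild the original values by accumulating the differences."""
    values = []
    prev = predictor
    for diff in diffs:
        prev += diff
        values.append(prev)
    return values
```

For DC values such as `[100, 102, 101, 105]`, only the first entry stays large; the remaining entries become the small differences `2, -1, 4`, which need fewer bits once entropy coded.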

Quantization

Quantization is, essentially, the process of reducing the accuracy of a signal, by dividing it by some larger step size and rounding to an integer value (i.e. finding the nearest multiple, and discarding the remainder).

The frame-level quantizer is a number from 0 to 31 (although encoders will usually omit/disable some of the extreme values) which determines how much information will be removed from a given frame. The frame-level quantizer is typically either dynamically selected by the encoder to maintain a certain user-specified bitrate, or (much less commonly) directly specified by the user.

A "quantization matrix" is a string of 64 numbers (ranging from 0 to 255) which tells the encoder how relatively important or unimportant each piece of visual information is. Each number in the matrix corresponds to a certain frequency component of the video image.

An example quantization matrix:

\begin{bmatrix}
 16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
 12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
 14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
 14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
 18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
 24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
 49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
 72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}

Quantization is performed by taking each of the 64 ''frequency'' values of the DCT block, dividing them by the frame-level quantizer, then dividing them by their corresponding values in the quantization matrix. Finally, the result is rounded down. This significantly reduces, or completely eliminates, the information in some frequency components of the picture. Typically, high frequency information is less visually important, and so high frequencies are much more ''strongly quantized'' (drastically reduced). MPEG-1 actually uses two separate quantization matrices, one for intra-blocks (I-blocks) and one for inter-blocks (P- and B-blocks), so quantization of different block types can be done independently, and so, more effectively.

This quantization process usually reduces a significant number of the ''AC coefficients'' to zero (known as sparse data), which can then be more efficiently compressed by entropy coding (lossless compression) in the next step.

An example quantized DCT block:

\begin{bmatrix}
 -26 & -3 & -6 & 2 & 2 & -1 & 0 & 0 \\
 0 & -2 & -4 & 1 & 1 & 0 & 0 & 0 \\
 -3 & 1 & 5 & -1 & -1 & 0 & 0 & 0 \\
 -4 & 1 & 2 & -1 & 0 & 0 & 0 & 0 \\
 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
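The relationship between the three example matrices in this section can be checked with a short Python sketch. It is a simplified model, not the exact arithmetic of the standard: the frame-level quantizer is treated as a plain extra divisor (`scale`, assumed to be 1 here), and each quotient is rounded to the nearest integer with ties away from zero, which is the rounding that the example values are consistent with. All names are hypothetical.

```python
DCT_BLOCK = [
    [-415, -30, -61, 27, 56, -20, -2, 0],
    [4, -22, -61, 10, 13, -7, -9, 5],
    [-47, 7, 77, -25, -29, 10, 5, -6],
    [-49, 12, 34, -15, -10, 6, 2, 2],
    [12, -7, -13, -4, -2, 2, -3, 3],
    [-8, 3, 2, -6, -2, 1, 4, 2],
    [-1, 0, 0, -2, -1, -3, 4, -1],
    [0, 0, -1, -4, -1, 0, 1, 2],
]

QUANT_MATRIX = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def divide_round(value, divisor):
    """Integer division rounded to nearest, with ties away from zero."""
    magnitude = (2 * abs(value) + divisor) // (2 * divisor)
    return -magnitude if value < 0 else magnitude


def quantize(block, matrix, scale=1):
    """Divide each coefficient by its matrix entry times the quantizer scale."""
    return [[divide_round(c, q * scale) for c, q in zip(brow, qrow)]
            for brow, qrow in zip(block, matrix)]


def dequantize(levels, matrix, scale=1):
    """Approximate the original coefficients; the rounding loss is permanent."""
    return [[level * q * scale for level, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, matrix)]
```

Calling `quantize(DCT_BLOCK, QUANT_MATRIX)` yields the example quantized block shown above. Raising `scale` drives more coefficients to zero, which is the mechanism by which a higher frame-level quantizer lowers the bitrate at the cost of quality.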

Quantization eliminates a large amount of data, and is the main lossy processing step in MPEG-1 video encoding. This is also the primary source of most MPEG-1 video compression artifacts, like blockiness, color banding, noise, ringing, discoloration, et al. This happens when video is encoded with an insufficient bitrate, and the encoder is therefore forced to use high frame-level quantizers (''strong quantization'') through much of the video.

Entropy coding

Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed upon decoding, to produce exactly the same (original) values. Since these lossless data compression steps don't add noise into, or otherwise change, the contents (unlike quantization), they are sometimes referred to as noiseless coding. Since lossless compression aims to remove as much redundancy as possible, it is known as entropy coding in the field of information theory.

The coefficients of quantized DCT blocks tend to zero towards the bottom-right. Maximum compression can be achieved by a zig-zag scanning of the DCT block starting from the top left and using run-length encoding techniques.

The DC coefficients and motion vectors are DPCM-encoded.

Run-length encoding (RLE) is a simple method of compressing repetition. A sequential string of characters, no matter how long, can be replaced with a few bytes, noting the value that repeats, and how many times. For example, if someone were to say "five nines", you would know they mean the number: 99999. RLE is particularly effective after quantization, as a significant number of the AC coefficients are now zero (called sparse data), and can be represented with just a couple of bytes. This is stored in a special 2-dimensional Huffman table that codes the run-length and the run-ending character.
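The zig-zag scan and the run-length step can be sketched in Python using the example quantized block from the quantization section. The function names are hypothetical, and the output format, a list of (zero run, value) pairs with trailing zeros left to an implied end-of-block marker, is a simplified model of what the bitstream's variable-length codes represent.

```python
QUANTIZED = [
    [-26, -3, -6, 2, 2, -1, 0, 0],
    [0, -2, -4, 1, 1, 0, 0, 0],
    [-3, 1, 5, -1, -1, 0, 0, 0],
    [-4, 1, 2, -1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def zigzag_order(n=8):
    """Return (row, col) positions in zig-zag order, starting at the top left.

    Positions are grouped by anti-diagonal (row + col); odd diagonals are
    walked downwards and even diagonals upwards.
    """
    def key(pos):
        row, col = pos
        diagonal = row + col
        return (diagonal, row if diagonal % 2 else -row)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block into a list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length_pairs(ac_values):
    """Encode AC values as (number of preceding zeros, non-zero value) pairs.

    Zeros after the last non-zero value are dropped; a decoder would
    learn about them from an end-of-block marker.
    """
    pairs = []
    run = 0
    for value in ac_values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs
```

Scanning `QUANTIZED` places all of its non-zero values within the first 26 positions, so the 63 AC coefficients collapse to 19 pairs, for example `(1, -3)` for "one zero, then -3" and `(5, -1)` for "five zeros, then -1".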

Huffman coding is a very popular and relatively simple method of entropy coding, and is used in MPEG-1 video to reduce the data size. The data is analyzed to find strings that repeat often. Those strings are then put into a special table, with the most frequently repeating data assigned the shortest code. This keeps the data as small as possible with this form of compression. Once the table is constructed, those strings in the data are replaced with their (much smaller) codes, which reference the appropriate entry in the table. The decoder simply reverses this process to produce the original data.

This is the final step in the video encoding process, so the result of Huffman coding is known as the MPEG-1 video "bitstream."
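The table-building idea described above can be sketched with a generic Huffman coder in Python. This is an illustration of the technique only: it builds a code from the symbol frequencies of its input and does not reproduce any code table from the MPEG-1 specification. The function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}); the
    # unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of all symbols."""
    return "".join(code[sym] for sym in symbols)


def decode(bits, code):
    """Reverse `encode` by matching code words from left to right."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

For an input in which one symbol dominates, that symbol receives a one-bit code while rarer symbols receive longer ones, and decoding the encoded bit string returns the original sequence.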

GOP configurations for specific applications

I-frames store complete frame info within the frame and are therefore suited for random access. P-frames provide compression using motion vectors relative to the previous frame (I or P). B-frames provide maximum compression but require the previous as well as the next frame for computation. Therefore, processing of B-frames requires more buffer on the decoder side. A configuration of the Group of Pictures (GOP) should be selected based on these factors. I-frame only sequences give the least compression, but are useful for random access, FF/FR and editability. I- and P-frame sequences give moderate compression but add a certain degree of random access and FF/FR functionality. I-, P- and B-frame sequences give very high compression but also increase the coding/decoding delay significantly. Such configurations are therefore not suited for video-telephony or video-conferencing applications.

The typical data rate of an I-frame is 1 bit per pixel while that of a P-frame is 0.1 bit per pixel and for a B-frame, 0.015 bit per pixel.
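The effect of the GOP configuration on the data rate can be estimated from the typical per-frame figures quoted above. The Python sketch below simply averages those figures over a GOP pattern; the function names and the example patterns are illustrative assumptions, and real bitrates depend heavily on the content.

```python
# Typical figures quoted in the text, in bits per pixel.
TYPICAL_BITS_PER_PIXEL = {"I": 1.0, "P": 0.1, "B": 0.015}


def average_bits_per_pixel(gop_pattern, bpp=TYPICAL_BITS_PER_PIXEL):
    """Average bits per pixel over one GOP, e.g. "IBBPBBPBBPBBPBB"."""
    return sum(bpp[frame_type] for frame_type in gop_pattern) / len(gop_pattern)


def estimated_bitrate(gop_pattern, width, height, fps):
    """Rough video bitrate in bits per second for the given picture size."""
    return average_bits_per_pixel(gop_pattern) * width * height * fps
```

With these figures, a 15-frame I-only GOP averages 1 bit per pixel, an I/P GOP (`"I" + "P" * 14`) averages 0.16 bit per pixel, and `"IBBPBBPBBPBBPBB"` averages roughly 0.1 bit per pixel, matching the ordering of compression described above.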

Part 3: Audio

Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC-11172-3. MPEG-1 Audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards certain parts of the audio that it deduces that the human ear can't ''hear'', either because they are in frequencies where the ear has limited sensitivity, or are ''masked'' by other (typically louder) sounds.

Channel Encoding:
*Mono
*Joint Stereo – intensity encoded
*Joint Stereo – M/S encoded for Layer III only
*Stereo
*Dual (two uncorrelated mono channels)

*Sampling rates: 32000, 44100, and 48000 Hz
*Bitrates for Layer I: 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416 and 448 kbit/s
*Bitrates for Layer II: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s
*Bitrates for Layer III: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s

MPEG-1 Audio is divided into 3 layers. Each higher layer is more computationally complex, and generally more efficient at lower bitrates than the previous. The layers are semi backwards compatible as higher layers reuse technologies implemented by the lower layers. A "Full" Layer II decoder can also play Layer I audio, but ''not'' Layer III audio, although not all higher level players are "full".

Layer I

MPEG-1 Audio Layer I is a simplified version of MPEG-1 Audio Layer II. Layer I uses a smaller 384-sample frame size for very low delay, and finer resolution. This is advantageous for applications like teleconferencing, studio editing, etc. It has lower complexity than Layer II to facilitate real-time encoding on the hardware available at the time.

Layer I saw limited adoption in its time, and most notably was used on Philips' defunct Digital Compact Cassette at a bitrate of 384 kbit/s. With the substantial performance improvements in digital processing since its introduction, Layer I quickly became unnecessary and obsolete.

Layer I audio files typically use the extension ".mp1" or sometimes ".m1a".

Layer II

MPEG-1 Audio Layer II (the first version of MP2, often informally called MUSICAM) is a lossy audio format designed to provide high quality at about 192 kbit/s for stereo sound. Decoding MP2 audio is computationally simple relative to MP3, AAC, etc.

History/MUSICAM

MPEG-1 Audio Layer II was derived from the MUSICAM (''Masking pattern adapted Universal Subband Integrated Coding And Multiplexing'') audio codec, developed by Centre commun d'études de télévision et télécommunications (CCETT), Philips, and Institut für Rundfunktechnik (IRT/CNET) as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting. Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Audio Layer II standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons.

Technical details

MP2 is a time-domain encoder. It uses a low-delay 32 sub-band polyphased

filter bank In signal processing, a filter bank (or filterbank) is an array of bandpass filters that separates the input signal into multiple components, each one carrying a single frequency sub-band of the original signal. One application of a filter bank is ...

for time-frequency mapping; having overlapping ranges (i.e. polyphased) to prevent aliasing. The psychoacoustic model is based on the principles of auditory masking, simultaneous masking effects, and the

absolute threshold of hearing The absolute threshold of hearing (ATH) is the minimum sound level of a pure tone that an average human ear with normal hearing can hear with no other sound present. The absolute threshold relates to the sound that can just be heard by the organi ...

(ATH). The size of a Layer II frame is fixed at 1152-samples (coefficients).

Time domain refers to how analysis and quantization are performed on short, discrete samples/chunks of the audio waveform. This offers low delay, as only a small number of samples are analyzed before encoding, as opposed to frequency domain encoding (like MP3), which must analyze many times more samples before it can decide how to transform and output encoded audio. This also offers higher performance on complex, random and transient impulses (such as percussive instruments and applause), avoiding artifacts like pre-echo.

The 32 sub-band filter bank returns 32 amplitude coefficients, one for each equal-sized frequency band/segment of the audio, which is about 700 Hz wide (depending on the audio's sampling frequency). The encoder then utilizes the psychoacoustic model to determine which sub-bands contain audio information that is less important, and so, where quantization will be inaudible, or at least much less noticeable.
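The "about 700 Hz" figure follows from dividing the representable bandwidth (half the sampling frequency) into 32 equal bands. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
# Sketch only: width of each of the 32 equal sub-bands for a given sampling rate.
NUM_SUBBANDS = 32


def subband_width_hz(sample_rate_hz: float) -> float:
    """Bandwidth of one sub-band: Nyquist frequency split into 32 equal parts."""
    return (sample_rate_hz / 2) / NUM_SUBBANDS


def subband_index(frequency_hz: float, sample_rate_hz: float) -> int:
    """Index (0..31) of the sub-band that contains `frequency_hz`."""
    width = subband_width_hz(sample_rate_hz)
    return min(int(frequency_hz // width), NUM_SUBBANDS - 1)


if __name__ == "__main__":
    for rate in (32000, 44100, 48000):
        print(rate, "Hz ->", subband_width_hz(rate), "Hz per sub-band")
```

For 44.1 kHz audio this gives roughly 689 Hz per band, and 750 Hz at 48 kHz, which is why the width depends on the sampling frequency.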

The psychoacoustic model is applied using a 1024-point fast Fourier transform (FFT). Of the 1152 samples per frame, 64 samples at the top and bottom of the frequency range are ignored for this analysis. They are presumably not significant enough to change the result. The psychoacoustic model uses an empirically determined masking model to determine which sub-bands contribute more to the masking threshold, and how much quantization noise each can contain without being perceived. Any sounds below the absolute threshold of hearing (ATH) are completely discarded. The available bits are then assigned to each sub-band accordingly.

Typically, sub-bands are less important if they contain quieter sounds (smaller coefficient) than a neighboring (i.e. similar frequency) sub-band with louder sounds (larger coefficient). Also, "noise" components typically have a more significant masking effect than "tonal" components. Less significant sub-bands are reduced in accuracy by quantization. This basically involves compressing the dynamic range (amplitude of the coefficient), i.e. raising the noise floor, and then computing an amplification factor (scale factor) for the decoder to use to re-expand each sub-band to the proper range.

Layer II can also optionally use intensity stereo coding, a form of joint stereo. This means that the frequencies above 6 kHz of both channels are combined/down-mixed into one single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound. This perceptual trick is known as "stereo irrelevancy". This can allow further reduction of the audio bitrate without much perceivable loss of fidelity, but is generally not used with higher bitrates as it does not provide very high quality (transparent) audio.
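The two ideas above, quantizing a sub-band against an amplification factor and sharing one down-mixed signal between channels with separate intensities, can be sketched together. This is a simplified illustration, not the standard's procedure: the function names are hypothetical, the uniform quantizer and the peak-based scale factor are assumptions, and real Layer II uses tabulated scale factors and quantizer step sizes.

```python
# Sketch only: scale-factor quantization and intensity stereo on one sub-band.
# Inputs are non-empty lists of floats; `bits` must be at least 2.


def scale_and_quantize(samples, bits):
    """Normalize by the peak (the 'amplification factor'), then round coarsely."""
    scale = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero on silence
    levels = (1 << (bits - 1)) - 1               # e.g. 7 for 4 bits
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def dequantize(scale, codes, bits):
    """Decoder side: re-expand the integer codes using the transmitted scale."""
    levels = (1 << (bits - 1)) - 1
    return [c * scale / levels for c in codes]


def intensity_encode(left, right, bits):
    """Send one down-mixed set of codes plus a separate scale per channel."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    _, codes = scale_and_quantize(mono, bits)
    scale_left = max(abs(s) for s in left)
    scale_right = max(abs(s) for s in right)
    return codes, scale_left, scale_right


def intensity_decode(codes, scale_left, scale_right, bits):
    """Play the shared codes through both channels at their own intensity."""
    return (dequantize(scale_left, codes, bits),
            dequantize(scale_right, codes, bits))


if __name__ == "__main__":
    left = [0.5, -1.0, 0.25]
    right = [0.25, -0.5, 0.125]  # same shape as left, at half the level
    packed = intensity_encode(left, right, bits=8)
    print(intensity_decode(*packed, bits=8))
```

Fewer bits mean fewer levels and therefore a higher noise floor, while the shared codes show why intensity stereo saves bitrate: only the per-channel scale differs between left and right.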

Quality

Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256 kbit/s for 16-bit 44.1 kHz CD audio using the earliest reference implementation (more recent encoders should presumably perform even better). That (approximately) 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual entropy, at just over 1:8. Achieving much higher compression is simply not possible without discarding some perceptible information.

MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanet, symphonic orchestra, male and female voices and particularly complex and high energy transients (impulses) like percussive sounds: triangle, glockenspiel and audience applause. More recent testing has shown that MPEG Multichannel (based on MP2), despite being compromised by an inferior matrixed mode (for the sake of backwards compatibility), rates just slightly lower than much more recent audio codecs, such as Dolby Digital (AC-3) and Advanced Audio Coding (AAC) (mostly within the margin of error, and substantially superior in some cases, such as audience applause) (Wustenhagen et al., ''Subjective Listening Test of Multi-channel Audio Codecs'', AES 105th Convention Paper 4813, San Francisco 1998). This is one reason that MP2 audio continues to be used extensively. The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate. The reason for this disparity with both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from the latter test.

Layer II audio files typically use the extension ".mp2" or sometimes ".m2a".
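The approximate 1:6 ratio quoted in this section can be checked from the raw bitrate of 16-bit, 44.1 kHz stereo audio. The sketch below assumes two channels, since the text does not state the channel count, and uses hypothetical function names.

```python
# Sketch only: compression ratio of 256 kbit/s MP2 relative to raw CD audio.


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(raw_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the raw stream."""
    return raw_bps / coded_bps


if __name__ == "__main__":
    cd_bps = pcm_bitrate_bps(44100, 16, 2)  # stereo is an assumption here
    print("CD audio:", cd_bps, "bit/s")
    print("Ratio at 256 kbit/s:", round(compression_ratio(cd_bps, 256000), 2))
```

The result is about 5.5, consistent with the rounded 1:6 figure.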

Layer III

MPEG-1 Audio Layer III (the first version of MP3) is a lossy audio format designed to provide acceptable quality at about 64 kbit/s for monaural audio over single-channel (BRI) ISDN links, and 128 kbit/s for stereo sound.

History/ASPEC

MPEG-1 Audio Layer III was derived from the ''Adaptive Spectral Perceptual Entropy Coding'' (ASPEC) codec developed by Fraunhofer as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting. ASPEC was adapted to fit in with the Layer II model (frame size, filter bank, FFT, etc.), to become Layer III. ASPEC was itself based on ''Multiple adaptive Spectral audio Coding'' (MSC) by E. F. Schroeder; ''Optimum Coding in the Frequency domain'' (OCF), the doctoral thesis by Karlheinz Brandenburg at the University of Erlangen-Nuremberg; ''Perceptual Transform Coding'' (PXFM) by J. D. Johnston at AT&T Bell Labs; and ''Transform coding of audio signals'' by Y. Mahieux and J. Petit at Institut für Rundfunktechnik (IRT/CNET).

Technical details

MP3 is a frequency-domain audio transform encoder. Even though it utilizes some of the lower layer functions, MP3 is quite different from MP2. MP3 works on 1152 samples like MP2, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable bitrate (VBR) encoding while maintaining 1152 sample size output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place.

MP3 does not benefit from the 32 sub-band polyphased filter bank, instead just using an 18-point MDCT transformation on each of the 32 sub-band outputs to split the data into 576 frequency components (32 × 18), and processing it in the frequency domain. This extra granularity allows MP3 to have a much finer psychoacoustic model, and more carefully apply appropriate quantization to each band, providing much better low-bitrate performance.

Frequency-domain processing imposes some limitations as well, causing a temporal resolution 12 or 36 times worse than that of Layer II. This causes quantization artifacts, due to transient sounds like percussive events and other high-frequency events that spread over a larger window. This results in audible smearing and pre-echo. MP3 uses pre-echo detection routines, and VBR encoding, which allows it to temporarily increase the bitrate during difficult passages, in an attempt to reduce this effect. It is also able to switch between the normal 36-sample quantization window and three short 12-sample windows, to reduce the temporal (time) length of quantization artifacts. And yet in choosing a fairly small window size to make MP3's temporal response adequate enough to avoid the most serious artifacts, MP3 becomes much less efficient in frequency domain compression of stationary, tonal components.

Being forced to use a ''hybrid'' time domain (filter bank) / frequency domain (MDCT) model to fit in with Layer II simply wastes processing time and compromises quality by introducing aliasing artifacts. MP3 has an aliasing cancellation stage specifically to mask this problem, but this instead produces frequency-domain energy that must be encoded in the audio. This is pushed to the top of the frequency range, where most people have limited hearing, in hopes the distortion it causes will be less audible.

Layer II's 1024-point FFT doesn't entirely cover all samples, and would omit several entire MP3 sub-bands, where quantization factors must be determined. MP3 instead uses two passes of FFT analysis for spectral estimation, to calculate the global and individual masking thresholds. This allows it to cover all 1152 samples. Of the two, it utilizes the global masking threshold level from the more critical pass, with the most difficult audio.

In addition to Layer II's intensity encoded joint stereo, MP3 can use middle/side (mid/side, m/s, MS, matrixed) joint stereo. With mid/side stereo, certain frequency ranges of both channels are merged into a single (middle, mid, L+R) mono channel, while the sound difference between the left and right channels is stored as a separate (side, L-R) channel. Unlike intensity stereo, this process does not discard any audio information. When combined with quantization, however, it can exaggerate artifacts. If the difference between the left and right channels is small, the side channel will be small, which will offer as much as a 50% bitrate savings, and associated quality improvement. If the difference between left and right is large, standard (discrete, left/right) stereo encoding may be preferred, as mid/side joint stereo will not provide any benefits. An MP3 encoder can switch between m/s stereo and full stereo on a frame-by-frame basis.

Unlike Layers I and II, MP3 uses variable-length Huffman coding (after perceptual coding) to further reduce the bitrate, without any further quality loss.
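The mid/side transform described in this section is easy to show in a few lines. The sketch below is illustrative only: the function names are hypothetical, the division by 2 is an assumed normalization rather than the one the standard prescribes, and the per-frame decision rule with its `threshold` parameter is an invented heuristic, not how any particular encoder chooses.

```python
# Sketch only: mid/side (L+R, L-R) joint stereo on one frame of samples.


def ms_encode(left, right):
    """Convert left/right into mid (sum) and side (difference) channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the transform; before quantization this loses no information."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def prefer_mid_side(left, right, threshold=0.5):
    """Hypothetical per-frame choice: use m/s when the side channel is small."""
    mid, side = ms_encode(left, right)
    mid_energy = sum(m * m for m in mid)
    side_energy = sum(s * s for s in side)
    return side_energy <= threshold * mid_energy


if __name__ == "__main__":
    left = [0.5, 0.25, -0.75]
    right = [0.5, 0.125, -0.25]
    mid, side = ms_encode(left, right)
    print("mid:", mid, "side:", side)
    print("round trip:", ms_decode(mid, side))
    print("use m/s for this frame:", prefer_mid_side(left, right))
```

When the two channels are nearly identical the side channel is close to zero and costs few bits, which is where the savings mentioned above come from; when they differ strongly, the helper falls back to discrete left/right coding.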

Quality

MP3's more fine-grained and selective quantization does prove notably superior to MP2 at lower bitrates. It is able to provide nearly equivalent audio quality to Layer II at an approximately 15% lower bitrate. 128 kbit/s is considered the "sweet spot" for MP3, meaning it provides generally acceptable quality stereo sound on most music, and there are diminishing quality improvements from increasing the bitrate further. MP3 is also regarded as exhibiting artifacts that are less annoying than Layer II, when both are used at bitrates that are too low to possibly provide faithful reproduction.

Layer III audio files use the extension ".mp3".

MPEG-2 audio extensions

The MPEG-2 standard includes several extensions to MPEG-1 Audio. These are known as MPEG-2 BC (backwards compatible with MPEG-1 Audio). MPEG-2 Audio is defined in ISO/IEC 13818-3.

* MPEG Multichannel: backward compatible 5.1-channel surround sound.
* Sampling rates: 16000, 22050, and 24000 Hz
* Bitrates: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160 kbit/s

These sampling rates are exactly half of those originally defined for MPEG-1 Audio. They were introduced to maintain higher quality sound when encoding audio at lower bitrates. The even lower bitrates were introduced because tests showed that MPEG-1 Audio could provide higher quality than any existing very low bitrate (i.e. speech) audio codecs.
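One way to see why halved sampling rates help at low bitrates is to look at how many coded bits are available per input sample. The sketch below assumes the MPEG-1 Audio sampling rates of 32000, 44100 and 48000 Hz, which are not listed in this section, and uses hypothetical names; it considers a single channel only.

```python
# Sketch only: halving the sampling rate doubles the bit budget per sample.
MPEG1_SAMPLE_RATES_HZ = (32000, 44100, 48000)  # assumed MPEG-1 Audio rates
MPEG2_EXT_SAMPLE_RATES_HZ = tuple(rate // 2 for rate in MPEG1_SAMPLE_RATES_HZ)


def bits_per_sample(bitrate_bps: int, sample_rate_hz: int) -> float:
    """Average number of coded bits available for each input sample (mono)."""
    return bitrate_bps / sample_rate_hz


if __name__ == "__main__":
    print("Extension rates:", MPEG2_EXT_SAMPLE_RATES_HZ)
    for full, half in zip(MPEG1_SAMPLE_RATES_HZ, MPEG2_EXT_SAMPLE_RATES_HZ):
        print(full, "Hz:", round(bits_per_sample(32000, full), 3), "bits/sample;",
              half, "Hz:", round(bits_per_sample(32000, half), 3), "bits/sample")
```

The trade-off is a narrower audio bandwidth, since only frequencies up to half the sampling rate can be represented.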

Part 4: Conformance testing

Part 4 of the MPEG-1 standard covers conformance testing, and is defined in ISO/IEC 11172-4. It describes procedures for testing conformance, providing two sets of guidelines and reference bitstreams for testing the conformance of MPEG-1 audio and video decoders, as well as the bitstreams produced by an encoder.

Part 5: Reference software

Part 5 of the MPEG-1 standard includes reference software, and is defined in ISO/IEC TR 11172-5. It provides C reference code for encoding and decoding of audio and video, as well as multiplexing and demultiplexing. This includes the ''ISO Dist10'' audio encoder code, which LAME and TooLAME were originally based upon.

File extension

.mpg is one of a number of file extensions for MPEG-1 or MPEG-2 audio and video compression. MPEG-1 Part 2 video is rare nowadays, and this extension typically refers to an MPEG program stream (defined in MPEG-1 and MPEG-2) or MPEG transport stream (defined in MPEG-2). Other suffixes such as .m2ts also exist, specifying the precise container (in this case MPEG-2 TS), but this has little relevance to MPEG-1 media.

.mp3 is the most common extension for files containing MP3 audio (typically MPEG-1 Audio, sometimes MPEG-2 Audio). An MP3 file is typically an uncontained stream of raw audio; the conventional way to tag MP3 files is by writing data to "garbage" segments of each frame, which preserve the media information but are discarded by the player. This is similar in many respects to how raw .AAC files are tagged (but this is less supported nowadays, e.g. iTunes).

Note that although it would apply, .mpg is not normally used for raw AAC or AAC in MPEG-2 Part 7 containers. The .aac extension normally denotes these audio files.

External links

* Official Web Page of the Moving Picture Experts Group (MPEG), a working group of ISO/IEC
* MPEG Industry Forum Organization
* Source Code to Implement MPEG-1