H.262
or MPEG-2 Part 2 (formally known as ITU-T Recommendation H.262 and ISO/IEC 13818-2,
also known as MPEG-2 Video) is a
video coding format
standardised and jointly maintained by
ITU-T Study Group 16 Video Coding Experts Group
(VCEG) and
ISO/IEC Moving Picture Experts Group
(MPEG), and developed with the involvement of many companies. It is the second part of the ISO/IEC
MPEG-2
standard. The ITU-T Recommendation H.262 and ISO/IEC 13818-2 documents are identical.
The standard is available for a fee from the ITU-T
and ISO. MPEG-2 Video is very similar to
MPEG-1
, but also provides support for interlaced video (an encoding technique used in analog NTSC, PAL and SECAM television systems). MPEG-2 video is not optimized for low bit-rates (e.g., less than 1 Mbit/s), but somewhat outperforms MPEG-1 at higher bit rates (e.g., 3 Mbit/s and above), although not by a large margin unless the video is interlaced. All standards-conforming MPEG-2 Video decoders are also fully capable of playing back MPEG-1 Video streams.
History
The ISO/IEC approval process was completed in November 1994. The first edition was approved in July 1995
and published by ITU-T
and ISO/IEC in 1996.
Didier LeGall of
Bellcore
chaired the development of the standard and Sakae Okubo of
NTT was the ITU-T coordinator and chaired the agreements on its requirements.
The technology was developed with contributions from a number of companies.
Hyundai Electronics (now
SK Hynix
) developed the first MPEG-2 SAVI (System/Audio/Video) decoder in 1995.
The majority of
patents that were later asserted in a
patent pool
to be essential for implementing the standard came from three companies:
Sony
(311 patents),
Thomson (198 patents) and
Mitsubishi Electric
(119 patents).
In 1996, it was extended by two amendments to include the registration of copyright identifiers and the 4:2:2 Profile.
ITU-T published these amendments in 1996 and ISO in 1997.
There are also other amendments published later by ITU-T and ISO/IEC.
The most recent edition of the standard was published in 2013 and incorporates all prior amendments.
Video coding
Picture sampling
An
HDTV
camera with 8-bit sampling generates a raw video stream of 25 × 1920 × 1080 × 3 = 155,520,000 bytes per second for 25 frame-per-second video (using the
4:4:4 sampling format). This stream of data must be compressed if digital TV is to fit in the bandwidth of available TV channels and if movies are to fit on DVDs.
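As a rough, back-of-the-envelope sketch of that arithmetic (the 4 Mbit/s channel bit rate below is an illustrative assumption, not a figure taken from the standard):

```python
# Raw data rate of 1080p25 video with 8-bit 4:4:4 sampling, as described above.
frames_per_second = 25
width, height = 1920, 1080
samples_per_pixel = 3          # one luma + two chroma samples, no subsampling (4:4:4)
bytes_per_sample = 1           # 8-bit sampling

raw_bytes_per_second = frames_per_second * width * height * samples_per_pixel * bytes_per_sample
raw_bits_per_second = raw_bytes_per_second * 8
print(raw_bytes_per_second)              # 155,520,000 bytes/s, matching the figure above
print(raw_bits_per_second / 1e6)         # about 1244 Mbit/s uncompressed

# Hypothetical broadcast channel bit rate (an assumption for illustration only).
target_bits_per_second = 4_000_000
print(raw_bits_per_second / target_bits_per_second)   # roughly 311:1 compression needed
```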
Video compression is practical because the data in pictures is often redundant in space and time. For example, the sky can be blue across the top of a picture and that blue sky can persist for frame after frame. Also, because of the way the eye works, it is possible to delete or approximate some data from video pictures with little or no noticeable degradation in image quality.
A common (and old) trick to reduce the amount of data is to separate each complete "frame" of video into two "fields" upon broadcast/encoding: the "top field", which is the odd numbered horizontal lines, and the "bottom field", which is the even numbered lines. Upon reception/decoding, the two fields are displayed alternately with the lines of one field interleaving between the lines of the previous field; this format is called
interlaced video
. The typical field rate is
50 (Europe/PAL) or
59.94 (North America/NTSC) fields per second, corresponding to 25 (Europe/PAL) or 29.97 (North America/NTSC) whole frames per second. If the video is not interlaced, then it is called
progressive scan
video and each picture is a complete frame. MPEG-2 supports both options.
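A minimal sketch of that field splitting, assuming a frame stored as a NumPy array and the text's 1-based line numbering (so the "odd numbered" lines 1, 3, 5, ... are array rows 0, 2, 4, ...):

```python
import numpy as np

def split_into_fields(frame):
    """Split a frame (height x width array) into its top and bottom fields."""
    top_field = frame[0::2, :]      # lines 1, 3, 5, ... in 1-based numbering
    bottom_field = frame[1::2, :]   # lines 2, 4, 6, ...
    return top_field, bottom_field

def weave_fields(top_field, bottom_field):
    """Interleave two fields back into a full frame."""
    height = top_field.shape[0] + bottom_field.shape[0]
    frame = np.empty((height, top_field.shape[1]), dtype=top_field.dtype)
    frame[0::2, :] = top_field
    frame[1::2, :] = bottom_field
    return frame

frame = np.arange(8 * 4).reshape(8, 4)           # tiny 8-line "frame" for illustration
top, bottom = split_into_fields(frame)
assert np.array_equal(weave_fields(top, bottom), frame)
```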
Digital television requires that these pictures be digitized so that they can be processed by computer hardware. Each picture element (a
pixel
) is then represented by one
luma number and two
chroma numbers. These describe the brightness and the color of the pixel (see
YCbCr
). Thus, each digitized picture is initially represented by three rectangular arrays of numbers.
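As a small illustration of that representation, the sketch below converts 8-bit R'G'B' values to one luma and two chroma numbers using common BT.601-style formulas; the full-range form shown here is an assumption for clarity, as broadcast MPEG-2 video normally uses a limited-range Y'CbCr encoding:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit R'G'B' values to full-range Y'CbCr (BT.601 coefficients).

    Illustrative only: real MPEG-2 content typically uses limited-range
    ("studio swing") levels rather than the full 0-255 range shown here.
    """
    y  = 0.299 * r + 0.587 * g + 0.114 * b        # luma: perceived brightness
    cb = 0.564 * (b - y) + 128                    # blue-difference chroma
    cr = 0.713 * (r - y) + 128                    # red-difference chroma
    return y, cb, cr

print(rgb_to_ycbcr(255, 255, 255))   # white -> (255.0, 128.0, 128.0)
print(rgb_to_ycbcr(255, 0, 0))       # red   -> approximately (76.2, 85.0, 255.4)
```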
Another common practice to reduce the amount of data to be processed is to
subsample the two
chroma planes (after
low-pass filtering to avoid
aliasing
). This works because the
human visual system
better resolves details of brightness than details in the hue and saturation of colors. The term 4:2:2 is used for video with the chroma subsampled by a ratio of 2:1 horizontally, and 4:2:0 is used for video with the chroma subsampled by 2:1 both vertically and horizontally. Video that has luma and chroma at the same resolution is called 4:4:4. The MPEG-2 Video document considers all three sampling types, although 4:2:0 is by far the most common for consumer video, and there are no defined "profiles" of MPEG-2 for 4:4:4 video (see below for further discussion of profiles).
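A minimal sketch of 4:2:2 and 4:2:0 subsampling under simplifying assumptions, with a plain box average standing in for the proper low-pass filter mentioned above:

```python
import numpy as np

def subsample_420(chroma_plane):
    """Subsample one chroma plane 2:1 both horizontally and vertically (4:2:0).

    A 2x2 box average is used as a crude stand-in for a real anti-aliasing
    low-pass filter; the plane dimensions are assumed to be even.
    """
    h, w = chroma_plane.shape
    blocks = chroma_plane.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def subsample_422(chroma_plane):
    """Subsample one chroma plane 2:1 horizontally only (4:2:2)."""
    h, w = chroma_plane.shape
    return chroma_plane.reshape(h, w // 2, 2).mean(axis=2)

cb = np.random.randint(0, 256, size=(1080, 1920)).astype(float)
print(subsample_420(cb).shape)   # (540, 960): one quarter of the original chroma samples
print(subsample_422(cb).shape)   # (1080, 960): half of the original chroma samples
```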
While the discussion below in this section generally describes MPEG-2 video compression, there are many details that are not discussed, including details involving fields, chrominance formats, responses to scene changes, special codes that label the parts of the bitstream, and other pieces of information. Aside from features for handling fields for interlaced coding, MPEG-2 Video is very similar to MPEG-1 Video (and even quite similar to the earlier
H.261 standard), so the entire description below applies equally well to MPEG-1.
I-frames, P-frames, and B-frames
MPEG-2 includes three basic types of coded frames: intra-coded frames (
I-frames), predictive-coded frames (
P-frames), and bidirectionally-predictive-coded frames (
B-frames).
An I-frame is a separately-compressed version of a single uncompressed (raw) frame. The coding of an I-frame takes advantage of spatial redundancy and of the inability of the eye to detect certain changes in the image. Unlike P-frames and B-frames, I-frames do not depend on data in the preceding or the following frames, and so their coding is very similar to how a still photograph would be coded (roughly similar to
JPEG
picture coding). Briefly, the raw frame is divided into 8 pixel by 8 pixel blocks. The data in each block is transformed by the
discrete cosine transform (DCT). The result is an 8×8 matrix of coefficients that have
real number
values. The transform converts spatial variations into frequency variations, but it does not change the information in the block; if the transform is computed with perfect precision, the original block can be recreated exactly by applying the inverse cosine transform (also with perfect precision). The conversion from 8-bit integers to real-valued transform coefficients actually expands the amount of data used at this stage of the processing, but the advantage of the transformation is that the image data can then be approximated by
quantizing the coefficients. Many of the transform coefficients, usually the higher frequency components, will be zero after the quantization, which is basically a rounding operation. The penalty of this step is the loss of some subtle distinctions in brightness and color. The quantization may either be coarse or fine, as selected by the encoder. If the quantization is not too coarse and one applies the inverse transform to the matrix after it is quantized, one gets an image that looks very similar to the original image but is not quite the same. Next, the quantized coefficient matrix is itself compressed. Typically, one corner of the 8×8 array of coefficients contains only zeros after quantization is applied. By starting in the opposite corner of the matrix, then zigzagging through the matrix to combine the coefficients into a string, then substituting
run-length codes for consecutive zeros in that string, and then applying
Huffman coding
to that result, one reduces the matrix to a smaller quantity of data. It is this
entropy coded data that is broadcast or that is put on DVDs. In the receiver or the player, the whole process is reversed, enabling the receiver to reconstruct, to a close approximation, the original frame.
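The sketch below walks through that intra-coding chain under simplifying assumptions: a single flat quantizer step instead of MPEG-2's quantization matrices, and plain (zero-run, level) pairs standing in for the standard's variable-length code tables (no actual Huffman-style coding is performed):

```python
import numpy as np

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II basis C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = 1.0 / np.sqrt(n)
    return C

C = dct_matrix()

def zigzag_order(n=N):
    """Zig-zag scan order: low-frequency coefficients first."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))

ZIGZAG = zigzag_order()

def encode_block(block, qstep=16):
    """Transform, quantize, zig-zag and run-length code one 8x8 pixel block."""
    coeffs = C @ block.astype(float) @ C.T      # spatial variations -> frequency variations
    q = np.rint(coeffs / qstep).astype(int)     # the lossy step: rounding to multiples of qstep
    scan = [q[i, j] for i, j in ZIGZAG]         # one string of 64 coefficients
    pairs, run = [], 0
    for level in scan:                          # (zero-run, level) pairs stand in for the
        if level == 0:                          # standard's variable-length code tables
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs

def decode_block(pairs, qstep=16):
    """Reverse the steps above to recover a close approximation of the block."""
    scan = []
    for run, level in pairs:
        scan.extend([0] * run + [level])
    scan.extend([0] * (N * N - len(scan)))
    q = np.zeros((N, N))
    for value, (i, j) in zip(scan, ZIGZAG):
        q[i, j] = value
    return C.T @ (q * qstep) @ C                # inverse DCT of the dequantized coefficients

block = np.random.randint(100, 140, size=(N, N))    # a fairly uniform block compresses well
pairs = encode_block(block)
recon = decode_block(pairs)
print(len(pairs), np.max(np.abs(recon - block)))    # few pairs; a small but nonzero error
```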
A P-frame is coded by predicting each region of the picture from the most recent preceding I-frame or P-frame using motion compensation, and then coding only the difference between that prediction and the actual picture with the same transform, quantization, and entropy-coding steps used for I-frames. The processing of B-frames is similar to that of P-frames except that B-frames use the picture in a subsequent reference frame as well as the picture in a preceding reference frame. As a result, B-frames usually provide more compression than P-frames. B-frames are never reference frames in MPEG-2 Video.
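A minimal sketch of the motion-compensated prediction that P-frames and B-frames rely on, assuming an exhaustive search over a small window; real MPEG-2 encoders work on 16×16 macroblocks with half-pixel accuracy and far more efficient search strategies:

```python
import numpy as np

BLOCK = 16          # MPEG-2 motion compensation operates on 16x16 macroblocks
SEARCH = 8          # search +/- 8 pixels around the block's position (assumed window)

def best_match(reference, current, top, left):
    """Find the motion vector minimising the sum of absolute differences (SAD)."""
    block = current[top:top + BLOCK, left:left + BLOCK].astype(int)
    best_mv, best_sad = None, np.inf
    for dy in range(-SEARCH, SEARCH + 1):
        for dx in range(-SEARCH, SEARCH + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + BLOCK > reference.shape[0] or x + BLOCK > reference.shape[1]:
                continue
            sad = np.abs(block - reference[y:y + BLOCK, x:x + BLOCK].astype(int)).sum()
            if sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv

def predict_block(reference, current, top, left):
    """Return the motion vector and the residual that would actually be coded."""
    dy, dx = best_match(reference, current, top, left)
    prediction = reference[top + dy:top + dy + BLOCK, left + dx:left + dx + BLOCK]
    residual = current[top:top + BLOCK, left:left + BLOCK].astype(int) - prediction.astype(int)
    return (dy, dx), residual   # the residual is then DCT-coded much like an I-frame block

ref = np.random.randint(0, 256, size=(64, 64))
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))        # synthetic pan of the whole picture
mv, residual = predict_block(ref, cur, 16, 16)
print(mv, np.abs(residual).sum())                     # the zero-residual match is found
```

For a B-frame, the same search would also be run against the following reference frame, and the two predictions could be averaged; in every case only the motion vectors and the (usually small) residual need to be coded.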
Typically, every 15th frame or so is made into an I-frame. P-frames and B-frames might follow an I-frame like this, IBBPBBPBBPBB(I), to form a