Motion compensation, in computing, is an algorithmic technique used to predict a frame in a video, given the previous and/or future frames, by accounting for motion of the camera and/or objects in the video. It is employed in the encoding of video data for
video compression, for example in the generation of
MPEG-2
files. Motion compensation describes a picture in terms of the transformation of a reference picture to the current picture. The reference picture may be previous in time or even from the future. When images can be accurately synthesized from previously transmitted/stored images, the compression efficiency can be improved.
Motion compensation is one of the two key
video compression techniques used in
video coding standards
, along with the
discrete cosine transform (DCT). Most video coding standards, such as the
H.26x and
MPEG
formats, typically use motion-compensated DCT hybrid coding,
known as block motion compensation (BMC) or motion-compensated DCT (MC DCT).
Functionality
Motion compensation exploits the fact that, often, for many
frames
of a movie, the only difference between one frame and another is the result of either the camera moving or an object in the frame moving. In reference to a video file, this means much of the information that represents one frame will be the same as the information used in the next frame.
Using motion compensation, a video stream will contain some full (reference) frames; then the only information stored for the frames in between would be the information needed to transform the previous frame into the next frame.
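As a minimal sketch (plain Python, hypothetical values, not any real codec), the saving can be shown on a one-dimensional "frame": when the only change between frames is a shift, the reference frame plus a single motion value reconstructs the next frame exactly, and the residual is empty.

```python
# Toy sketch of motion compensation on a 1-D "frame" (hypothetical example).
# The reference frame plus a single motion value is enough to rebuild the
# next frame when the only change is a uniform shift.

def shift_right(frame, d, fill=0):
    """Shift a 1-D frame right by d samples, padding with `fill`."""
    if d <= 0:
        return list(frame)
    return [fill] * d + frame[:-d]

reference = [10, 20, 30, 40, 50, 60, 70, 80]
current = shift_right(reference, 2)          # camera panned: content moved by 2

# Naive differencing stores many nonzero values...
plain_diff = [c - r for c, r in zip(current, reference)]
# ...but after compensating for the motion, the residual is all zeros.
prediction = shift_right(reference, 2)
residual = [c - p for c, p in zip(current, prediction)]

print(sum(1 for v in plain_diff if v != 0))  # 8
print(sum(1 for v in residual if v != 0))    # 0
```

Only the motion value (here, 2) and the empty residual need to be stored for the second frame.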
Illustrated example
The following is a simplistic illustrated explanation of how motion compensation works. Two successive frames were captured from the movie ''
Elephants Dream
''. As can be seen from the images, the bottom (motion-compensated) difference between the two frames contains significantly less detail than the prior images, and thus compresses much better. The information required to encode the motion-compensated frame is therefore much smaller than for the plain difference frame. It is also possible to encode the information using the difference image alone, at the cost of lower compression efficiency but with a saving in coding complexity; in fact, motion-compensated coding (together with
motion estimation
, motion compensation) occupies more than 90% of encoding complexity.
MPEG
In
MPEG
, images are predicted from previous frames (P frames) or bidirectionally from previous and future frames (B frames). B frames are more complex because the image sequence must be transmitted and stored out of order, so that the future frame is available to generate the B frames.
After predicting frames using motion compensation, the coder finds the residual, which is then compressed and transmitted.
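A hedged sketch of this step (illustrative values only, not any real codec): the encoder codes the difference between the current frame and its motion-compensated prediction, and the decoder adds that residual back onto the same prediction.

```python
# Sketch of residual coding (hypothetical values, lossless for clarity):
# the encoder transmits prediction parameters plus the residual;
# the decoder adds the residual back onto its own prediction.

current    = [52, 55, 61, 66, 70, 61, 64, 73]
prediction = [50, 54, 60, 67, 68, 60, 65, 70]   # motion-compensated guess

residual = [c - p for c, p in zip(current, prediction)]   # what gets coded

# Decoder side: same prediction + received residual = exact frame.
reconstructed = [p + r for p, r in zip(prediction, residual)]
assert reconstructed == current
```

In a real coder the residual is further transformed and quantized before transmission, so the reconstruction is approximate rather than exact.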
Global motion compensation
In
global motion compensation, the motion model reflects camera motions such as:
* Dolly - moving the camera forward or backward
* Track - moving the camera left or right
* Boom - moving the camera up or down
* Pan - rotating the camera around its Y axis, moving the view left or right
* Tilt - rotating the camera around its X axis, moving the view up or down
* Roll - rotating the camera around the view axis
It works best for still scenes without moving objects.
There are several advantages of global motion compensation:
* It models the dominant motion usually found in video sequences with just a few parameters. The share in bit-rate of these parameters is negligible.
* It does not partition the frames. This avoids artifacts at partition borders.
* A straight line (in the time direction) of pixels with equal spatial positions in the frame corresponds to a continuously moving point in the real scene. Other MC schemes introduce discontinuities in the time direction.
MPEG-4 ASP supports GMC with three reference points, although some implementations can only make use of one. A single reference point only allows for translational motion which for its relatively large performance cost provides little advantage over block based motion compensation.
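As an illustrative sketch (assumed, simplified code; real GMC also warps with sub-pixel accuracy), a single-reference-point model amounts to one translation applied to the whole frame, so two parameters describe the motion of every pixel:

```python
# Sketch of global motion compensation with a purely translational model
# (the minimal single-reference-point case; hypothetical code).
# Two parameters (dx, dy) describe the motion of every pixel in the frame.

def global_translate(frame, dx, dy, fill=0):
    """Warp a frame (list of rows) by a global integer translation."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx          # where this pixel came from
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out

reference = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
predicted = global_translate(reference, dx=1, dy=0)  # camera tracked sideways
print(predicted)   # [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```

With three reference points the model generalizes to an affine warp, which additionally covers rotation and zoom.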
Moving objects within a frame are not sufficiently represented by global motion compensation.
Thus, local
motion estimation
is also needed.
Motion-compensated DCT
Block motion compensation
Block motion compensation (BMC), also known as motion-compensated
discrete cosine transform (MC DCT), is the most widely used motion compensation technique.
In BMC, the frames are partitioned into blocks of pixels (e.g. macroblocks of 16×16 pixels in
MPEG
).
Each block is predicted from a block of equal size in the reference frame.
The blocks are not transformed in any way apart from being shifted to the position of the predicted block.
This shift is represented by a ''motion vector''.
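A minimal sketch of the block-copy step (hypothetical 2×2 blocks rather than 16×16 macroblocks, purely for brevity): each block of the current frame is predicted by copying an equal-sized block from the reference frame, displaced by that block's motion vector.

```python
# Sketch of block motion compensation (illustrative values).
# Each block is filled from the reference frame at an offset given by
# that block's motion vector (dy, dx).

B = 2  # block size (real codecs use e.g. 16x16 macroblocks)

def predict_frame(reference, motion_vectors, h, w):
    out = [[0] * w for _ in range(h)]
    for (by, bx), (dy, dx) in motion_vectors.items():
        for y in range(B):
            for x in range(B):
                sy, sx = by * B + y + dy, bx * B + x + dx
                out[by * B + y][bx * B + x] = reference[sy][sx]
    return out

reference = [[r * 4 + c for c in range(4)] for r in range(4)]
# One motion vector per block: block (0,0) copies from one pixel lower-right.
mvs = {(0, 0): (1, 1), (0, 1): (0, 0),
       (1, 0): (0, 0), (1, 1): (0, 0)}
predicted = predict_frame(reference, mvs, 4, 4)
```

Note that the source regions read from the reference frame may overlap, while the destination blocks tile the current frame without overlap.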
To exploit the redundancy between neighboring block vectors (e.g. for a single moving object covered by multiple blocks), it is common to encode only the difference between the current and previous motion vector in the bit-stream. The result of this differencing process is mathematically equivalent to a global motion compensation capable of panning.
Further down the encoding pipeline, an
entropy coder will take advantage of the resulting statistical distribution of the motion vectors around the zero vector to reduce the output size.
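A sketch with illustrative vectors: since neighboring blocks on the same object carry nearly identical vectors, transmitting differences concentrates the coded values around zero, which is exactly the distribution an entropy coder compresses well.

```python
# Sketch of differential motion-vector coding (hypothetical values).
# Neighboring blocks covering the same moving object share nearly the
# same vector, so coding each vector as a difference from its predecessor
# concentrates the transmitted values around zero.

mvs = [(3, 1), (3, 1), (3, 1), (4, 1), (3, 1)]   # per-block (dy, dx) vectors

diffs = [mvs[0]] + [
    (cy - py, cx - px)
    for (py, px), (cy, cx) in zip(mvs, mvs[1:])
]
print(diffs)   # [(3, 1), (0, 0), (0, 0), (1, 0), (-1, 0)]
```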
It is possible to shift a block by a non-integer number of pixels, which is called ''sub-pixel precision''.
The in-between pixels are generated by interpolating neighboring pixels. Commonly, half-pixel or quarter-pixel precision (Qpel, as used by H.264 and MPEG-4 ASP) is employed. The computational expense of sub-pixel precision is much higher due to the extra processing required for interpolation and, on the encoder side, a much greater number of potential source blocks to be evaluated.
The main disadvantage of block motion compensation is that it introduces discontinuities at the block borders (blocking artifacts).
These artifacts appear in the form of sharp horizontal and vertical edges which are easily spotted by the human eye and produce false edges and ringing effects (large coefficients in high frequency sub-bands) due to quantization of coefficients of the
Fourier-related transform used for
transform coding
of the
residual frames.
Block motion compensation divides up the ''current'' frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come ''from''
(a common misconception is that the ''previous frame'' is divided up into non-overlapping blocks, and the motion compensation vectors tell where those blocks move ''to'').
The source blocks typically overlap in the source frame.
Some video compression algorithms assemble the current frame out of pieces of several different previously-transmitted frames.
Frames can also be predicted from future frames.
The future frames then need to be encoded before the predicted frames, and thus the encoding order does not necessarily match the real frame order.
Such frames are usually predicted from two directions, i.e. from the I- or P-frames that immediately precede or follow the predicted frame.
These bidirectionally predicted frames are called
''B-frames''.
A coding scheme could, for instance, be IBBPBBPBBPBB.
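The reordering can be sketched as follows (hypothetical helper; real encoders signal the ordering in the bitstream): each B-frame needs its following anchor (I or P) decoded first, so anchors are moved ahead of the B-frames that reference them.

```python
# Sketch of display order vs. coding order for B-frames (illustrative).
# B-frames are held back until the anchor frame they depend on has been sent.

def coding_order(display):
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)       # hold until the next anchor is sent
        else:
            out.append(frame)             # I or P anchor goes out first
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(coding_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```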
Further, the use of triangular tiles has also been proposed for motion compensation. Under this scheme, the frame is tiled with triangles, and the next frame is generated by performing an affine transformation on these triangles. Only the affine transformations are recorded/transmitted. This is capable of dealing with zooming, rotation, translation etc.
Variable block-size motion compensation
Variable block-size motion compensation (VBSMC) is the use of BMC with the ability for the encoder to dynamically select the size of the blocks. When coding video, the use of larger blocks can reduce the number of bits needed to represent the motion vectors, while the use of smaller blocks can result in a smaller amount of prediction residual information to encode. Other areas of work have examined the use of variable-shape feature metrics, beyond block boundaries, from which interframe vectors can be calculated. Older designs such as
H.261 and
MPEG-1
video typically use a fixed block size, while newer ones such as
H.263
,
MPEG-4 Part 2
,
H.264/MPEG-4 AVC, and
VC-1
give the encoder the ability to dynamically choose what block size will be used to represent the motion.
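The trade-off can be sketched with hypothetical bit costs: the encoder compares spending bits on one large-block vector against four small-block vectors that may each predict their region better.

```python
# Sketch of the variable block-size trade-off (hypothetical bit costs).

MV_BITS = 8          # assumed cost of coding one motion vector

def cost(residual_bits, n_vectors):
    """Total bits: residual plus per-vector signalling overhead."""
    return residual_bits + n_vectors * MV_BITS

# Uniform motion: one large-block vector predicts well, so it wins.
assert cost(residual_bits=40, n_vectors=1) < cost(residual_bits=35, n_vectors=4)

# Complex motion: four small-block vectors cut the residual enough
# to pay for their extra signalling cost.
assert cost(residual_bits=10, n_vectors=4) < cost(residual_bits=120, n_vectors=1)
```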
Overlapped block motion compensation
Overlapped block motion compensation (OBMC) addresses the blocking-artifact problem: it not only increases prediction accuracy but also avoids discontinuities at block borders. When using OBMC,
blocks are typically twice as big in each dimension and overlap quadrant-wise with all 8 neighbouring blocks.
Thus, each pixel belongs to 4 blocks. In such a scheme, there are 4 predictions for each pixel, which are combined into a weighted mean.
For this purpose, blocks are associated with a window function that has the property that the sum of 4 overlapped windows is equal to 1 everywhere.
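A sketch of one such window (a hypothetical triangular window; actual codecs define their own weights) and a check that the overlapped copies sum to 1 everywhere:

```python
# Sketch of an OBMC window (hypothetical triangular weights).
# A 1-D window of length 2N, shifted by N for each neighboring block,
# sums to 1 everywhere; the 2-D window is the separable product, so the
# four overlapping predictions form a true weighted mean.

N = 4
w1d = [(k + 0.5) / N if k < N else (2 * N - k - 0.5) / N for k in range(2 * N)]

# Overlapping shifted copies of the 1-D window sum to 1:
for k in range(N):
    assert abs(w1d[k] + w1d[k + N] - 1.0) < 1e-12

# 2-D window: four quadrant-overlapping copies also sum to 1.
w2d = [[w1d[i] * w1d[j] for j in range(2 * N)] for i in range(2 * N)]
total = w2d[0][0] + w2d[0][N] + w2d[N][0] + w2d[N][N]
assert abs(total - 1.0) < 1e-12
```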
Studies of methods for reducing the complexity of OBMC have shown that the contribution to the window function is smallest for the diagonally-adjacent block. Reducing the weight for this contribution to zero and increasing the other weights by an equal amount leads to a substantial reduction in complexity without a large penalty in quality. In such a scheme, each pixel then belongs to 3 blocks rather than 4, and rather than using 8 neighboring blocks, only 4 are used for each block to be compensated. Such a scheme is found in the
H.263
Annex F Advanced Prediction mode.
Quarter Pixel (QPel) and Half Pixel motion compensation
In motion compensation, quarter- or half-pel samples are interpolated sub-samples addressed by fractional motion vectors. Based on the vectors and the full samples, the sub-samples can be calculated using bicubic or bilinear 2-D filtering; see subclause 8.4.2.2, "Fractional sample interpolation process", of the H.264 standard.
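A hedged sketch of the simplest such filter (bilinear only; H.264 actually uses a longer 6-tap filter for the half-pel stage, with bilinear interpolation for the final quarter-pel step):

```python
# Sketch of bilinear interpolation for sub-pixel motion compensation
# (illustrative; real codecs use standardized filter taps).

def sample(frame, y, x):
    """Fetch a sample at a fractional position via bilinear interpolation."""
    y0, x0 = int(y), int(x)
    fy, fx = y - y0, x - x0
    a = frame[y0][x0]
    b = frame[y0][x0 + 1] if fx else a
    c = frame[y0 + 1][x0] if fy else a
    d = frame[y0 + 1][x0 + 1] if fx and fy else a
    return (a * (1 - fy) * (1 - fx) + b * (1 - fy) * fx
            + c * fy * (1 - fx) + d * fy * fx)

frame = [[10, 20],
         [30, 40]]
print(sample(frame, 0.5, 0.5))   # 25.0 (half-pel position)
print(sample(frame, 0.0, 0.25))  # 12.5 (quarter-pel position)
```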
3D image coding techniques
Motion compensation is utilized in
Stereoscopic Video Coding
In video, ''time'' is often considered the third dimension. Still image coding techniques can be expanded to an extra dimension.
JPEG 2000
uses wavelets, and these can also be used to encode motion without gaps between blocks in an adaptive way. Fractional pixel
affine transformations lead to bleeding between adjacent pixels. If no higher internal resolution is used, the delta images mostly have to counteract the smearing of the image. The delta image can also be encoded as wavelets, so that the borders of the adaptive blocks match.
2D+Delta
Encoding techniques utilize
H.264 and
MPEG-2
compatible coding and can use motion compensation to compress between stereoscopic images.
History
A precursor to the concept of motion compensation dates back to 1929, when R.D. Kell in Britain proposed the concept of transmitting only the portions of an
analog video
scene that changed from frame-to-frame. In 1959, the concept of
inter-frame motion compensation was proposed by
NHK researchers Y. Taki, M. Hatori and S. Tanaka, who proposed predictive inter-frame
video coding in the
temporal dimension.
Motion-compensated DCT
Practical motion-compensated
video compression emerged with the development of motion-compensated
DCT (MC DCT) coding,
also called block motion compensation (BMC) or DCT motion compensation. This is a hybrid coding algorithm,
which combines two key
data compression
techniques:
discrete cosine transform (DCT) coding
in the
spatial dimension
, and predictive motion compensation in the
temporal dimension.
DCT coding is a
lossy
block compression
transform coding
technique that was first proposed by
Nasir Ahmed, who initially intended it for
image compression
, in 1972.
In 1974, Ali Habibi at the
University of Southern California
introduced hybrid coding, which combines predictive coding with transform coding.
However, his algorithm was initially limited to
intra-frame coding in the spatial dimension. In 1975, John A. Roese and Guner S. Robinson extended Habibi's hybrid coding algorithm to the temporal dimension, using transform coding in the spatial dimension and predictive coding in the temporal dimension, developing
inter-frame motion-compensated hybrid coding.
For the spatial transform coding, they experimented with the DCT and the
fast Fourier transform
(FFT), developing inter-frame hybrid coders for both, and found that the DCT was the most efficient due to its reduced complexity, capable of compressing image data down to 0.25 bit
per
pixel
for a
videotelephone scene with image quality comparable to an intra-frame coder requiring 2-bit per pixel.
In 1977, Wen-Hsiung Chen developed a fast DCT algorithm with C.H. Smith and S.C. Fralick. In 1979,
Anil K. Jain and Jaswant R. Jain further developed motion-compensated DCT video compression,
also called block motion compensation.
This led to Chen developing a practical video compression algorithm, called motion-compensated DCT or adaptive scene coding, in 1981.
Motion-compensated DCT later became the standard coding technique for video compression from the late 1980s onwards.
The first digital
video coding standard was
H.120
, developed by the
CCITT (now ITU-T) in 1984.
H.120 used motion-compensated DPCM coding, which was inefficient for video coding, and the standard was thus impractical due to its low performance.
The
H.261 standard was developed in 1988 based on motion-compensated DCT compression,
and it was the first practical video coding standard.
Since then, motion-compensated DCT compression has been adopted by all the major video coding standards (including the
H.26x and
MPEG
formats) that followed.
See also
*
Motion estimation
*
Image stabilization
*
Inter frame
*
HDTV blur
*
Television standards conversion
*
VidFIRE
*
X-Video Motion Compensation
Applications
*
video compression
* change of
framerate
for playback of 24 frames per second movies on 60 Hz
LCD
s or 100 Hz
interlaced
cathode ray tubes
References
External links
* Temporal Rate Conversion – article giving an overview of motion compensation techniques
* A New FFT Architecture and Chip Design for Motion Compensation based on Phase Correlation
* DCT and DFT coefficients are related by simple factors
* DCT better than DFT also for video
* John Wiseman, An Introduction to MPEG Video Compression
* DCT and motion compensation
* Compatibility between DCT, motion compensation and other methods
{{Compression Methods}}
Film and video technology
H.26x
Video compression
Motion in computer vision