In digital image processing, the sum of absolute differences (SAD) is a measure of the similarity between image blocks. It is calculated by taking the absolute difference between each pixel in the original block and the corresponding pixel in the block being used for comparison. These differences are summed to create a simple metric of block similarity, the L1 norm of the difference image or Manhattan distance between two image blocks.
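The definition can be written compactly as a formula. The notation is an assumption made here for illustration and does not come from the original text: A and B denote two blocks of N rows and M columns, and A(i, j) is the pixel value in row i, column j.

```latex
\mathrm{SAD}(A, B) = \sum_{i=1}^{N} \sum_{j=1}^{M} \left| A(i,j) - B(i,j) \right|
```

A value of 0 means the two blocks are identical; larger values indicate less similar blocks.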
The sum of absolute differences may be used for a variety of purposes, such as object recognition, the generation of disparity maps for stereo images, and motion estimation for video compression.
Example
This example uses the sum of absolute differences to identify which part of a search image is most similar to a template image. In this example, the template image is 3 by 3 pixels in size, while the search image is 3 by 5 pixels in size. Each pixel is represented by a single integer from 0 to 9.
Template     Search image
2 5 5        2 7 5 8 6
4 0 7        1 7 4 2 7
7 5 9        8 4 6 8 5
There are exactly three unique locations within the search image where the template may fit: the left side of the image, the center of the image, and the right side of the image. To calculate the SAD values, the absolute value of the difference between each corresponding pair of pixels is used. For the left location, for example, the difference between 2 and 2 is 0, between 4 and 1 is 3, between 7 and 8 is 1, and so forth.
Calculating the values of the absolute differences for each pixel, for the three possible template locations, gives the following:
Left      Center    Right
0 2 0     5 0 3     3 3 1
3 7 3     3 4 5     0 2 0
1 1 3     3 1 1     1 3 4
For each of these three image patches, the 9 absolute differences are added together, giving SAD values of 20, 25, and 17, respectively. By this measure, the right side of the search image is the most similar to the template image, because it has the lowest sum of absolute differences of the three locations.
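The calculation above can be reproduced with a short Python sketch. The function names `sad` and `sad_at_each_offset` are hypothetical, chosen for illustration, and the sketch assumes the template and the search image have the same height, so the template only slides horizontally.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def sad_at_each_offset(template, search):
    """Slide the template across the search image, left to right, and
    return the SAD at every column offset where the template fits.
    Assumes both images have the same number of rows."""
    width = len(template[0])
    offsets = range(len(search[0]) - width + 1)
    return [
        sad(template, [row[x:x + width] for row in search])
        for x in offsets
    ]


template = [
    [2, 5, 5],
    [4, 0, 7],
    [7, 5, 9],
]
search = [
    [2, 7, 5, 8, 6],
    [1, 7, 4, 2, 7],
    [8, 4, 6, 8, 5],
]

scores = sad_at_each_offset(template, search)
# Index of the lowest score: 0 = left, 1 = center, 2 = right.
best = min(range(len(scores)), key=scores.__getitem__)

print(scores)  # [20, 25, 17]
print(best)    # 2
```

The lowest score is at offset 2, the right-hand location, matching the result worked out by hand.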
Comparison to other metrics
Object recognition
The sum of absolute differences provides a simple way to automate the search for objects inside an image, but may be unreliable due to the effects of contextual factors such as changes in lighting, color, viewing direction, size, or shape. The SAD may be used in conjunction with other object recognition methods, such as edge detection, to improve the reliability of results.
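The sensitivity to lighting can be illustrated with a small sketch. It reuses the 3 by 3 template from the example and compares it with a copy whose pixels are all brightened by 2. The uniform shift of 2 is an arbitrary value chosen for illustration, and the brightened values are allowed to exceed the 0 to 9 range used in the example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


template = [
    [2, 5, 5],
    [4, 0, 7],
    [7, 5, 9],
]

# The same picture content under brighter lighting: every pixel raised by 2.
brighter = [[pixel + 2 for pixel in row] for row in template]

identical_score = sad(template, template)
brighter_score = sad(template, brighter)

print(identical_score)  # 0
print(brighter_score)   # 18 (9 pixels, each differing by 2)
```

Although the brightened copy shows exactly the same structure, its SAD of 18 is higher than the 17 obtained for the best-matching patch in the example, so a purely SAD-based search would rank a structurally different patch above it.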
Video compression
SAD is an extremely fast metric due to its simplicity; it is effectively the simplest possible metric that takes into account every pixel in a block. Therefore, it is very effective for a wide motion search of many different blocks. SAD is also easily parallelizable since it analyzes each pixel separately, which makes it straightforward to implement with SIMD instruction sets such as ARM NEON or x86 SSE2. For example, SSE has a packed sum of absolute differences instruction (PSADBW) specifically for this purpose. Once candidate blocks are found, the final refinement of the motion estimation process is often done with other slower but more accurate metrics, which better take into account human perception. These include the sum of absolute transformed differences (SATD), the sum of squared differences (SSD), and rate-distortion optimization.
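A wide motion search of this kind can be sketched as an exhaustive block-matching loop. This is a minimal illustration in plain Python, not the method of any particular encoder: the names `full_search` and `crop`, the square block shape, and the tiny 6 by 6 reference frame are all assumptions made for the example. The metric is passed in as a parameter so that SAD can be swapped for SSD.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(frame, top, left, size):
    """Return the size-by-size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(block, ref_frame, top, left, radius, metric=sad):
    """Compare a square block against every candidate position within
    +/- radius of (top, left) in the reference frame.
    Returns (cost, dy, dx) for the lowest-cost candidate."""
    size = len(block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the frame.
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = metric(block, crop(ref_frame, y, x, size))
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best


# Hypothetical reference frame: a 2x2 object on a flat background.
ref_frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 8, 0],
    [0, 0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# The same object, found in the current frame at row 2, column 2.
block = [
    [9, 8],
    [7, 6],
]

sad_result = full_search(block, ref_frame, top=2, left=2, radius=1)
ssd_result = full_search(block, ref_frame, top=2, left=2, radius=1, metric=ssd)

print(sad_result)  # (0, 0, 1)
print(ssd_result)  # (0, 0, 1)
```

Both metrics report a cost of 0 at a displacement of one pixel to the right. SAD needs only a subtraction, an absolute value, and an addition per pixel, while SSD adds a multiplication and penalizes large individual differences more heavily.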
See also
* Computer stereo vision
* Hadamard transform
* Motion compensation
* Motion estimation
* Object recognition (computer vision)
* Rate-distortion optimization