Structural Similarity Index Measure

The structural similarity index measure (SSIM) is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. It is also used for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as reference. SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. This distinguishes SSIM from other techniques such as mean squared error (MSE) or peak signal-to-noise ratio (PSNR), which instead estimate absolute errors. Structural information is the idea that pixels have strong inter-dependencies, especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image.


History

The predecessor of SSIM was called ''Universal Quality Index'' (UQI), or ''Wang–Bovik index'', which was developed by Zhou Wang and Alan Bovik in 2001. This evolved, through their collaboration with Hamid Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in the ''IEEE Transactions on Image Processing''. In addition to defining the SSIM quality index, the paper provides a general context for developing and evaluating perceptual quality measures, including connections to human visual neurobiology and perception, and direct validation of the index against human subject ratings. The basic model was developed in the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision (LCV) at New York University. Further variants of the model have been developed in the Image and Visual Computing Laboratory at University of Waterloo and have been commercially marketed. SSIM subsequently found strong adoption in the image processing community and in the television and social media industries. The 2004 SSIM paper has been cited over 50,000 times according to Google Scholar, making it one of the most highly cited papers in the image processing and video engineering fields. It was recognized with the IEEE Signal Processing Society Best Paper Award for 2009. It also received the IEEE Signal Processing Society Sustained Impact Award for 2016, indicative of a paper having an unusually high impact for at least 10 years following its publication. Because of its high adoption by the television industry, the authors of the original SSIM paper were each accorded a Primetime Engineering Emmy Award in 2015 by the Television Academy.


Algorithm

The SSIM index is calculated between two windows of pixel values x and y of common size, taken from corresponding locations in the two images to be compared. These SSIM values can be aggregated across the full images by averaging or other variations.


Special-case formula

In one simple special case, further explained in the next section, the SSIM measure between x and y is:

\text{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}

with:
* \mu_x the pixel sample mean of x;
* \mu_y the pixel sample mean of y;
* \sigma_x^2 the sample variance of x;
* \sigma_y^2 the sample variance of y;
* \sigma_{xy} the sample covariance of x and y;
* c_1 = (k_1L)^2, c_2 = (k_2L)^2 two variables to stabilize the division with weak denominator;
* L the dynamic range of the pixel-values (typically this is 2^{\#bits\ per\ pixel}-1);
* k_1 = 0.01 and k_2 = 0.03 by default.
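As a rough illustration of the special-case formula, the following Python sketch computes SSIM for a single pair of windows. The function name `ssim_window`, the flat-list window representation, and the choice of unbiased (n - 1) sample estimates are assumptions of this sketch rather than part of the definition; practical implementations would normally use an array library.

```python
from statistics import fmean


def ssim_window(x, y, bits_per_pixel=8, k1=0.01, k2=0.03):
    """SSIM between two equally sized windows given as flat sequences of pixel values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("windows must have the same size (at least 2 pixels)")
    L = 2 ** bits_per_pixel - 1              # dynamic range of the pixel values
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2    # stabilizing constants
    n = len(x)
    mu_x, mu_y = fmean(x), fmean(y)
    # Unbiased (n - 1) sample estimates; this normalization is an assumption of the sketch.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


same = ssim_window([10, 20, 30, 40], [10, 20, 30, 40])
inverted = ssim_window([0, 255, 0, 255], [255, 0, 255, 0])
```

With these inputs, `same` equals 1 up to rounding, while `inverted` is close to -1 because the two windows share the same mean and variance but have a strongly negative covariance.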


General formula and components

The SSIM formula is based on three comparison measurements between the samples of x and y: luminance (l), contrast (c), and structure (s). The individual comparison functions are:

l(x,y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1}

c(x,y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}

s(x,y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3}

The SSIM for each block is then a weighted combination of those comparative measures:

\text{SSIM}(x,y) = l(x,y)^\alpha \cdot c(x,y)^\beta \cdot s(x,y)^\gamma

Choosing the third denominator-stabilizing constant as c_3 = c_2 / 2 leads to a simplification when combining the ''c'' and ''s'' components with equal exponents (\beta = \gamma), as the numerator of ''c'' is then twice the denominator of ''s'', leading to a cancellation leaving just a 2. Setting the weights (exponents) \alpha,\beta,\gamma to 1, the formula can then be reduced to the special case shown above.
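The reduction to the special case can be checked numerically. The Python sketch below evaluates the luminance, contrast, and structure terms with c_3 = c_2 / 2 and multiplies them with unit exponents. The names `sample_stats`, `ssim_components`, and `ssim_general` are illustrative, and the unbiased (n - 1) estimates are an assumption of the sketch.

```python
from math import sqrt


def sample_stats(x, y):
    """Means, unbiased variances and covariance of two equally sized windows."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    return mu_x, mu_y, var_x, var_y, cov_xy


def ssim_components(x, y, L=255, k1=0.01, k2=0.03):
    """Return the luminance, contrast and structure comparison values."""
    mu_x, mu_y, var_x, var_y, cov_xy = sample_stats(x, y)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    c3 = c2 / 2                               # choice that enables the cancellation
    sd_x, sd_y = sqrt(var_x), sqrt(var_y)
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    con = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    struct = (cov_xy + c3) / (sd_x * sd_y + c3)
    return lum, con, struct


def ssim_general(x, y, alpha=1, beta=1, gamma=1, **kwargs):
    """Weighted combination l^alpha * c^beta * s^gamma."""
    lum, con, struct = ssim_components(x, y, **kwargs)
    # struct can be negative, so a non-integer gamma may give a complex result in Python.
    return lum ** alpha * con ** beta * struct ** gamma
```

With the default exponents, `ssim_general` agrees with the single-fraction special-case expression up to floating-point rounding, because the factor 2\sigma_x\sigma_y + c_2 in the contrast numerator is exactly twice the structure denominator \sigma_x\sigma_y + c_2/2.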


Mathematical properties

SSIM satisfies the identity of indiscernibles and symmetry properties, but not the triangle inequality or non-negativity, and thus is not a distance function. However, under certain conditions, SSIM may be converted to a normalized root MSE measure, which is a distance function. The square of such a function is not convex, but is locally convex and quasiconvex, making SSIM a feasible target for optimization.


Application of the formula

In order to evaluate the image quality, this formula is usually applied only on luma, although it may also be applied on color (e.g., RGB) values or chromatic (e.g., YCbCr) values. The resultant SSIM index is a decimal value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect anti-correlation. For an image, it is typically calculated using a sliding Gaussian window of size 11×11 or a block window of size 8×8. The window can be displaced pixel-by-pixel on the image to create an SSIM quality map of the image. In the case of video quality assessment, the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation.
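The sliding-window procedure can be sketched as follows. For brevity this Python example uses a uniform 8×8 block window rather than a Gaussian-weighted 11×11 window, works on single-channel images stored as lists of rows, and averages the map into one score. The function names `ssim_map` and `mean_ssim` and the unbiased (n - 1) estimates are assumptions of the sketch.

```python
def ssim_map(img_x, img_y, win=8, L=255, k1=0.01, k2=0.03):
    """Return a 2-D list of SSIM values, one per window position."""
    h, w = len(img_x), len(img_x[0])
    if h != len(img_y) or any(len(row) != w for row in img_x + img_y):
        raise ValueError("images must have identical rectangular shapes")
    if h < win or w < win:
        raise ValueError("images must be at least as large as the window")
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    n = win * win
    result = []
    for top in range(h - win + 1):            # move the window pixel by pixel
        row = []
        for left in range(w - win + 1):
            rows = range(top, top + win)
            cols = range(left, left + win)
            x = [img_x[r][c] for r in rows for c in cols]
            y = [img_y[r][c] for r in rows for c in cols]
            mu_x, mu_y = sum(x) / n, sum(y) / n
            var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
            var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
            cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
            row.append(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
                       / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
        result.append(row)
    return result


def mean_ssim(img_x, img_y, **kwargs):
    """Aggregate the SSIM map into a single score by plain averaging."""
    values = [v for row in ssim_map(img_x, img_y, **kwargs) for v in row]
    return sum(values) / len(values)
```

For an image of height H and width W, the map has (H - 7) × (W - 7) entries with the default window. Evaluating only a subset of window positions, as suggested for video, would amount to stepping `top` and `left` by more than one pixel.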


Variants


Multi-scale SSIM

A more advanced form of SSIM, called multiscale SSIM (MS-SSIM), is conducted over multiple scales through a process of multiple stages of sub-sampling, reminiscent of multiscale processing in the early vision system. It has been shown to perform as well as or better than SSIM on different subjective image and video databases.


Multi-component SSIM

Three-component SSIM (3-SSIM) is a form of SSIM that takes into account the fact that the human eye can see differences more precisely on textured or edge regions than on smooth regions. The resulting metric is calculated as a weighted average of SSIM for three categories of regions: edges, textures, and smooth regions. The proposed weighting is 0.5 for edges and 0.25 each for the textured and smooth regions. The authors mention that a 1/0/0 weighting (ignoring anything but edge distortions) leads to results that are closer to subjective ratings. This suggests that edge regions play a dominant role in image quality perception. The authors of 3-SSIM have also extended the model into four-component SSIM (4-SSIM). The edge types are further subdivided into preserved and changed edges by their distortion status. The proposed weighting is 0.25 for all four components.
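The weighted average used by 3-SSIM can be sketched in Python as below. The sketch assumes that each local SSIM value has already been labelled as belonging to an edge, texture, or smooth region; how that classification is done is not covered here. The function name `three_component_ssim`, the label strings, and the renormalization when a category is empty are all assumptions made for illustration.

```python
def three_component_ssim(ssim_values, labels, weights=None):
    """Weighted average of per-category mean SSIM values."""
    if weights is None:
        # Weighting quoted above: 0.5 for edges, 0.25 for textures, 0.25 for smooth regions.
        weights = {"edge": 0.5, "texture": 0.25, "smooth": 0.25}
    if len(ssim_values) != len(labels):
        raise ValueError("one label is required per SSIM value")
    sums, counts = {}, {}
    for value, label in zip(ssim_values, labels):
        if label not in weights:
            raise ValueError(f"unknown region label: {label!r}")
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    # Assumption: categories without any samples are skipped and the
    # remaining weights are renormalized so they still sum to one.
    total_weight = sum(weights[label] for label in counts)
    if total_weight == 0:
        raise ValueError("no samples fall in a category with non-zero weight")
    weighted = sum(weights[label] * sums[label] / counts[label] for label in counts)
    return weighted / total_weight
```

Passing `weights={"edge": 1.0, "texture": 0.0, "smooth": 0.0}` gives the edge-only variant mentioned above; a 4-SSIM version would add a fourth label and use 0.25 for every category.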


Structural dissimilarity

Structural dissimilarity (DSSIM) may be derived from SSIM, though it does not constitute a distance function as the triangle inequality is not necessarily satisfied.

\text{DSSIM}(x,y) = \frac{1 - \text{SSIM}(x,y)}{2}


Video quality metrics and temporal variants

The original version of SSIM was designed to measure the quality of still images. It does not contain any parameters directly related to temporal effects of human perception and human judgment. A common practice is to calculate the average SSIM value over all frames in the video sequence. However, several temporal variants of SSIM have been developed.


Complex wavelet SSIM

The complex wavelet transform variant of the SSIM (CW-SSIM) is designed to deal with issues of image scaling, translation and rotation. Instead of giving low scores to images with such conditions, the CW-SSIM takes advantage of the complex wavelet transform and therefore yields higher scores to said images. The CW-SSIM is defined as follows:

\text{CW-SSIM}(c_x,c_y)=\bigg(\frac{2\sum_{i=1}^N |c_{x,i}||c_{y,i}|+K}{\sum_{i=1}^N |c_{x,i}|^2+\sum_{i=1}^N |c_{y,i}|^2+K}\bigg)\bigg(\frac{2\left|\sum_{i=1}^N c_{x,i}c_{y,i}^*\right|+K}{2\sum_{i=1}^N |c_{x,i}c_{y,i}^*|+K}\bigg)

Where c_x is the complex wavelet transform of the signal x and c_y is the complex wavelet transform for the signal y. Additionally, K is a small positive number used for the purposes of function stability. Ideally, it should be zero. Like the SSIM, the CW-SSIM has a maximum value of 1. The maximum value of 1 indicates that the two signals are perfectly structurally similar while a value of 0 indicates no structural similarity.
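A minimal Python sketch of the CW-SSIM expression follows. It assumes the complex wavelet coefficients of a local patch have already been computed by some transform and are passed in as plain sequences of complex numbers; the function name `cw_ssim` and the default K = 0.01 are illustrative choices. The first factor compares coefficient magnitudes, and the second compares the magnitude of the summed conjugate products with the sum of their magnitudes, which measures phase consistency.

```python
def cw_ssim(cx, cy, K=0.01):
    """CW-SSIM for two equally long sequences of complex wavelet coefficients."""
    if len(cx) != len(cy) or len(cx) == 0:
        raise ValueError("coefficient sequences must be non-empty and equally long")
    # Magnitude term: 2 * sum|cx||cy| over sum|cx|^2 + sum|cy|^2.
    mag_products = sum(abs(a) * abs(b) for a, b in zip(cx, cy))
    energy = sum(abs(a) ** 2 for a in cx) + sum(abs(b) ** 2 for b in cy)
    # Phase term: |sum cx * conj(cy)| over sum |cx * conj(cy)|.
    conj_products = [a * b.conjugate() for a, b in zip(cx, cy)]
    coherent = abs(sum(conj_products))
    incoherent = sum(abs(p) for p in conj_products)
    magnitude_term = (2 * mag_products + K) / (energy + K)
    phase_term = (2 * coherent + K) / (2 * incoherent + K)
    return magnitude_term * phase_term


coeffs = [1 + 2j, -0.5 + 0.3j, 2 - 1j]
rotation = complex(0.6, 0.8)                 # unit-magnitude factor, i.e. a pure phase shift
shifted = [c * rotation for c in coeffs]
score_shifted = cw_ssim(coeffs, shifted)
```

Because every coefficient in `shifted` is rotated by the same phase, both terms stay at 1 up to rounding and `score_shifted` is essentially 1. A consistent phase shift of this kind is what a small translation produces in the complex wavelet domain, which is the source of the insensitivity described above. Inconsistent phase changes, such as `cw_ssim([1, 1], [1, -1])`, drive the second term toward zero.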


SSIMPLUS

The SSIMPLUS index is based on SSIM and is a commercially available tool. It extends SSIM's capabilities, mainly to target video applications. It provides scores in the range of 0–100, linearly matched to human subjective ratings. It also allows adapting the scores to the intended viewing device, comparing video across different resolutions and contents. According to its authors, SSIMPLUS achieves higher accuracy and higher speed than other image and video quality metrics. However, no independent evaluation of SSIMPLUS has been performed, as the algorithm itself is not publicly available.


cSSIM

In order to further investigate the standard ''discrete'' SSIM from a theoretical perspective, the ''continuous'' SSIM (cSSIM) has been introduced and studied in the context of radial basis function interpolation.


SSIMULACRA

SSIMULACRA and SSIMULACRA2 are variants of SSIM developed by Cloudinary with the goal of fitting subjective opinion data. The variants operate in XYB color space and combine MS-SSIM with two types of asymmetric error maps for blockiness/ringing and smoothing/blur, which are common compression artifacts. SSIMULACRA2 is part of libjxl, the reference implementation of JPEG XL.


Other simple modifications

The r* cross-correlation metric is based on the variance metrics of SSIM. It is defined as r^*(x,y) = \frac{\sigma_{xy}}{\sigma_x\sigma_y} when \sigma_x\sigma_y \neq 0, 1 when both standard deviations are zero, and 0 when only one is zero. It has found use in analyzing human response to contrast-detail phantoms. SSIM has also been used on the gradient of images, making it "G-SSIM". G-SSIM is especially useful on blurred images. The modifications above can be combined. For example, 4-G-r* is a combination of 4-SSIM, G-SSIM, and r*. It is able to reflect radiologist preference for images much better than other SSIM variants tested.
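The r* metric can be sketched in Python as a correlation coefficient with explicit handling of the degenerate cases. The sketch assumes the definition r* = \sigma_{xy} / (\sigma_x \sigma_y) for non-zero standard deviations, with the values 1 (both standard deviations zero) and 0 (only one zero) for the degenerate cases; the function name `r_star` and the unbiased (n - 1) estimates are illustrative.

```python
from math import sqrt


def r_star(x, y):
    """Cross-correlation of two equally sized windows with degenerate cases handled."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("windows must have the same size (at least 2 pixels)")
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    sd_x = sqrt(sum((a - mu_x) ** 2 for a in x) / (n - 1))
    sd_y = sqrt(sum((b - mu_y) ** 2 for b in y) / (n - 1))
    if sd_x == 0 and sd_y == 0:
        return 1.0                            # two flat windows: treated as fully correlated
    if sd_x == 0 or sd_y == 0:
        return 0.0                            # one flat window: treated as uncorrelated
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    return cov_xy / (sd_x * sd_y)
```

Unlike the structure term of SSIM, this form needs no stabilizing constant, since the cases that would otherwise divide by zero are assigned fixed values.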


Application

SSIM has applications in a variety of different problems. Some examples are:
* Image compression: In lossy image compression, information is deliberately discarded to decrease the storage space of images and video. The MSE is typically used in such compression schemes. According to its authors, using SSIM instead of MSE is suggested to produce better results for the decompressed images.
* Image restoration: Image restoration focuses on solving the problem y=h * x+n where y is the blurry image that should be restored, h is the blur kernel, n is the additive noise and x is the original image we wish to recover. The traditional filter which is used to solve this problem is the Wiener filter. However, the Wiener filter design is based on the MSE. Using an SSIM variant, specifically Stat-SSIM, is claimed to produce better visual results, according to the algorithm's authors.
* Pattern recognition: Since SSIM mimics aspects of human perception, it could be used for recognizing patterns. When faced with issues like image scaling, translation and rotation, the algorithm's authors claim that it is better to use CW-SSIM, which is insensitive to these variations and may be directly applied by template matching without using any training sample. Since data-driven pattern recognition approaches may produce better performance when a large amount of data is available for training, the authors suggest using CW-SSIM in data-driven approaches.


Performance comparison

Due to its popularity, SSIM is often compared to other metrics, including simpler metrics such as MSE and PSNR, and other perceptual image and video quality metrics. SSIM has been repeatedly shown to significantly outperform MSE and its derivatives in accuracy, including in research by its own authors and others. A paper by Dosselmann and Yang claims that the performance of SSIM is "much closer to that of the MSE" than usually assumed. While they do not dispute the advantage of SSIM over MSE, they state an analytical and functional dependency between the two metrics. According to their research, SSIM has been found to correlate as well as MSE-based methods on subjective databases other than the databases from SSIM's creators. As an example, they cite Reibman and Poole, who found that MSE outperformed SSIM on a database containing packet-loss–impaired video. In another paper, an analytical link between PSNR and SSIM was identified.


See also

* Mean squared error
* Peak signal-to-noise ratio
* Video Multimethod Assessment Fusion (VMAF)
* Video quality


External links


* Home page
* Rust Implementation
* C/C++ Implementation
* DSSIM C++ Implementation
* qpsnr implementation (multi threaded C++)
* Implementation in VQMT software
* "Mystery Behind Similarity Measures MSE and SSIM", Gintautas Palubinskas, 2014: https://elib.dlr.de/91439/1/Gintautas_Palubinskas_ICIP_2014.pdf