2.2 Luma coding in video systemsGrayscale images are distinct from one-bit bi-tonal black-and-white images, which, in the context of computer imaging, are images with only two colors: black and white (also called bilevel or binary images). Grayscale images have many shades of gray in between.
Grayscale images can be the result of measuring the intensity of light at each pixel according to a particular weighted combination of frequencies (or wavelengths), and in such cases they are monochromatic proper when only a single frequency (in practice, a narrow band of frequencies) is captured. The frequencies can in principle be from anywhere in the electromagnetic spectrum (e.g. infrared, visible light, ultraviolet, etc.).
A colorimetric (or more specifically photometric) grayscale image is an image that has a defined grayscale colorspace, which maps the stored numeric sample values to the achromatic channel of a standard colorspace, which itself is based on measured properties of human vision.
If the original color image has no defined colorspace, or if the grayscale image is not intended to have the same human-perceived achromatic intensity as the color image, then there is no unique mapping from such a color image to a grayscale image.
The intensity of a pixel is expressed within a given range between a minimum and a maximum, inclusive. This range is represented in an abstract way as a range from 0 (or 0%) (total absence, black) and 1 (or 100%) (total presence, white), with any fractional values in between. This notation is used in academic papers, but this does not define what "black" or "white" is in terms of colorimetry. Sometimes the scale is reversed, as in printing where the numeric intensity denotes how much ink is employed in halftoning, with 0% representing the paper white (no ink) and 100% being a solid black (full ink).
In computing, although the grayscale can be computed through rational numbers, image pixels are usually quantized to store them as unsigned integers, to reduce the required storage and computation. Some early grayscale monitors can only display up to sixteen different shades, which would be stored in binary form using 4 bits. But today grayscale images (such as photographs) intended for visual display (both on screen and printed) are commonly stored with 8 bits per sampled pixel. This pixel depth allows 256 different intensities (i.e., shades of gray) to be recorded, and also simplifies computation as each pixel sample can be accessed individually as one full byte. However, if these intensities were spaced equally in proportion to the amount of physical light they represent at that pixel (called a linear encoding or scale), the differences between adjacent dark shades could be quite noticeable as banding artifacts, while many of the lighter shades would be "wasted" by encoding a lot of perceptually-indistinguishable increments. Therefore, the shades are instead typically spread out evenly on a gamma-compressed nonlinear scale, which better approximates uniform perceptual increments for both dark and light shades, usually making these 256 shades enough (just barely) to avoid noticeable increments.
Technical uses (e.g. in medical imaging or remote sensing applications) often require more levels, to make full use of the sensor accuracy (typically 10 or 12 bits per sample) and to reduce rounding errors in computations. Sixteen bits per sample (65,536 levels) is often a convenient choice for such uses, as computers manage 16-bit words efficiently. The TIFF and PNG (among other) image file formats support 16-bit grayscale natively, although browser
In computing, although the grayscale can be computed through rational numbers, image pixels are usually quantized to store them as unsigned integers, to reduce the required storage and computation. Some early grayscale monitors can only display up to sixteen different shades, which would be stored in binary form using 4 bits. But today grayscale images (such as photographs) intended for visual display (both on screen and printed) are commonly stored with 8 bits per sampled pixel. This pixel depth allows 256 different intensities (i.e., shades of gray) to be recorded, and also simplifies computation as each pixel sample can be accessed individually as one full byte. However, if these intensities were spaced equally in proportion to the amount of physical light they represent at that pixel (called a linear encoding or scale), the differences between adjacent dark shades could be quite noticeable as banding artifacts, while many of the lighter shades would be "wasted" by encoding a lot of perceptually-indistinguishable increments. Therefore, the shades are instead typically spread out evenly on a gamma-compressed nonlinear scale, which better approximates uniform perceptual increments for both dark and light shades, usually making these 256 shades enough (just barely) to avoid noticeable increments.
Technical uses (e.g. in medical imaging or remote sensing applications) often require more levels, to make full use of the sensor accuracy (typically 10 or 12 bits per sample) and to reduce rounding errors in computations. Sixteen bits per sample (65,536 levels) is often a convenient choice for such uses, as computers manage 16-bit words efficiently. The TIFF and PNG (among other) image file formats support 16-bit grayscale natively, although browsers and many imaging programs tend to ignore the low order 8 bits of each pixel. Internally for computation and working storage, image processing software typically uses integer or floating-point numbers of size 16 or 32 bits.
Conversion of an arbitrary color image to grayscale is not unique in general; different weighting of the color channels effectively represent the effect of shooting black-and-white film with different-colored photographic filters on the cameras.
Colorimetric (perceptual luminance-preserving) conversion to grayscale
A common strategy is to use the principles of photometry or, more broadly, colorimetry to calculate the grayscale values (in the target grayscale colorspace) so as to have the same luminance (technically relative luminance) as the original color image (according to its colorspace).[2][3] In addition to the same (relative) luminance, this method also ensures that both images will have the sam
A common strategy is to use the principles of photometry or, more broadly, colorimetry to calculate the grayscale values (in the target grayscale colorspace) so as to have the same luminance (technically relative luminance) as the original color image (according to its colorspace).[2][3] In addition to the same (relative) luminance, this method also ensures that both images will have the same absolute luminance when displayed, as can be measured by instruments in its SI units of candelas per square meter, in any given area of the image, given equal whitepoints. Luminance itself is defined using a standard model of human vision, so preserving the luminance in the grayscale image also preserves other perceptual lightness measures, such as L* (as in the 1976 CIE Lab color space) which is determined by the linear luminance Y itself (as in the CIE 1931 XYZ color space) which we will refer to here as Ylinear to avoid any ambiguity.
To convert a color from a colorspace based on a typical gamma-compressed (nonlinear) RGB color model to a grayscale representation of its luminance, the gamma compression function must first be removed via gamma expansion (linearization) to transform the image to a linear RGB colorspace, so that the appropriate weighted sum can be applied to the linear color components (To convert a color from a colorspace based on a typical gamma-compressed (nonlinear) RGB color model to a grayscale representation of its luminance, the gamma compression function must first be removed via gamma expansion (linearization) to transform the image to a linear RGB colorspace, so that the appropriate weighted sum can be applied to the linear color components (
) to calculate the linear luminance Ylinear, which can then be gamma-compressed back again if the grayscale result is also to be encoded and stored in a typical nonlinear colorspace.[4]
For the common sRGB color space, gamma expansion is defined as
where Csrgb represents any of the three gamma-compressed sRGB primaries (Rsrgb, Gsrgb, and Bsrgb, each in range [0,1]) and Clinear is the corresponding linear-intensity value (Rlinear, Glinear, and Blinear, also in range [0,1]). Then, linear luminance is calculated as a weighted sum of the three linear-intensity values. The sRGB color space is defined in terms of the CIE 1931 linear luminance Ylinear, which is given by
.[5]
These three particular coefficients represent the intensity (luminance) perception of typical trichromat humans to light of the precise Rec. 709 additive primary colors (chromaticities) that are used in the definition of sRGB. Human vision is most sensitive to green, so this has the greatest coefficient value (0.7152), and least sensitive to blue, so this has the smallest coefficient (0.0722). To encode grayscale intensity in linear RGB, each of the three color components can be set to equal the calculated linear luminance
(replacing
humans to light of the precise