In information technology, lossy compression or irreversible compression is the class of data compression methods that use inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size for storing, handling, and transmitting content. Higher degrees of approximation create coarser images as more details are removed. This is opposed to lossless data compression (reversible data compression), which does not degrade the data. The amount of data reduction possible using lossy compression is much higher than using lossless techniques.

Well-designed lossy compression technology often reduces file sizes significantly before degradation is noticed by the end-user. Even when noticeable by the user, further data reduction may be desirable (e.g., for real-time communication or to reduce transmission times or storage needs). The most widely used lossy compression algorithm is the discrete cosine transform (DCT), first published by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974.

Lossy compression is most commonly used to compress multimedia data (audio, video, and images), especially in applications such as streaming media and internet telephony. By contrast, lossless compression is typically required for text and data files, such as bank records and text articles. It can be advantageous to make a master lossless file from which additional copies can then be produced. This avoids basing new compressed copies on a lossy source file, which would yield additional artifacts and further unnecessary information loss.

Types

It is possible to compress many types of digital data in a way that reduces the size of a computer file needed to store it, or the bandwidth needed to transmit it, with no loss of the full information contained in the original file. A picture, for example, is converted to a digital file by considering it to be an array of dots and specifying the color and brightness of each dot. If the picture contains an area of the same color, it can be compressed without loss by saying "200 red dots" instead of "red dot, red dot, ...(197 more times)..., red dot."

The original data contains a certain amount of information, and there is a lower bound to the size of a file that can still carry all the information. Basic information theory says that there is an absolute limit in reducing the size of this data. When data is compressed, its entropy per bit increases, and it cannot increase indefinitely. For example, a compressed ZIP file is smaller than its original, but repeatedly compressing the same file will not reduce the size to nothing. Most compression algorithms can recognize when further compression would be pointless and would in fact increase the size of the data.

In many cases, files or data streams contain more information than is needed. For example, a picture may have more detail than the eye can distinguish when reproduced at the largest size intended; likewise, an audio file does not need a lot of fine detail during a very loud passage. Developing lossy compression techniques as closely matched to human perception as possible is a complex task. Sometimes the ideal is a file that provides exactly the same perception as the original, with as much digital information as possible removed; other times, perceptible loss of quality is considered a valid tradeoff.

The terms "irreversible" and "reversible" are preferred over "lossy" and "lossless" respectively for some applications, such as medical image compression, to circumvent the negative implications of "loss". The type and amount of loss can affect the utility of the images. Artifacts or undesirable effects of compression may be clearly discernible, yet the result may still be useful for the intended purpose. Alternatively, lossy compressed images may be "visually lossless", or, in the case of medical images, so-called diagnostically acceptable irreversible compression (DAIC) may have been applied.
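The "200 red dots" idea above is run-length encoding. A minimal Python sketch follows; the function names `rle_encode` and `rle_decode` and the sample row are illustrative assumptions, not part of any particular file format.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# Illustrative row of pixels: a long red run followed by a short blue one.
row = ["red"] * 200 + ["blue"] * 3
encoded = rle_encode(row)

print(encoded)                      # [('red', 200), ('blue', 3)]
print(rle_decode(encoded) == row)   # True: the round trip is exact (lossless)
```

Because the decoder reproduces the input exactly, this is lossless; it only helps when the data actually contains runs, which is one face of the lower bound described above.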

Transform coding

Some forms of lossy compression can be thought of as an application of transform coding, which is a type of data compression used for digital images, digital audio signals, and digital video. The transformation is typically used to enable better (more targeted) quantization. Knowledge of the application is used to choose information to discard, thereby lowering its bandwidth. The remaining information can then be compressed via a variety of methods. When the output is decoded, the result may not be identical to the original input, but is expected to be close enough for the purpose of the application.

The most common form of lossy compression is a transform coding method, the discrete cosine transform (DCT), which was first published by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974. DCT is the most widely used form of lossy compression, for popular image compression formats (such as JPEG), video coding standards (such as MPEG and H.264/AVC) and audio compression formats (such as MP3 and AAC).

In the case of audio data, a popular form of transform coding is perceptual coding, which transforms the raw data to a domain that more accurately reflects the information content. For example, rather than expressing a sound file as the amplitude levels over time, one may express it as the frequency spectrum over time, which corresponds more accurately to human audio perception.

While data reduction (compression, be it lossy or lossless) is a main goal of transform coding, it also allows other goals: one may represent data more accurately for the original amount of space. For example, in principle, if one starts with an analog or high-resolution digital master, an MP3 file of a given size should provide a better representation than raw uncompressed audio in a WAV or AIFF file of the same size. This is because uncompressed audio can only reduce file size by lowering bit rate or depth, whereas compressing audio can reduce size while maintaining bit rate and depth. This compression becomes a selective loss of the least significant data, rather than losing data across the board.

Further, a transform coding may provide a better domain for manipulating or otherwise editing the data. For example, equalization of audio is most naturally expressed in the frequency domain (boost the bass, for instance) rather than in the raw time domain. From this point of view, perceptual encoding is not essentially about ''discarding'' data, but rather about a ''better representation'' of data.

Another use is for backward compatibility and graceful degradation: in color television, encoding color via a luminance-chrominance transform domain (such as YUV) means that black-and-white sets display the luminance, while ignoring the color information.

Another example is chroma subsampling: the use of color spaces such as YIQ, used in NTSC, allows one to reduce the resolution on the components to accord with human perception. Humans have the highest resolution for black-and-white (luma), lower resolution for mid-spectrum colors like yellow and green, and the lowest for reds and blues. Thus NTSC displays approximately 350 pixels of luma per scanline, 150 pixels of yellow vs. green, and 50 pixels of blue vs. red, which are proportional to human sensitivity to each component.
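The transform-then-quantize idea described in this section can be sketched in a few lines of Python. This is a toy one-dimensional illustration under stated assumptions: the sample values in `block` and the quantizer step size `step` are made up, and real codecs such as JPEG use two-dimensional blocks with per-frequency quantization tables followed by entropy coding.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        out.append(scale * total)
    return out

def idct(coeffs):
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def quantize(coeffs, step):
    """Lossy step: keep only the nearest multiple of `step` for each coefficient."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Map integer quantization levels back to approximate coefficients."""
    return [q * step for q in levels]

block = [52, 55, 61, 66, 70, 61, 64, 73]  # illustrative sample values
step = 10                                 # hypothetical quantizer step size

levels = quantize(dct(block), step)       # small integers, cheap to entropy-code
restored = idct(dequantize(levels, step)) # close to, but not equal to, block

print(levels)
print([round(v, 1) for v in restored])
```

A larger `step` discards more detail and yields smaller integers to store; a smaller `step` does the opposite. The only irreversible operation is the rounding inside `quantize`.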

Information loss

Lossy compression formats suffer from generation loss: repeatedly compressing and decompressing the file will cause it to progressively lose quality. This is in contrast with lossless data compression, where data will not be lost via the use of such a procedure.

Information-theoretical foundations for lossy data compression are provided by rate-distortion theory. Much like the use of probability in optimal coding theory, rate-distortion theory heavily draws on Bayesian estimation and decision theory in order to model perceptual distortion and even aesthetic judgment.

There are two basic lossy compression schemes:

* In ''lossy transform codecs'', samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded.
* In ''lossy predictive codecs'', previous and/or subsequent decoded data is used to predict the current sound sample or image frame. The error between the predicted data and the real data, together with any extra information needed to reproduce the prediction, is then quantized and coded.

In some systems the two techniques are combined, with transform codecs being used to compress the error signals generated by the predictive stage.
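The predictive scheme in the second bullet can be sketched as a very small differential coder. This is an illustrative assumption rather than any named codec: the predictor is simply "the previous decoded sample", and `step` is a hypothetical quantizer step size.

```python
def dpcm_encode(samples, step):
    """Quantize the error between each sample and a prediction.

    The prediction is the previously *decoded* value, so encoder and
    decoder stay in sync even though the quantization is lossy.
    """
    codes = []
    prediction = 0  # both sides start from the same known state
    for s in samples:
        q = round((s - prediction) / step)   # quantized prediction error
        codes.append(q)
        prediction += q * step               # what the decoder will reconstruct
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized errors."""
    out = []
    prediction = 0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out

samples = [100, 104, 109, 111, 110, 106, 101, 99]  # illustrative values
step = 4                                           # hypothetical step size

codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
print(codes)
print(decoded)
```

Predicting from decoded rather than original data is the design choice that keeps the error from drifting: each reconstructed sample is within half a step of the original.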

Comparison

The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any lossless method, while still meeting the requirements of the application. Lossy methods are most often used for compressing sound, images or videos. This is because these types of data are intended for human interpretation, where the mind can easily "fill in the blanks" or see past very minor errors or inconsistencies. Ideally, lossy compression is transparent (imperceptible), which can be verified via an ABX test. Data files using lossy compression are smaller in size and thus cost less to store and to transmit over the Internet, a crucial consideration for streaming video services such as Netflix and streaming audio services such as Spotify.

Transparency

When a user acquires a lossily compressed file (for example, to reduce download time), the retrieved file can be quite different from the original at the bit level while being indistinguishable to the human ear or eye for most practical purposes. Many compression methods focus on the idiosyncrasies of human physiology, taking into account, for instance, that the human eye can see only certain wavelengths of light. The psychoacoustic model describes how sound can be highly compressed without degrading perceived quality. Flaws caused by lossy compression that are noticeable to the human eye or ear are known as compression artifacts.

Compression ratio

The compression ratio (that is, the size of the uncompressed file relative to that of the compressed file) of lossy video codecs is nearly always far superior to that of the audio and still-image equivalents.

* Video can be compressed immensely (e.g., 100:1) with little visible quality loss
* Audio can often be compressed at 10:1 with almost imperceptible loss of quality
* Still images are often lossily compressed at 10:1, as with audio, but the quality loss is more noticeable, especially on closer inspection.
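To make the ratios concrete, the arithmetic can be worked through for one case. The sketch below assumes CD-quality stereo PCM (44,100 samples per second, 2 channels, 2 bytes per sample) as the uncompressed source; that choice and the helper name `compressed_size` are illustrative assumptions.

```python
def compressed_size(original_bytes, ratio):
    """Size after compressing at ratio:1 (ratio=10 means 10:1)."""
    return original_bytes / ratio

# Assumed source: CD-quality stereo PCM.
pcm_bytes_per_second = 44_100 * 2 * 2        # 176,400 bytes per second
one_minute = pcm_bytes_per_second * 60       # 10,584,000 bytes

print(compressed_size(one_minute, 10))       # 1058400.0 bytes at 10:1
print(pcm_bytes_per_second * 8 / 10)         # 141120.0 bits per second at 10:1
```

So a 10:1 ratio turns roughly 10.6 MB per minute into roughly 1.06 MB per minute, or about 141 kbit/s.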

Transcoding and editing

An important caveat about lossy compression is that editing lossily compressed files (formally, transcoding them) causes digital generation loss from the re-encoding. This can be avoided by only producing lossy files from (lossless) originals and only editing (copies of) original files, such as images in raw image format instead of JPEG. If data which has been compressed lossily is decoded and compressed losslessly, the size of the result can be comparable with the size of the data before lossy compression, but the data already lost cannot be recovered.

When deciding whether to use lossy conversion without keeping the original, note that format conversion may be needed in the future to achieve compatibility with software or devices (format shifting), or to avoid paying patent royalties for decoding or distribution of compressed files.
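Generation loss and the irrecoverability of discarded data can both be illustrated with a toy quantizer. In the sketch below, `lossy_generation` stands in for one encode/decode cycle of a lossy codec; the sample values and step sizes are made-up assumptions, and `zlib` plays the role of the later lossless compression.

```python
import zlib

def lossy_generation(values, step):
    """One lossy encode/decode cycle: snap each value to a multiple of step."""
    return [round(v / step) * step for v in values]

def max_error(original, processed):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(a - b) for a, b in zip(original, processed))

original = [9, 13, 23, 37]             # illustrative sample values

gen1 = lossy_generation(original, 4)   # first lossy encode
gen2 = lossy_generation(gen1, 6)       # re-encode with different settings

print(gen1, max_error(original, gen1)) # [8, 12, 24, 36] 1
print(gen2, max_error(original, gen2)) # [6, 12, 24, 36] 3

# Losslessly compressing the decoded lossy data preserves it exactly,
# but what comes back is gen1, not the original.
packed = zlib.compress(bytes(gen1))
unpacked = list(zlib.decompress(packed))
print(unpacked == gen1, unpacked == original)  # True False
```

Re-encoding with the same step would change nothing in this toy model, since the quantizer is idempotent; the extra loss appears when a later generation uses different settings or the data is edited in between, which is the usual situation when transcoding.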

Editing of lossy files

By modifying the compressed data directly without decoding and re-encoding, some editing of lossily compressed files without degradation of quality is possible. Editing which reduces the file size as if it had been compressed to a greater degree, but without more loss than this, is sometimes also possible.

JPEG

The primary programs for lossless editing of JPEGs are jpegtran, the derived exiftran (which also preserves Exif information), and Jpegcrop (which provides a Windows interface). These allow the image to be cropped, rotated, flipped, and flopped, or even converted to grayscale (by dropping the chrominance channel). While unwanted information is destroyed, the quality of the remaining portion is unchanged.

Some other transforms are possible to some extent, such as joining images with the same encoding (composing side by side, as on a grid) or pasting images such as logos onto existing images (both via Jpegjoin), or scaling.

Some changes can be made to the compression without re-encoding:

* Optimizing the compression (to reduce size without change to the decoded image)
* Converting between progressive and non-progressive encoding.

The freeware Windows-only IrfanView has some lossless JPEG operations in its JPG_TRANSFORM plugin.
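As a rough illustration of driving jpegtran from a script, the sketch below only builds a command line; it does not run anything. It assumes a jpegtran build from libjpeg or libjpeg-turbo is installed, uses the commonly documented switches `-copy`, `-rotate`, `-grayscale`, `-optimize`, `-progressive` and `-outfile`, and the helper name `jpegtran_command` and the file names are hypothetical. Option availability can vary between versions, so the local manual page is the authority.

```python
def jpegtran_command(src, dst, *, rotate=None, grayscale=False,
                     optimize=False, progressive=False):
    """Build an argument list for a lossless jpegtran transformation."""
    cmd = ["jpegtran", "-copy", "all"]       # keep metadata markers
    if rotate is not None:
        if rotate not in (90, 180, 270):
            raise ValueError("rotate must be 90, 180 or 270")
        cmd += ["-rotate", str(rotate)]
    if grayscale:
        cmd.append("-grayscale")             # drop the chrominance channels
    if optimize:
        cmd.append("-optimize")              # smaller file, same decoded image
    if progressive:
        cmd.append("-progressive")           # convert to progressive encoding
    cmd += ["-outfile", dst, src]
    return cmd

# Hypothetical file names; pass the list to subprocess.run(cmd, check=True)
# on a system where jpegtran is available.
print(" ".join(jpegtran_command("in.jpg", "out.jpg", rotate=90, optimize=True)))
```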

Metadata

Metadata, such as ID3 tags, Vorbis comments, or Exif information, can usually be modified or removed without modifying the underlying data.

Downsampling/compressed representation scalability

One may wish to downsample or otherwise decrease the resolution of the represented source signal and the quantity of data used for its compressed representation without re-encoding, as in bitrate peeling, but this functionality is not supported in all designs, as not all codecs encode data in a form that allows less important detail to simply be dropped. Some well-known designs that have this capability include JPEG 2000 for still images and H.264/MPEG-4 AVC based Scalable Video Coding for video. Such schemes have also been standardized for older designs, such as JPEG images with progressive encoding, and MPEG-2 and MPEG-4 Part 2 video, although those prior schemes had limited success in terms of adoption into real-world common usage.

Without this capacity, which is often the case in practice, to produce a representation with lower resolution or lower fidelity than a given one, one needs to start with the original source signal and encode, or start with a compressed representation and then decompress and re-encode it (transcoding), though the latter tends to cause digital generation loss.

Another approach is to encode the original signal at several different bitrates, and then either choose which to use (as when streaming over the internet, as in RealNetworks' "SureStream", or offering varying downloads, as at Apple's iTunes Store), or broadcast several, where the best that is successfully received is used, as in various implementations of hierarchical modulation. Similar techniques are used in mipmaps, pyramid representations, and more sophisticated scale space methods.

Some audio formats feature a combination of a lossy format and a lossless correction which when combined reproduce the original signal; the correction can be stripped, leaving a smaller, lossily compressed file. Such formats include MPEG-4 SLS (Scalable to Lossless), WavPack, OptimFROG DualStream, and DTS-HD Master Audio in lossless (XLL) mode.
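The "lossy core plus lossless correction" layout described in the last paragraph can be sketched as follows. This is a toy model, not the bitstream of any of the formats named above: the lossy layer is a plain quantizer, `step` is a hypothetical step size, and the function names are illustrative.

```python
def split_layers(samples, step):
    """Split integer samples into a lossy core and a correction layer."""
    core = [round(s / step) for s in samples]                    # lossy layer
    correction = [s - q * step for s, q in zip(samples, core)]   # residual
    return core, correction

def decode(core, step, correction=None):
    """Decode the core alone (lossy) or core plus correction (exact)."""
    approx = [q * step for q in core]
    if correction is None:
        return approx
    return [a + c for a, c in zip(approx, correction)]

samples = [1203, -877, 15, 4096, -3]   # illustrative integer samples
step = 16                              # hypothetical quantizer step

core, correction = split_layers(samples, step)

print(decode(core, step, correction) == samples)  # True: both layers are exact
print(decode(core, step))                         # core only: an approximation
```

Stripping the correction layer is just a matter of not storing or not sending it; no re-encoding of the core is needed, which is what makes this layout attractive.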

Methods

Graphics

Image

* Discrete cosine transform (DCT)
** JPEG
** WebP (high-density lossless or lossy compression of RGB and RGBA images)
** High Efficiency Image Format (HEIF)
** Better Portable Graphics (BPG) (lossless or lossy compression)
** JPEG XR, a successor of JPEG with support for high-dynamic range, wide gamut pixel formats (lossless or lossy compression)
* Wavelet compression
** JPEG 2000, JPEG's successor format that uses wavelets (lossless or lossy compression)
** DjVu
** ICER, used by the Mars Rovers, related to JPEG 2000 in its use of wavelets
** PGF, Progressive Graphics File (lossless or lossy compression)
* Cartesian Perceptual Compression, also known as CPC
* Fractal compression
* JBIG2 (lossless or lossy compression)
* S3TC texture compression for 3D computer graphics hardware

3D computer graphics

* glTF

Video

* Discrete cosine transform (DCT)
** H.261
** Motion JPEG
** MPEG-1 Part 2 (K. R. Rao and J. J. Hwang, ''Techniques and Standards for Image, Video, and Audio Coding'', Prentice Hall, 1996; JPEG: Chapter 8; H.261: Chapter 9; MPEG-1: Chapter 10; MPEG-2: Chapter 11)
** MPEG-2 Part 2 (H.262)
** MPEG-4 Part 2 (H.263)
** Advanced Video Coding (AVC / H.264 / MPEG-4 AVC) (may also be lossless, even in certain video sections)
** High Efficiency Video Coding (HEVC / H.265)
** Ogg Theora (noted for its lack of patent restrictions)
** VC-1
* Wavelet compression
** Motion JPEG 2000
** Dirac
* Sorenson video codec

Audio

General

* Modified discrete cosine transform (MDCT)
** Dolby Digital (AC-3)
** Adaptive Transform Acoustic Coding (ATRAC)
** MPEG Layer III (MP3)
** Advanced Audio Coding (AAC / MP4 Audio)
** Vorbis
** Windows Media Audio (WMA) (Standard and Pro profiles are lossy. WMA Lossless is also available.)
** LDAC
** Opus (notable for lack of patent restrictions, low delay, and high quality speech and general audio)
* Adaptive differential pulse-code modulation (ADPCM)
** Master Quality Authenticated (MQA)
* MPEG-1 Audio Layer II (MP2)
* Musepack (based on Musicam)
* aptX / aptX-HD

Speech

* Linear predictive coding (LPC)
** Adaptive predictive coding (APC)
** Code-excited linear prediction (CELP)
** Algebraic code-excited linear prediction (ACELP)
** Relaxed code-excited linear prediction (RCELP)
** Low-delay CELP (LD-CELP)
** Adaptive Multi-Rate (used in GSM and 3GPP)
** Codec2 (noted for its lack of patent restrictions)
** Speex (noted for its lack of patent restrictions)
* Modified discrete cosine transform (MDCT)
** AAC-LD
** Constrained Energy Lapped Transform (CELT)
** Opus (mostly for real-time applications)

Other data

Researchers have performed lossy compression on text by either using a thesaurus to substitute short words for long ones, or generative text techniques, although these sometimes fall into the related category of lossy data conversion.
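The thesaurus approach can be sketched with a tiny substitution table. The word list `SHORTER` and the function `shorten` below are hypothetical; a real system would need a proper thesaurus and some handling of grammar, capitalization and punctuation.

```python
# Hypothetical mini-thesaurus mapping long words to shorter near-synonyms.
SHORTER = {
    "utilize": "use",
    "approximately": "about",
    "demonstrate": "show",
    "numerous": "many",
}

def shorten(text):
    """Replace each known long word with its shorter stand-in."""
    return " ".join(SHORTER.get(word, word) for word in text.split(" "))

sentence = "we utilize numerous codecs"
result = shorten(sentence)

print(result)                      # we use many codecs
print(len(sentence), len(result))  # 26 18
```

The substitution is irreversible: nothing in the output records which words were replaced, so the original wording cannot be recovered.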

Lowering resolution

A general kind of lossy compression is to lower the resolution of an image, as in image scaling, particularly decimation. One may also remove "lower information" parts of an image, such as by seam carving. Many media transforms, such as Gaussian blur, are, like lossy compression, irreversible: the original signal cannot be reconstructed from the transformed signal. However, in general these will have the same size as the original, and are not a form of compression.

Lowering resolution has practical uses: for example, the NASA New Horizons craft transmitted thumbnails of its encounter with Pluto-Charon before it sent the higher resolution images. Another solution for slow connections is the use of image interlacing, which progressively defines the image. Thus a partial transmission is enough to preview the final image, in a lower resolution version, without creating a scaled and a full version too.
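Lowering resolution can be illustrated with a small grayscale image stored as a list of rows. The sketch below averages each 2x2 block, which is one simple form of downscaling (plain decimation would instead keep every second sample); the function name `downsample_2x` and the pixel values are illustrative assumptions.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]

# Illustrative 4x4 grayscale image.
image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]

print(downsample_2x(image))  # [[10.0, 20.0], [30.0, 40.0]]
```

The result holds a quarter of the samples. Any detail inside a 2x2 block is gone for good, which is what makes this a lossy operation.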
