
HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization. HTTP data is compressed before it is sent from the server: compliant browsers announce to the server which compression methods they support before downloading content in the appropriate format, and browsers that do not support any compliant compression method download the data uncompressed. The most common compression schemes include gzip and Brotli; a full list of available schemes is maintained by the IANA.

There are two different ways compression can be done in HTTP. At a lower level, a Transfer-Encoding header field may indicate that the payload of an HTTP message is compressed. At a higher level, a Content-Encoding header field may indicate that a resource being transferred, cached, or otherwise referenced is compressed. Compression using Content-Encoding is more widely supported than Transfer-Encoding, and some browsers do not advertise support for Transfer-Encoding compression to avoid triggering bugs in servers.


Compression scheme negotiation

The negotiation is done in two steps, described in RFC 2616 and RFC 9110:

1. The web client advertises which compression schemes it supports by including a list of tokens in the HTTP request. For Content-Encoding, the list is in a field called Accept-Encoding; for Transfer-Encoding, the field is called TE.

    GET /encrypted-area HTTP/1.1
    Host: www.example.com
    Accept-Encoding: gzip, deflate

2. If the server supports one or more compression schemes, the outgoing data may be compressed by one or more methods supported by both parties. If this is the case, the server adds a Content-Encoding or Transfer-Encoding field to the HTTP response listing the schemes used, separated by commas.

    HTTP/1.1 200 OK
    Date: Sun, 26 Jun 2016 22:38:34 GMT
    Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
    Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    Accept-Ranges: bytes
    Content-Length: 438
    Connection: close
    Content-Type: text/html; charset=UTF-8
    Content-Encoding: gzip

The web server is by no means obligated to use any compression method; this depends on the internal settings of the web server and may also depend on the internal architecture of the website in question.
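
The two negotiation steps can be sketched from the server's side. The Python below is a minimal illustration, not code from any particular server: it assumes a hypothetical server that can only produce gzip, deflate and identity (in that order of preference), handles q-values in a simplified way, and relies only on the standard `gzip` and `zlib` modules. All function names are made up for the example.

```python
import gzip
import zlib
from typing import Dict, Optional, Tuple

# Hypothetical server capabilities, in the server's own order of preference.
SUPPORTED = ("gzip", "deflate", "identity")


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Map each coding named in Accept-Encoding to its q-value (default 1.0)."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        coding, _, params = item.strip().partition(";")
        coding = coding.strip().lower()
        if not coding:
            continue
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # simplification: a malformed weight means "not acceptable"
        prefs[coding] = q
    return prefs


def choose_encoding(header: Optional[str]) -> Optional[str]:
    """Step 1 on the server: pick a coding, or None if every option was refused."""
    if header is None:
        return "identity"  # no field at all: any coding is allowed; stay conservative
    prefs = parse_accept_encoding(header)
    wildcard = prefs.get("*")
    best, best_q = None, 0.0
    for coding in SUPPORTED:
        if coding in prefs:
            q = prefs[coding]
        elif wildcard is not None:
            q = wildcard
        elif coding == "identity":
            q = 0.001  # acceptable unless excluded, but ranked below listed codings
        else:
            q = 0.0
        if q > best_q:  # strict ">" lets the server's order break ties
            best, best_q = coding, q
    return best


def encode_body(body: bytes, coding: str) -> bytes:
    """Compress the body with the chosen coding."""
    if coding == "gzip":
        return gzip.compress(body)
    if coding == "deflate":
        return zlib.compress(body)  # zlib-wrapped, as HTTP/1.1 defines "deflate"
    return body


def build_response(
    body: bytes, accept_encoding: Optional[str]
) -> Optional[Tuple[Dict[str, str], bytes]]:
    """Step 2: return (headers, payload), or None if no coding is acceptable."""
    coding = choose_encoding(accept_encoding)
    if coding is None:
        return None
    payload = encode_body(body, coding)
    headers = {"Content-Length": str(len(payload))}
    if coding != "identity":
        headers["Content-Encoding"] = coding  # tell the client what was applied
    return headers, payload
```

With the request from step 1 (`Accept-Encoding: gzip, deflate`), this sketch would choose gzip and set `Content-Encoding: gzip`; a client that lists only `br` would receive an uncompressed body, because the sketch does not implement Brotli.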


Content-Encoding tokens

The official list of tokens available to servers and clients is maintained by IANA, and it includes:

* br – Brotli, a compression algorithm specifically designed for HTTP content encoding, implemented in all modern major browsers.
* compress – UNIX "compress" program method (historic; deprecated in most applications and replaced by gzip or deflate)
* deflate – compression based on the deflate algorithm (described in RFC 1951), a combination of the LZ77 algorithm and Huffman coding, wrapped inside the zlib data format (RFC 1950)
* exi – W3C Efficient XML Interchange
* gzip – GNU zip format. Uses the deflate algorithm for compression, but the data format and the checksum algorithm differ from the "deflate" content-encoding. This method was the most broadly supported as of March 2011.
* identity – No transformation is used. This is the default value for content coding.
* pack200-gzip – Network Transfer Format for Java Archives
* zstd – Zstandard compression

In addition to these, a number of unofficial or non-standardized tokens are used in the wild by either servers or clients:

* bzip2 – compression based on the free bzip2 format, supported by lighttpd
* lzip – compression based on the free lzip format, supported by wget
* lzma – compression based on (raw) LZMA, available in Opera 20, and in elinks via a compile-time option
* peerdist – Microsoft Peer Content Caching and Retrieval
* rsync – delta encoding in HTTP, implemented by a pair of rproxy proxies
* xpress – Microsoft compression protocol used by Windows 8 and later for Windows Store application updates; LZ77-based compression optionally using a Huffman encoding
* xz – LZMA2-based content compression, supported by a non-official Firefox patch, and fully implemented in mget since 2013-12-31
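
The gzip and deflate tokens differ only in the container wrapped around the same deflate data. The following Python sketch uses only the standard library, which is why it covers just gzip, deflate and identity (Brotli and the other tokens would need additional libraries). The helper `decode_chain` is a hypothetical name; it illustrates that when several codings are listed in one Content-Encoding value, they appear in the order they were applied, so a client undoes them in reverse.

```python
import gzip
import zlib


def encode(body: bytes, token: str) -> bytes:
    """Apply one content coding using only the standard library."""
    if token == "gzip":
        # gzip container: 10-byte header, deflate data, CRC-32 and length trailer.
        return gzip.compress(body)
    if token == "deflate":
        # zlib container: 2-byte header, deflate data, Adler-32 trailer.
        return zlib.compress(body)
    if token == "identity":
        return body
    raise ValueError(f"unsupported content coding: {token}")


def decode(data: bytes, token: str) -> bytes:
    """Undo one content coding."""
    if token == "gzip":
        return gzip.decompress(data)
    if token == "deflate":
        return zlib.decompress(data)
    if token == "identity":
        return data
    raise ValueError(f"unsupported content coding: {token}")


def decode_chain(data: bytes, content_encoding: str) -> bytes:
    """Codings are listed in the order applied, so undo them last-first."""
    tokens = [t.strip().lower() for t in content_encoding.split(",") if t.strip()]
    for token in reversed(tokens):
        data = decode(data, token)
    return data


if __name__ == "__main__":
    body = b"hello, compression " * 50
    print(encode(body, "gzip")[:2].hex())     # 1f8b (gzip magic number)
    print(encode(body, "deflate")[:2].hex())  # 789c (zlib header at the default level)
    twice = encode(encode(body, "deflate"), "gzip")
    print(decode_chain(twice, "deflate, gzip") == body)  # True
```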


Servers that support HTTP compression

* SAP NetWeaver
* Microsoft IIS: built-in or using a third-party module
* Apache HTTP Server, via mod_deflate (despite its name, only supporting gzip)
* Hiawatha HTTP server: serves pre-compressed files
* Cherokee HTTP server: on-the-fly gzip and deflate compression
* Oracle iPlanet Web Server
* Zeus Web Server
* lighttpd
* nginx – built-in
* Applications based on Tornado, if "compress_response" is set to True in the application settings (for versions prior to 4.0, set "gzip" to True)
* Jetty Server – built into default static content serving and available via servlet filter configurations
* GeoServer
* Apache Tomcat
* IBM WebSphere
* AOLserver
* Ruby Rack, via the Rack::Deflater middleware
* HAProxy
* Varnish – built-in; also works with ESI
* Armeria – serving pre-compressed files
* NaviServer – built-in, dynamic and static compression
* Caddy – built-in via encode
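
Server-side support often takes the form of a filter or middleware, as with the Rack::Deflater entry above. As a rough analogue (not a port of Rack::Deflater), here is a hypothetical Python WSGI middleware that buffers a response and gzips it when the client lists gzip. The class name, the 200-byte threshold and the substring check on Accept-Encoding are simplifications chosen for illustration.

```python
import gzip
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


class GzipMiddleware:
    """Hypothetical WSGI middleware: gzip the whole response body when allowed."""

    def __init__(self, app: Callable, min_size: int = 200) -> None:
        self.app = app
        self.min_size = min_size  # tiny bodies are not worth compressing

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Crude check for this sketch; a real implementation should parse q-values.
        wants_gzip = "gzip" in environ.get("HTTP_ACCEPT_ENCODING", "").lower()
        captured: dict = {}
        written: List[bytes] = []

        def capture(status: str, headers: Headers, exc_info=None):
            captured["status"], captured["headers"] = status, list(headers)
            return written.append  # buffer data sent through the legacy write()

        result = self.app(environ, capture)
        try:
            chunks = list(result)  # may trigger start_response for generator apps
        finally:
            if hasattr(result, "close"):
                result.close()
        body = b"".join(written) + b"".join(chunks)

        # Drop the old length; it is recomputed after (optional) compression.
        headers = [(k, v) for k, v in captured["headers"] if k.lower() != "content-length"]
        already_encoded = any(k.lower() == "content-encoding" for k, _ in headers)
        if wants_gzip and not already_encoded and len(body) >= self.min_size:
            body = gzip.compress(body)
            headers.append(("Content-Encoding", "gzip"))
        headers.append(("Vary", "Accept-Encoding"))  # caches must key on this header
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]
```

Buffering the entire body keeps the sketch short but rules out streaming; production middleware typically compresses incrementally instead.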
Many content delivery networks also implement HTTP compression to speed up delivery of resources to end users. Compression in HTTP can also be achieved by using the functionality of server-side scripting languages like PHP, or programming languages like Java.

Various online tools exist to verify a working implementation of HTTP compression. These tools usually request multiple variants of a URL, each with different request headers (with varying Accept-Encoding content). HTTP compression is considered to be implemented correctly when the server returns a document in a compressed format. By comparing the sizes of the returned documents, the effective compression ratio can be calculated (even between different compression algorithms).
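
Such a checker can be sketched in a few lines of Python. In this illustration the fetcher is passed in as a function, so the comparison logic can be exercised without a network; `make_urllib_fetch` shows one possible real fetcher built on the standard `urllib.request` module, which does not decode content codings and therefore returns the bytes as sent. The function names and the default set of probed codings are assumptions for the example.

```python
import urllib.request
from typing import Callable, Dict, Iterable, Optional, Tuple

# A fetcher takes an Accept-Encoding value and returns
# (Content-Encoding of the response, raw body bytes as received).
Fetch = Callable[[str], Tuple[str, bytes]]


def make_urllib_fetch(url: str, timeout: float = 10.0) -> Fetch:
    """Build a fetcher for one URL using urllib.request."""

    def fetch(accept_encoding: str) -> Tuple[str, bytes]:
        request = urllib.request.Request(url, headers={"Accept-Encoding": accept_encoding})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # urllib does not undo content codings, so read() gives the wire bytes.
            return response.headers.get("Content-Encoding", "identity"), response.read()

    return fetch


def compression_report(
    fetch: Fetch, codings: Iterable[str] = ("gzip", "deflate", "br")
) -> Dict[str, Tuple[str, int, Optional[float]]]:
    """For each coding: (coding the server used, body size, size / uncompressed size)."""
    _, plain = fetch("identity")
    report = {}
    for coding in codings:
        used, body = fetch(coding)
        ratio = len(body) / len(plain) if plain else None
        report[coding] = (used, len(body), ratio)
    return report
```

For instance, `compression_report(make_urllib_fetch("https://www.example.com/"))` would return one tuple per probed coding; an entry whose first element is still "identity" suggests the server ignored that coding.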


Problems preventing the use of HTTP compression

A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted daily due to the increase in page load time when users do not receive compressed content. This occurs when anti-virus software interferes with connections to force them to be uncompressed, where proxies are used (with overcautious web browsers), where servers are misconfigured, and where browser bugs stop compression being used. Internet Explorer 6 was the mainstream browser most prone to falling back to uncompressed HTTP: when behind a proxy, a common configuration in corporate environments, it drops to HTTP 1.0, which lacks features like compression and pipelining.

Another problem found while deploying HTTP compression on a large scale is due to the deflate encoding definition: while HTTP 1.1 defines the deflate encoding as data compressed with deflate (RFC 1951) inside a zlib formatted stream (RFC 1950), Microsoft server and client products historically implemented it as a "raw" deflated stream, making its deployment unreliable. For this reason, some software, including the Apache HTTP Server, only implements gzip encoding.
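
A client that must interoperate with both kinds of server can tolerate both forms. One defensive approach, sketched here in Python under the assumption that only the deflate token is involved, is to try the zlib-wrapped format first and fall back to raw deflate if the zlib header or checksum does not validate. The function names are hypothetical.

```python
import zlib


def decode_http_deflate(data: bytes) -> bytes:
    """Decode a "deflate" body whether or not the zlib wrapper is present."""
    try:
        # Conforming form: zlib header + deflate data + Adler-32 trailer.
        return zlib.decompress(data)
    except zlib.error:
        # Historical non-conforming form: raw deflate data with no wrapper.
        # A negative wbits value tells zlib to expect no header or trailer.
        return zlib.decompress(data, -zlib.MAX_WBITS)


def raw_deflate(data: bytes) -> bytes:
    """Produce the raw (unwrapped) form, e.g. to test the fallback path."""
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()
```

The zlib header check and the Adler-32 trailer make it unlikely that a raw stream is mistaken for the wrapped form, which is what makes this trial-and-fallback order workable.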


Security implications

Compression allows a form of chosen plaintext attack to be performed: if an attacker can inject any chosen content into the page, they can determine whether the page contains that content by observing the size increase of the encrypted stream. If the increase is smaller than expected for random injections, the compressor has found a repeat in the text, i.e. the injected content overlaps the secret information. This is the idea behind CRIME.

In 2012, a general attack against the use of data compression, called CRIME, was announced. While the CRIME attack could work effectively against a large number of protocols, including but not limited to TLS and application-layer protocols such as SPDY or HTTP, only exploits against TLS and SPDY were demonstrated, and these were largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than SPDY and TLS compression combined.

In 2013, a new instance of the CRIME attack against HTTP compression, dubbed BREACH, was published. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS-encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted), provided the attacker tricks the victim into visiting a malicious web link. All versions of TLS and SSL are at risk from BREACH regardless of the encryption algorithm or cipher used. Unlike previous instances of CRIME, which can be successfully defended against by turning off TLS compression or SPDY header compression, BREACH exploits HTTP compression, which cannot realistically be turned off, as virtually all web servers rely upon it to improve data transmission speeds for users. As of 2016, the TIME attack and the HEIST attack are also public knowledge.
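
The length side channel behind CRIME and BREACH can be reproduced in a toy setting. The Python below is a deliberately simplified model, not an attack tool: the page template and the `SECRET` value are made up, encryption is left out on the assumption that it preserves length closely enough for an attacker to observe it, and the attacker compares whole guesses rather than recovering the secret one character at a time as the published attacks do.

```python
import zlib
from typing import Iterable

# Made-up secret that the hypothetical server embeds in every response.
SECRET = "csrf_token=7f3a9c2e"


def response_length(injected: str) -> int:
    """Compressed size of a page that reflects attacker-controlled text."""
    page = (
        "<html><body><p>Welcome back.</p>"
        f"<input type='hidden' value='{SECRET}'>"
        f"<p>You searched for: {injected}</p></body></html>"
    )
    # In practice TLS would encrypt this, but the ciphertext length
    # still tracks the compressed length, which is what leaks.
    return len(zlib.compress(page.encode("ascii")))


def best_guess(candidates: Iterable[str]) -> str:
    """Pick the guess whose reflection compresses best, i.e. repeats the secret."""
    return min(candidates, key=response_length)
```

A correct guess is encoded as one long back-reference to the secret, while a wrong guess only matches the shared `csrf_token=` prefix and pays for the remaining characters as literals. Real attacks must also cope with Huffman-coding noise that can make single-character guesses tie; this toy sidesteps that by guessing the whole value.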


External links

* RFC 2616: Hypertext Transfer Protocol – HTTP/1.1
* RFC 9110: HTTP Semantics
* HTTP Content-Coding Values by Internet Assigned Numbers Authority
* Compression with lighttpd
* Using HTTP Compression by Martin Brown of Server Watch (archived 2016-03-14 at https://web.archive.org/web/20160314155152/http://www.serverwatch.com/tutorials/article.php/3514866)
* Using HTTP Compression in PHP
* Dynamic and static HTTP compression with Apache httpd