Zstandard is a
lossless
Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statisti ...
data compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...
algorithm developed by Yann Collet at
Facebook
Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
. Zstd is the corresponding
reference implementation
In the software development process, a reference implementation (or, less frequently, sample implementation or model implementation) is a program that implements all requirements from a corresponding specification. The reference implementation ...
in
C, released as
open-source software
Open-source software (OSS) is Software, computer software that is released under a Open-source license, license in which the copyright holder grants users the rights to use, study, change, and Software distribution, distribute the software an ...
on 31 August 2016.
The algorithm was published in 2018 as , which also defines an associated
media type
In information and communications technology, a media type, content type or MIME type is a two-part identifier for file formats and content formats. Their purpose is comparable to filename extensions and uniform type identifiers, in that they ide ...
"application/zstd",
filename extension
A filename extension, file name extension or file extension is a suffix to the name of a computer file (for example, .txt, .mp3, .exe) that indicates a characteristic of the file contents or its intended use. A filename extension is typically d ...
"zst", and
HTTP content encoding "zstd".
[
]
Features
Zstandard was designed to give a
compression ratio
The compression ratio is the ratio between the maximum and minimum volume during the compression stage of the power cycle in a piston or Wankel engine.
A fundamental specification for such engines, it can be measured in two different ways. Th ...
comparable to that of the
DEFLATE algorithm (developed in 1991 and used in the original
ZIP and
gzip
gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and ...
programs), but faster, especially for decompression. It is tunable with compression levels ranging from negative 7 (fastest) to 22 (slowest in compression speed, but best compression ratio).
Starting from version 1.3.2 (October 2017), zstd optionally implements very-long-range search and deduplication (, 128 MiB window) similar to
rzip
rzip is a huge-scale data compression computer program designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding ( Huffman) on 900 kB output ...
or
lrzip
rzip is a huge-scale data compression computer program designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding (Huffman coding, Huffman) on 900 ...
.
Compression speed can vary by a factor of 20 or more between the fastest and slowest levels, while decompression is uniformly fast, varying by less than 20% between the fastest and slowest levels.
The Zstandard command-line has an "adaptive" () mode that varies compression level depending on I/O conditions, mainly how fast it can write the output.
''Zstd'' at its maximum compression level gives a compression ratio close to
lzma
The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression. It has been used in the 7z format of the 7-Zip archiver since 2001. This algorithm uses a dictionary compression scheme somewhat similar ...
,
lzham, and
ppmx, and Zstandard reaches the current
Pareto frontier, as it decompresses faster than any other currently available algorithm with similar or better compression ratio.
Dictionaries can have a large impact on the compression ratio of small files, so Zstandard can use a user-provided compression dictionary. It also offers a training mode, able to generate a dictionary from a set of samples. In particular, one dictionary can be loaded to process large sets of files with redundancy between files, but not necessarily within each file, such as for
log files
In computing, logging is the act of keeping a log of events that occur in a computer system, such as problems, errors or broad information on current operations. These events may occur in the operating system or in other software. A message or ' ...
.
Design
Zstandard combines a dictionary-matching stage (
LZ77
LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978.
They are also known as Lempel-Ziv 1 (LZ1) and Lempel-Ziv 2 (LZ2) respectively. These two algorithms form the basis ...
) with a large search window and a fast
entropy-coding stage. It uses both
Huffman coding
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by ...
(used for entries in the Literals section) and finite-state entropy (FSE) – a fast tabled version of ANS,
tANS, used for entries in the Sequences section. Because of the way that FSE carries over state between symbols, decompression involves processing symbols within the Sequences section of each block in reverse order (from last to first).
Usage
The
Linux kernel
The Linux kernel is a Free and open-source software, free and open source Unix-like kernel (operating system), kernel that is used in many computer systems worldwide. The kernel was created by Linus Torvalds in 1991 and was soon adopted as the k ...
has included Zstandard since November 2017 (version 4.14) as a compression method for the
btrfs
Btrfs (pronounced as "better F S", "butter F S", "b-tree F S", or "B.T.R.F.S.") is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager (distinct from Linux's LVM), d ...
and
squashfs
Squashfs is a compressed read-only file system for Linux. Squashfs compresses files, inodes and directories, and supports block sizes from 4 KiB up to 1 MiB for greater compression. Several compression algorithms are supported. Squashfs is ...
filesystems.
In 2017, Allan Jude integrated Zstandard into the
FreeBSD
FreeBSD is a free-software Unix-like operating system descended from the Berkeley Software Distribution (BSD). The first version was released in 1993 developed from 386BSD, one of the first fully functional and free Unix clones on affordable ...
kernel, and it was subsequently integrated as a compressor option for core dumps (both user programs and kernel panics). It was also used to create a proof-of-concept
OpenZFS
OpenZFS is an open-source implementation of the ZFS file system and volume manager initially developed by Sun Microsystems for the Solaris operating system, and is now maintained by the OpenZFS Project. Similar to the original ZFS, the impleme ...
compression method
which was integrated in 2020.
The
AWS Redshift and
RocksDB
RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit multi-core processors (CPUs), and make efficient use of fast storage, such as solid-state drives (SSD), for input/outpu ...
databases include support for field compression using Zstandard.
In March 2018,
Canonical
The adjective canonical is applied in many contexts to mean 'according to the canon' the standard, rule or primary source that is accepted as authoritative for the body of knowledge or literature in that context. In mathematics, ''canonical exampl ...
tested
the use of zstd as a
deb package compression method by default for the
Ubuntu
Ubuntu ( ) is a Linux distribution based on Debian and composed primarily of free and open-source software. Developed by the British company Canonical (company), Canonical and a community of contributors under a Meritocracy, meritocratic gover ...
Linux distribution. Compared with
xz compression of deb packages, zstd at level 19 decompresses significantly faster, but at the cost of 6% larger package files. Support was added to Debian (and subsequently, Ubuntu) in April 2018 (in version 1.6~rc1).
Fedora
A fedora () is a hat with a soft brim and indented crown.Kilgour, Ruth Edwards (1958). ''A Pageant of Hats Ancient and Modern''. R. M. McBride Company. It is typically creased lengthwise down the crown and "pinched" near the front on both sides ...
added ZStandard support to
RPM
Revolutions per minute (abbreviated rpm, RPM, rev/min, r/min, or r⋅min−1) is a unit of rotational speed (or rotational frequency) for rotating machines.
One revolution per minute is equivalent to hertz.
Standards
ISO 80000-3:2019 def ...
in May 2018 (Fedora release 28) and used it for packaging the release in October 2019 (Fedora 31). In Fedora 33, the filesystem is compressed by default with zstd.
Arch Linux
Arch Linux () is an Open-source software, open source, rolling release Linux distribution. Arch Linux is kept up-to-date by regularly updating the individual pieces of software that it comprises. Arch Linux is intentionally minimal, and is meant ...
added support for zstd as a package compression method in October 2019 with the release of the
pacman 5.2 package manager and in January 2020 switched from xz to zstd for the packages in the official repository. Arch uses
zstd -c -T0 --ultra -20 -
; the size of all compressed packages combined increased by 0.8% (compared to xz), the decompression speed is 14 times faster, decompression memory increased by 50 MiB when using multiple threads, and compression memory increased but scales with the number of threads used. Arch Linux later also switched to zstd as the default compression algorithm for mkinitcpio
initial ramdisk
In Linux systems, initrd (''initial ramdisk'') is a scheme for loading a temporary root file system into computer memory, memory, to be used as part of the Linux startup process. initrd and initramfs (from INITial RAM File System) refer to two dif ...
generator.
A full implementation of the algorithm with an option to choose the compression level is used in the .NSZ/.XCZ file formats developed by the
homebrew community for the
Nintendo Switch
The is a video game console developed by Nintendo and released worldwide in most regions on March 3, 2017. Released in the middle of the Eighth generation of video game consoles, eighth generation of home consoles, the Switch succeeded the ...
hybrid game console. It is also one of many supported compression algorithms in the .RVZ
Wii
The Wii ( ) is a home video game console developed and marketed by Nintendo. It was released on November 19, 2006, in North America, and in December 2006 for most other regions of the world. It is Nintendo's fifth major home game console, f ...
and
GameCube
The is a PowerPC-based home video game console developed and marketed by Nintendo. It was released in Japan on September 14, 2001, in North America on November 18, 2001, in Europe on May 3, 2002, and in Australia on May 17, 2002. It is the suc ...
disc image file format.
On 15 June 2020, Zstandard was implemented in version 6.3.8 of the zip file format with codec number 93, deprecating the previous codec number of 20 as it was implemented in version 6.3.7, released on 1 June.
In March 2024,
Google Chrome
Google Chrome is a web browser developed by Google. It was first released in 2008 for Microsoft Windows, built with free software components from Apple WebKit and Mozilla Firefox. Versions were later released for Linux, macOS, iOS, iPadOS, an ...
version 123 (and
Chromium
Chromium is a chemical element; it has Symbol (chemistry), symbol Cr and atomic number 24. It is the first element in Group 6 element, group 6. It is a steely-grey, Luster (mineralogy), lustrous, hard, and brittle transition metal.
Chromium ...
-based browsers such as
Brave or
Microsoft Edge
Microsoft Edge is a Proprietary Software, proprietary cross-platform software, cross-platform web browser created by Microsoft and based on the Chromium (web browser), Chromium open-source project, superseding Edge Legacy. In Windows 11, Edge ...
) added zstd support in the
HTTP header
HTTP header fields are a list of strings sent and received by both the client program and server on every HTTP request and response. These headers are usually invisible to the end-user and are only processed or logged by the server and client ...
Content-Encoding
. In May 2024,
Firefox
Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. It uses the Gecko rendering engine to display web pages, which implements curr ...
release 126.0 added zstd support in the
HTTP header
HTTP header fields are a list of strings sent and received by both the client program and server on every HTTP request and response. These headers are usually invisible to the end-user and are only processed or logged by the server and client ...
Content-Encoding
.
License
The reference implementation is licensed under the
BSD license
BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD lic ...
, published at
GitHub
GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
. Since version 1.0, published 31 August 2016, it had an additional Grant of Patent Rights.
From version 1.3.1, released 20 August 2017, this patent grant was dropped and the license was changed to a BSD + GPLv2 dual license.
See also
*
LZ4 (compression algorithm)
LZ4 is a lossless data compression algorithm that is focused on compression and decompression speed. It belongs to the LZ77 family of byte-oriented compression schemes.
Features
The LZ4 algorithm aims to provide a good trade-off between speed ...
– a fast member of the LZ77 family
*
LZFSE
LZFSE (Lempel–Ziv Finite State Entropy) is an open source lossless data compression algorithm created by Apple Inc. It was released with a simpler algorithm called LZVN.
Overview
The name is an acronym for Lempel–Ziv and finite-state ent ...
– a similar algorithm by Apple used since iOS 9 and OS X 10.11 and made open source on 1 June 2016
*
Zlib
zlib ( or "zeta-lib", ) is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compre ...
*
Brotli
Brotli is a lossless data compression algorithm developed by Jyrki Alakuijala and Zoltán Szabadka. It uses a combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd-order context modelling.
Brotli is pr ...
– also integrated into browsers
References
External links
*
*
*
*
Smaller and faster data compression with Zstandard, Yann Collet and Chip Turner, 31 August 2016, Facebook Announcement
The Guardian is using ZStandard instead of zlib
{{Archive formats
2016 software
C (programming language) libraries
Free data compression software
Lossless compression algorithms
Software using the BSD license
Data compression