Xz File Format

XZ Utils (previously LZMA Utils) is a set of free software command-line lossless data compressors, including the programs lzma and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows. For compression and decompression, the Lempel–Ziv–Markov chain algorithm (LZMA) is used. XZ Utils started as a Unix port of Igor Pavlov's LZMA SDK that has been adapted to fit seamlessly into Unix environments and their usual structure and behavior.


Features

XZ Utils can compress and decompress the xz and lzma file formats. Because the LZMA format is considered legacy, XZ Utils compresses to xz by default. In addition, decompression of the .lz format used by lzip has been supported since version 5.3.4.

In most cases, xz achieves higher compression ratios than alternatives like zip, gzip and bzip2. Decompression is faster than with bzip2 but slower than with gzip. Compression can be much slower than with gzip, and is slower than with bzip2 at high compression levels, so xz is most useful when a compressed file will be used many times.

XZ Utils consists of two major components:
* xz, the command-line compressor and decompressor (analogous to gzip)
* liblzma, a software library with an API similar to zlib

Various command shortcuts exist, such as lzma (for xz --format=lzma), unxz (for xz --decompress; analogous to gunzip) and xzcat (for xz --decompress --stdout; analogous to zcat).
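As a minimal sketch of the two container formats, the example below uses Python's standard-library lzma module, which wraps liblzma. It compresses the same data to the default xz format and to the legacy lzma format, then decompresses both. The payload and variable names are illustrative assumptions, not part of XZ Utils.

```python
import lzma

# Hypothetical sample payload, chosen only because it compresses well.
data = b"example payload " * 1000

# FORMAT_XZ is the current container; FORMAT_ALONE is the legacy .lzma format.
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)
lzma_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

# decompress() defaults to FORMAT_AUTO, which accepts both containers.
restored_from_xz = lzma.decompress(xz_blob)
restored_from_lzma = lzma.decompress(lzma_blob)

# .xz streams begin with a fixed six-byte magic number.
has_xz_magic = xz_blob[:6] == b"\xfd7zXZ\x00"

print(len(data), len(xz_blob), len(lzma_blob), has_xz_magic)
```

The command-line tools expose the same choice through xz and lzma; the library route is useful when compression has to happen inside a program rather than in a shell pipeline.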


Usage

Both the behavior of the software and the properties of the file format have been designed to work similarly to those of the popular Unix compression tools gzip and bzip2. Just like gzip and bzip2, xz and lzma can only compress single files (or data streams) as input. They cannot bundle multiple files into a single archive; to do this, an archiving program such as tar is used first.

Compressing an archive:

    xz my_archive.tar      # results in my_archive.tar.xz
    lzma my_archive.tar    # results in my_archive.tar.lzma

Decompressing the archive:

    unxz my_archive.tar.xz        # results in my_archive.tar
    unlzma my_archive.tar.lzma    # results in my_archive.tar

Version 1.22 or greater of the GNU implementation of tar has transparent support for tarballs compressed with lzma and xz, using the switches --xz or -J for xz compression, and --lzma for LZMA compression.

Creating an archive and compressing it:

    tar -c --xz -f my_archive.tar.xz /some_directory        # results in my_archive.tar.xz
    tar -c --lzma -f my_archive.tar.lzma /some_directory    # results in my_archive.tar.lzma

Decompressing the archive and extracting its contents:

    tar -x --xz -f my_archive.tar.xz        # results in some_directory (GNU tar strips the leading /)
    tar -x --lzma -f my_archive.tar.lzma    # results in some_directory (GNU tar strips the leading /)

Single-letter tar options for archiving with compression, and for decompressing with extraction, using the short .txz suffix:

    tar cJf keep.txz keep    # archive then compress the directory ./keep/ into the file ./keep.txz
    tar xJf keep.txz         # decompress then extract the file ./keep.txz, creating the directory ./keep/

xz has supported multi-threaded compression (with the -T flag) since version 5.2.0, released in 2014; threaded decompression has been available since version 5.4.0. Threaded decompression requires multiple compressed blocks within a stream, which are created by the threaded compression interface. The number of threads actually used can be lower than the number requested if the file is not big enough for threading with the given settings, or if using more threads would exceed the memory usage limit.
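The archive-then-compress workflow can also be sketched programmatically. The example below assumes Python's standard-library tarfile module, whose "w:xz" and "r:xz" modes combine tar archiving with xz compression in roughly the way tar cJf and tar xJf do. The helper names make_txz and read_member, and the sample directory, are hypothetical.

```python
import os
import tarfile
import tempfile


def make_txz(src_dir, archive_path):
    """Archive src_dir and xz-compress it in one step (similar to `tar cJf`)."""
    with tarfile.open(archive_path, "w:xz") as tar:
        # Store the directory under its base name rather than its full path.
        tar.add(src_dir, arcname=os.path.basename(src_dir))


def read_member(archive_path, member):
    """Return the bytes of one file stored in an xz-compressed tarball."""
    with tarfile.open(archive_path, "r:xz") as tar:
        return tar.extractfile(member).read()


with tempfile.TemporaryDirectory() as tmp:
    # Build a small sample directory ./keep/ containing a single file.
    keep = os.path.join(tmp, "keep")
    os.mkdir(keep)
    with open(os.path.join(keep, "note.txt"), "wb") as fh:
        fh.write(b"hello\n")

    archive = os.path.join(tmp, "keep.txz")
    make_txz(keep, archive)

    with tarfile.open(archive, "r:xz") as tar:
        names = sorted(tar.getnames())
    content = read_member(archive, "keep/note.txt")

print(names, content)
```

Note that this module compresses in a single thread; the multi-threaded modes described above belong to the xz command-line tool and the liblzma C API.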


The xz format

The xz format improves on lzma by allowing for preprocessing filters. These filters are similar to those used in 7z, because 7z's filters are available in the public domain via the LZMA SDK. There are claims that the xz format is inadequate for long-term archiving.
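To illustrate what a preprocessing filter does, the following sketch uses Python's standard-library lzma module (a binding to liblzma) to build an xz stream with a delta filter in front of LZMA2. The sample data, the delta distance of 4 and the preset are assumptions chosen for the demonstration, not values recommended by the format.

```python
import lzma
import struct

# Hypothetical input: 20000 little-endian 32-bit counters. Each value differs
# from the previous one by 1, so byte-wise deltas at distance 4 are mostly
# constant and much easier for LZMA2 to model.
data = b"".join(struct.pack("<I", i) for i in range(20000))

# Filter chain: delta preprocessing first, LZMA2 as the final compressor.
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

plain = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)
filtered = lzma.compress(data, format=lzma.FORMAT_XZ, filters=chain)

# The chain is recorded in the xz block headers, so decompression needs no
# extra arguments.
restored = lzma.decompress(filtered)

print(len(data), len(plain), len(filtered))
```

The legacy lzma container has no place to record such a chain, which is why this kind of preprocessing is tied to the xz format.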


Development and adoption

Development of XZ Utils took place within the Tukaani Project, a small group of developers who once maintained a Linux distribution based on Slackware. The name "XZ" is not an abbreviation; it appears to be an arbitrary name chosen for the data compressors, and the official specification does not give it any meaning. Version 1.0.0 of the .xz file format specification was officially released in January 2009.

All of the source code for xz and liblzma has been released into the public domain. The XZ Utils source distribution additionally includes some optional scripts and an example program that are subject to various versions of the GNU General Public License (GPL). The resulting xz and liblzma binaries are public domain, unless the optional LGPL getopt implementation is incorporated.

Binaries are available for FreeBSD, NetBSD, Linux systems, Microsoft Windows, and FreeDOS. A number of Linux distributions, including Fedora, Slackware, Ubuntu, and Debian, use xz for compressing their software packages. Arch Linux previously used xz to compress packages, but as of 27 December 2019, packages are compressed with Zstandard. Fedora Linux also switched to compressing its RPM packages with Zstandard, starting with Fedora Linux 31. The GNU FTP archive also uses xz.


Backdoor incident

On 29 March 2024, Andres Freund, a PostgreSQL developer working at Microsoft, announced that he had found a backdoor in XZ Utils, impacting versions 5.6.0 and 5.6.1. Malicious code for setting up the backdoor had been hidden in compressed test files, and the configure script in the tar files was modified to trigger the hidden code. Freund began his investigation after "observing a few odd symptoms around liblzma (part of the xz package)": he found that ssh logins using sshd were "taking a lot of CPU" and that there were "valgrind errors". The vulnerability received a Common Vulnerability Scoring System (CVSS) score of 10 (the highest).

