LZ4 is a lossless data compression algorithm that is focused on compression and decompression speed. It belongs to the LZ77 family of byte-oriented compression schemes.

Features

The LZ4 algorithm aims to provide a good trade-off between speed and compression ratio. Typically, it has a smaller (i.e., worse) compression ratio than the similar LZO algorithm, which in turn is worse than algorithms like DEFLATE. However, LZ4 compression speed is similar to LZO and several times faster than DEFLATE, while decompression speed is significantly faster than LZO.

Design

LZ4 only uses a dictionary-matching stage (LZ77), and unlike other common compression algorithms does not combine it with an entropy coding stage (e.g. Huffman coding in DEFLATE).

The LZ4 algorithm represents the data as a series of sequences. Each sequence begins with a one-byte token that is broken into two 4-bit fields. The first field represents the number of literal bytes that are to be copied to the output. The second field represents the number of bytes to copy from the already decoded output buffer (with 0 representing the minimum match length of 4 bytes). A value of 15 in either of the bitfields indicates that the length is larger and that an extra byte follows whose value is added to the length. A value of 255 in such an extra byte indicates that yet another byte follows. Arbitrary lengths are therefore represented by a run of extra bytes with the value 255, ended by a byte with a smaller value. The string of literals comes after the token and any extra bytes needed to indicate the literal length. It is followed by an offset that indicates how far back in the output buffer to begin copying. The extra bytes (if any) of the match length come at the end of the sequence.

Compression can be carried out in a stream or in blocks. Higher compression ratios can be achieved by investing more effort in finding the best matches, which results in both a smaller output and faster decompression.
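The sequence layout described above can be illustrated with a minimal decoder sketch in Python. It is not the reference implementation. The function name `lz4_block_decompress` is hypothetical, and three details are assumptions not stated in the text but taken from the published LZ4 block format: the literal-length field occupies the high four bits of the token, the offset is stored as two bytes in little-endian order, and the final sequence of a block ends after its literals.

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header) into bytes. Illustrative sketch."""
    out = bytearray()
    i, n = 0, len(src)

    def read_extra_length(pos: int, length: int):
        # Extra length bytes: keep adding while the byte read is 255.
        while True:
            if pos >= n:
                raise ValueError("truncated length field")
            b = src[pos]
            pos += 1
            length += b
            if b != 255:
                return pos, length

    while i < n:
        token = src[i]
        i += 1

        # Assumed: high nibble is the literal length.
        lit_len = token >> 4
        if lit_len == 15:
            i, lit_len = read_extra_length(i, lit_len)
        if i + lit_len > n:
            raise ValueError("truncated literals")
        out += src[i:i + lit_len]
        i += lit_len

        # Assumed: the last sequence stops after its literals.
        if i >= n:
            break

        # Assumed: 2-byte little-endian offset back into the output.
        if i + 2 > n:
            raise ValueError("truncated offset")
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid offset")

        # Low nibble is the match length; 0 stands for the minimum of 4 bytes.
        match_len = token & 0x0F
        if match_len == 15:
            i, match_len = read_extra_length(i, match_len)
        match_len += 4

        # Copy byte by byte so that a match may overlap the bytes it produces.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])

    return bytes(out)


# Hand-built block: 3 literals "abc", a 9-byte match at offset 3, then literal "d".
block = bytes([0x35]) + b"abc" + bytes([3, 0, 0x10]) + b"d"
print(lz4_block_decompress(block))  # b'abcabcabcabcd'
```

The byte-by-byte copy is deliberate: because the offset (3) is shorter than the match length (9), the match reads bytes that the same copy has just written, which is how a short repeated pattern is expanded.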

Implementation

The reference implementation in C by Yann Collet is licensed under a BSD license. There are ports and bindings in various languages including Java, C#, and Python. The Apache Hadoop system uses this algorithm for fast compression. LZ4 was also implemented natively in the Linux kernel 3.11. The FreeBSD, Illumos, ZFS on Linux, and ZFS-OSX implementations of the ZFS filesystem support the LZ4 algorithm for on-the-fly compression. Linux has supported LZ4 for SquashFS since 3.19-rc1. LZ4 is also implemented in the newer Zstd archiver by Yann Collet.

* LZ4 is available in 7-zip-zstd, an extended version of 7-Zip.
* Lizard (previously LZ5) is an improved LZ4 implementation with very fast decompression of more than 1000 MB/s; it is also available in 7-zip-zstd.
