Lempel–Ziv–Storer–Szymanski
   HOME

TheInfoList



OR:

Lempel–Ziv–Storer–Szymanski (LZSS) is a
lossless data compression Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistic ...
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
, a derivative of
LZ77 LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations includin ...
, that was created in 1982 by James A. Storer and Thomas Szymanski. LZSS was described in article "Data compression via textual substitution" published in ''Journal of the ACM'' (1982, pp. 928–951). LZSS is a dictionary coding technique. It attempts to replace a string of symbols with a reference to a dictionary location of the same string. The main difference between LZ77 and LZSS is that in LZ77 the dictionary reference could actually be longer than the string it was replacing. In LZSS, such references are omitted if the length is less than the "break even" point. Furthermore, LZSS uses one-bit flags to indicate whether the next chunk of data is a literal (byte) or a reference to an offset/length pair.


Example

Here is the beginning of Dr. Seuss's ''
Green Eggs and Ham ''Green Eggs and Ham'' is a children's book by Dr. Seuss, first published on August 12, 1960. As of 2019, the book has sold 8 million copies worldwide. The story has appeared in several adaptations, starting with 1973's '' Dr. Seuss on the Loos ...
'', with character numbers at the beginning of lines for convenience. Green Eggs and Ham is a good example to illustrate LZSS compression because the book itself only contains 50 unique words, despite having a word count of 170. Thus, words are repeated, however not in succession.
  0: I am Sam
  9:
 10: Sam I am
 19:
 20: That Sam-I-am!
 35: That Sam-I-am!
 50: I do not like
 64: that Sam-I-am!
 79: 
 80: Do you like green eggs and ham?
112:
113: I do not like them, Sam-I-am.
143: I do not like green eggs and ham.
This text takes 177 bytes in uncompressed form. Assuming a break even point of 2 bytes (and thus 2 byte pointer/offset pairs), and one byte newlines, this text compressed with LZSS becomes 94 bytes long:
 0: I am Sam
 9:
10: (5,3) (0,4)
16:
17: That(4,4)-I-am!(19,16)I do not like
45: t(21,14)
49: Do you(58,5) green eggs and ham?
78: (49,14) them,(24,9).(112,15)(92,18).
Note: this does not include the 12 bytes of flags indicating whether the next chunk of text is a pointer or a literal. Adding it, the text becomes 106 bytes long, which is still shorter than the original 177 bytes.


Implementations

Many popular archivers like PKZip, ARJ, RAR,
ZOO A zoo (short for zoological garden; also called an animal park or menagerie) is a facility in which animals are kept within enclosures for public exhibition and often bred for conservation purposes. The term ''zoological garden'' refers to zoo ...
,
LHarc LHA or LZH is a freeware compression utility and associated file format. It was created in 1988 by , a doctor and originally named LHarc. A complete rewrite of LHarc, tentatively named ''LHx'', was eventually released as ''LH''. It was then rena ...
use LZSS rather than LZ77 as the primary compression algorithm; the encoding of literal characters and of length-distance pairs varies, with the most common option being
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algo ...
. Most implementations stem from a
public domain The public domain (PD) consists of all the creative work to which no exclusive intellectual property rights apply. Those rights may have expired, been forfeited, expressly waived, or may be inapplicable. Because those rights have expired, ...
1989 code by Haruhiko Okumura. Version 4 of the Allegro library can encode and decode an LZSS format, but the feature was cut from version 5. The
Game Boy Advance The (GBA) is a 32-bit handheld game console developed, manufactured and marketed by Nintendo as the successor to the Game Boy Color. It was released in Japan on March 21, 2001, in North America on June 11, 2001, in the PAL region on June 22, ...
BIOS can decode a slightly modified LZSS format. Apple's
Mac OS X macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac computers. Within the market of desktop and la ...
uses LZSS as one of the compression methods for kernel code.


See also

*
LZ77 LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations includin ...
*
Lempel–Ziv–Welch Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempe ...
(LZW)


References

{{DEFAULTSORT:Lempel-Ziv-Storer-Szymanski Lossless compression algorithms