Adler-32 is a

checksum A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data ...

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...

written by

Mark Adler Mark Adler (born 1959) is an American software engineer. He is best known for his work in the field of data compression as the author of the Adler-32 checksum function, and a co-author together with Jean-loup Gailly of the zlib compression li ...

in 1995, modifying

Fletcher's checksum The Fletcher checksum is an algorithm for computing a position-dependent checksum devised by John G. Fletcher (1934–2012) at Lawrence Livermore Labs in the late 1970s. The objective of the Fletcher checksum was to provide error-detection proper ...

. Compared to a

cyclic redundancy check A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data. Blocks of data entering these systems get a short ''check value'' attached, based on ...

of the same length, it trades reliability for speed (preferring the latter). Adler-32 is more reliable than

Fletcher-16 The Fletcher checksum is an algorithm for computing a position-dependent checksum devised by John G. Fletcher (1934–2012) at Lawrence Livermore Labs in the late 1970s. The objective of the Fletcher checksum was to provide error-detection proper ...

, and slightly less reliable than

Fletcher-32 The Fletcher checksum is an algorithm for computing a position-dependent checksum devised by John G. Fletcher (1934–2012) at Lawrence Livermore Labs in the late 1970s. The objective of the Fletcher checksum was to provide error-detection proper ...

History

The Adler-32 checksum is part of the widely used zlib compression library, as both were developed by

. A "

rolling checksum A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input. A few hash functions allow a rolling hash to be computed very quickly—the new hash value ...

" version of Adler-32 is used in the rsync utility.

Calculation

An Adler-32 checksum is obtained by calculating two

16-bit 16-bit microcomputers are microcomputers that use 16-bit microprocessors. A 16-bit register can store 216 different values. The range of integer values that can be stored in 16 bits depends on the integer representation used. With the two ...

checksums ''A'' and ''B'' and concatenating their bits into a 32-bit integer. ''A'' is the sum of all

byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...

s in the stream plus one, and ''B'' is the sum of the individual values of ''A'' from each step. At the beginning of an Adler-32 run, ''A'' is initialized to 1, ''B'' to 0. The sums are done

modulo In computing, the modulo operation returns the remainder or signed remainder of a division, after one number is divided by another (called the '' modulus'' of the operation). Given two positive numbers and , modulo (often abbreviated as ) is ...

65521 (the largest

prime number A prime number (or a prime) is a natural number greater than 1 that is not a Product (mathematics), product of two smaller natural numbers. A natural number greater than 1 that is not prime is called a composite number. For example, 5 is prime ...

smaller than 2¹⁶). The bytes are stored in network order (

big endian In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most s ...

), ''B'' occupying the two most significant bytes. The function may be expressed as ''A'' = 1 + ''D''₁ + ''D''₂ + ... + ''D''_''n'' (mod 65521) ''B'' = (1 + ''D''₁) + (1 + ''D''₁ + ''D''₂) + ... + (1 + ''D''₁ + ''D''₂ + ... + ''D''_''n'') (mod 65521) = ''n''×''D''₁ + (''n''−1)×''D''₂ + (''n''−2)×''D''₃ + ... + ''D''_''n'' + ''n'' (mod 65521) ''Adler-32''(''D'') = ''B'' × 65536 + ''A'' where ''D'' is the string of bytes for which the checksum is to be calculated, and ''n'' is the length of ''D''.

Example

The Adler-32 sum of the

ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...

string "Wikipedia" would be calculated as follows: A = 920 = 0x398 (base 16) B = 4582 = 0x11E6 Output = 0x11E6 << 16 + 0x398 = 0x11E60398 = 300286872 The modulo operation had no effect in this example, since none of the values reached 65521.

Comparison with the Fletcher checksum

The first difference between the two algorithms is that Adler-32 sums are calculated modulo a prime number, whereas Fletcher sums are calculated modulo 2⁴−1, 2⁸−1, or 2¹⁶−1 (depending on the number of bits used), which are all

composite number A composite number is a positive integer that can be formed by multiplying two smaller positive integers. Equivalently, it is a positive integer that has at least one divisor In mathematics, a divisor of an integer n, also called a factor ...

s. Using a prime number makes it possible for Adler-32 to catch differences in certain combinations of bytes that Fletcher is unable to detect. The second difference, which has the largest effect on the speed of the algorithm, is that the Adler sums are computed over 8-bit

s rather than 16-bit

words A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no conse ...

, resulting in twice the number of loop iterations. This results in the Adler-32 checksum taking between one-and-a-half to two times as long as Fletcher's checksum for 16-bit word aligned data. For byte-aligned data, Adler-32 is faster than a properly implemented Fletcher's checksum (e.g., one found in the Hierarchical Data Format).

Example implementation

In C, an inefficient but straightforward implementation is : const uint32_t MOD_ADLER = 65521; uint32_t adler32(unsigned char *data, size_t len) /* where data is the location of the data in physical memory and len is the length of the data in bytes */ See the zlib source code for a more efficient implementation that requires a fetch and two additions per byte, with the modulo operations deferred with two remainders computed every several thousand bytes, a technique first discovered for Fletcher checksums in 1988. provides a similar optimization, with the addition of a trick that delays computing the "15" in 65536 - 65521 so that modulos become faster: it can be shown that is equivalent to the naive accumulation.

Advantages and disadvantages

* Like the standard

CRC-32 A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data. Blocks of data entering these systems get a short ''check value'' attached, based on ...

, the Adler-32 checksum can be forged easily and is therefore unsafe for protecting against ''intentional'' modification. * It's faster than CRC-32 on many platforms. * Adler-32 has a weakness for short messages with a few hundred bytes, because the checksums for these messages have a poor coverage of the 32 available bits.

Weakness

Adler-32 is weak for short messages because the sum ''A'' does not wrap. The maximum sum of a 128-byte message is 32640, which is below the value 65521 used by the modulo operation, meaning that roughly half of the output space is unused, and the distribution within the used part is nonuniform. An extended explanation can be found in , which mandates the use of CRC32C instead of Adler-32 for SCTP, the Stream Control Transmission Protocol. Adler-32 has also been shown to be weak for small incremental changes, and also weak for strings generated from a common prefix and consecutive numbers (like auto-generated label names by typical code generators).

Notes

External links

* – specification, contains example C code
ZLib
– implements the Adler-32 checksum i
adler32.c

Chrome
– uses an

SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it shoul ...

implementation of Adler-3
adler32_simd.c
* {{IETF RFC, 3309 – information about the short message weakness and related change to SCTP Checksum algorithms