A checksum is a small-sized block of data derived from another block of

Additive Checksums (C)

theory from Barr Group

Checksum Calculator

Open source python based application with GUI used to verify downloads.

digital data
Digital data, in information theory and information systems, is information represented as a string of discrete symbols each of which can take on one of only a finite number of values from some alphabet, such as letters or digits. An exampl ...

for the purpose of detecting errors that may have been introduced during its transmission
Transmission may refer to:
Medicine, science and technology
* Power transmission
** Electric power transmission
** Propulsion transmission, technology allowing controlled application of power
*** Automatic transmission
*** Manual transmission
** ...

or storage. By themselves, checksums are often used to verify data integrity
Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle and is a critical aspect to the design, implementation, and usage of any system that stores, processes, or retrieves data. The ter ...

but are not relied upon to verify data authenticity.
The procedure which generates this checksum is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm usually outputs a significantly different value, even for small changes made to the input. This is especially true of cryptographic hash function
A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography:
* the probability of a particular n-bit output ...

s, which may be used to detect many data corruption errors and verify overall data integrity
Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle and is a critical aspect to the design, implementation, and usage of any system that stores, processes, or retrieves data. The ter ...

; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted.
Checksum functions are related to hash functions, fingerprint
A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surfac ...

s, randomization functions, and cryptographic hash function
A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography:
* the probability of a particular n-bit output ...

s. However, each of those concepts has different applications and therefore different design goals. For instance, a function returning the start of a string can provide a hash appropriate for some applications but will never be a suitable checksum. Checksums are used as cryptographic primitive
Cryptographic primitives are well-established, low-level cryptographic algorithms that are frequently used to build cryptographic protocols for computer security systems. These routines include, but are not limited to, one-way hash functions and ...

s in larger authentication algorithms. For cryptographic systems with these two specific design goals, see HMAC
In cryptography, an HMAC (sometimes expanded as either keyed-hash message authentication code or hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret ...

.
Check digit
A check digit is a form of redundancy check used for error detection on identification numbers, such as bank account numbers, which are used in an application where they will at least sometimes be input manually. It is analogous to a binary parit ...

s and parity bit
A parity bit, or check bit, is a bit added to a string of binary code. Parity bits are a simple form of error detecting code. Parity bits are generally applied to the smallest units of a communication protocol, typically 8-bit octets (bytes), ...

s are special cases of checksums, appropriate for small blocks of data (such as Social Security number
In the United States, a Social Security number (SSN) is a nine-digit number issued to U.S. citizens, permanent residents, and temporary (working) residents under section 205(c)(2) of the Social Security Act, codified as . The number is issued to ...

s, bank account
A bank account is a financial account maintained by a bank or other financial institution in which the financial transactions between the bank and a customer are recorded. Each financial institution sets the terms and conditions for each type of ...

numbers, computer word
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word (the ''word s ...

s, single byte
The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...

s, etc.). Some error-correcting code
In computing, telecommunication, information theory, and coding theory, an error correction code, sometimes error correcting code, (ECC) is used for controlling errors in data over unreliable or noisy communication channels. The central idea i ...

s are based on special checksums which not only detect common errors but also allow the original data to be recovered in certain cases.
Algorithms

Parity byte or parity word

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number of bits, and then computes theexclusive or
Exclusive or or exclusive disjunction is a logical operation that is true if and only if its arguments differ (one is true, the other is false).
It is symbolized by the prefix operator J and by the infix operators XOR ( or ), EOR, EXOR, , ...

(XOR) of all those words. The result is appended to the message as an extra word. In simpler terms, this means adding a bit to the end of the word to guarantee that there is an even number of '1's. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word consisting of zeros, the receiver knows a transmission error occurred.
With this checksum, any transmission error which flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words. Also swapping of two or more words will not be detected. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is .
Sum complement

A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append thetwo's complement
Two's complement is a mathematical operation to reversibly convert a positive binary number into a negative binary number with equivalent (but negative) value, using the binary digit with the greatest place value (the leftmost bit in big- endian ...

of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred. This variant, too, detects any single-bit error, but the pro modular sum is used in SAE J1708.
Position-dependent

The simple checksums described above fail to detect some common errors which affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms most used in practice, such as Fletcher's checksum, Adler-32, andcyclic redundancy check
A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data. Blocks of data entering these systems get a short ''check value'' attached, based on ...

s (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the cost
In production, research, retail, and accounting, a cost is the value of money that has been used up to produce something or deliver a service, and hence is not available for use anymore. In business, the cost may be one of acquisition, in whic ...

of computing the checksum.
Fuzzy checksum

The idea of fuzzy checksum was developed for detection ofemail spam
Email spam, also referred to as junk email, spam mail, or simply spam, is unsolicited messages sent in bulk by email (spamming).
The name comes from a Monty Python sketch in which the name of the canned pork product Spam is ubiquitous, unavoida ...

by building up cooperative databases from multiple ISPs of email suspected to be spam. The content of such spam may often vary in its details, which would render normal checksumming ineffective. By contrast, a "fuzzy checksum" reduces the body text to its characteristic minimum, then generates a checksum in the usual manner. This greatly increases the chances of slightly different spam emails producing the same checksum. The ISP spam detection software, such as SpamAssassin
Apache SpamAssassin is a computer program used for e-mail spam filtering. It uses a variety of spam-detection techniques, including DNS and fuzzy checksum techniques, Bayesian filtering, external programs, blacklists and online databases. It i ...

, of co-operating ISPs, submits checksums of all emails to the centralised service such as DCC. If the count of a submitted fuzzy checksum exceeds a certain threshold, the database notes that this probably indicates spam. ISP service users similarly generate a fuzzy checksum on each of their emails and request the service for a spam likelihood.
General considerations

A message that is bits long can be viewed as a corner of the -dimensional hypercube. The effect of a checksum algorithm that yields an -bit checksum is to map each -bit message to a corner of a larger hypercube, with dimension . The corners of this hypercube represent all possible received messages. The valid received messages (those that have the correct checksum) comprise a smaller set, with only corners. A single-bit transmission error then corresponds to a displacement from a valid corner (the correct message and checksum) to one of the adjacent corners. An error which affects bits moves the message to a corner which is steps removed from its correct corner. The goal of a good checksum algorithm is to spread the valid corners as far from each other as possible, to increase the likelihood "typical" transmission errors will end up in an invalid corner.See also

General topic *Algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...

* Check digit
A check digit is a form of redundancy check used for error detection on identification numbers, such as bank account numbers, which are used in an application where they will at least sometimes be input manually. It is analogous to a binary parit ...

* Damm algorithm In error detection, the Damm algorithm is a check digit algorithm that detects all single-digit errors and all adjacent transposition errors. It was presented by H. Michael Damm in 2004.
Strengths and weaknesses Strengths
The Damm algorithm is ...

* Data rot Bit rot may refer to:
* "Bit Rot", a short story by Charles Stross
* Data rot, the decay of electromagnetic charge in a computer's storage
** Disc rot, the deterioration of optical media such as DVDs and CDs
* Software rot
Software rot (bit r ...

* File verification
* Fletcher's checksum
* Frame check sequence
A frame check sequence (FCS) is an error-detecting code added to a frame in a communication protocol. Frames are used to send payload data from a source to a destination.
Purpose
All frames and the bits, bytes, and fields contained within ...

* cksum
* md5sum
* sha1sum
* Parchive
Parchive (a portmanteau of parity archive, and formally known as Parity Volume Set Specification) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operatio ...

* Sum (Unix)
is a legacy utility available on some Unix and Unix-like operating systems. This utility outputs a 16-bit checksum of each argument file, as well as the number of blocks they take on disk. — manual pages from GNU coreutils Two different chec ...

* SYSV checksum
The SYSV checksum algorithm was a commonly used, legacy checksum algorithm.
It has been implemented in UNIX System V and is also available through the sum command line utility.
This algorithm is useless on a security perspective, and is weaker t ...

* BSD checksum The BSD checksum algorithm was a commonly used, legacy checksum algorithm. It has been implemented in old BSD and is also available through the sum command line utility.
This algorithm is useless from a security perspective, and is weaker than the ...

* xxHash
This is a list of hash functions, including cyclic redundancy checks, checksum functions, and cryptographic hash functions.
Cyclic redundancy checks
Adler-32 is often mistaken for a CRC, but it is not: it is a checksum.
Checksums
Universa ...

Error correction
* Hamming code
In computer science and telecommunication, Hamming codes are a family of linear error-correcting codes. Hamming codes can detect one-bit and two-bit errors, or correct one-bit errors without detection of uncorrected errors. By contrast, the sim ...

* Reed–Solomon error correction
Reed–Solomon codes are a group of error-correcting codes that were introduced by Irving S. Reed and Gustave Solomon in 1960.
They have many applications, the most prominent of which include consumer technologies such as MiniDiscs, CDs, DVDs, ...

* IPv4 header checksum
Hash functions
* List of hash functions
* Luhn algorithm
The Luhn algorithm or Luhn formula, also known as the " modulus 10" or "mod 10" algorithm, named after its creator, IBM scientist Hans Peter Luhn, is a simple checksum formula used to validate a variety of identification numbers, such as credit ...

* Parity bit
A parity bit, or check bit, is a bit added to a string of binary code. Parity bits are a simple form of error detecting code. Parity bits are generally applied to the smallest units of a communication protocol, typically 8-bit octets (bytes), ...

* Rolling checksum
A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input.
A few hash functions allow a rolling hash to be computed very quickly—the new hash value ...

* Verhoeff algorithm
File systems
* ZFS – a file system that performs automatic file integrity checking using checksums
Related concepts
* Isopsephy
*Gematria
Gematria (; he, גמטריא or gimatria , plural or , ''gimatriot'') is the practice of assigning a numerical value to a name, word or phrase according to an alphanumerical cipher. A single word can yield several values depending on the cipher ...

* File fixity
References

External links

{{wikibooks , 1= Algorithm Implementation , 2= ChecksumsAdditive Checksums (C)

theory from Barr Group

Checksum Calculator

Open source python based application with GUI used to verify downloads.