Pearson Hashing
   HOME

TheInfoList



OR:

Pearson hashing is a
non-cryptographic hash function The non-cryptographic hash functions (NCHFs) are hash functions intended for applications that do not need the rigorous security requirements of the cryptographic hash functions (e.g., preimage resistance) and therefore can be faster and less resou ...
designed for fast execution on processors with 8-bit
register Register or registration may refer to: Arts, entertainment, and media Music * Register (music), the relative "height" or range of a note, melody, part, instrument, etc. * ''Register'', a 2017 album by Travis Miller * Registration (organ), ...
s. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte
lookup table In computer science, a lookup table (LUT) is an array data structure, array that replaces runtime (program lifecycle phase), runtime computation of a mathematical function (mathematics), function with a simpler array indexing operation, in a proc ...
containing a
permutation In mathematics, a permutation of a set can mean one of two different things: * an arrangement of its members in a sequence or linear order, or * the act or process of changing the linear order of an ordered set. An example of the first mean ...
of the values 0 through 255. This hash function is a
CBC-MAC In cryptography, a cipher block chaining message authentication code (CBC-MAC) is a technique for constructing a message authentication code (MAC) from a block cipher. The message is encrypted with some block cipher algorithm in cipher block ch ...
that uses an 8-bit
substitution cipher In cryptography, a substitution cipher is a method of encrypting in which units of plaintext are replaced with the ciphertext, in a defined manner, with the help of a key; the "units" may be single letters (the most common), pairs of letters, t ...
implemented via the substitution table. An 8-bit
cipher In cryptography, a cipher (or cypher) is an algorithm for performing encryption or decryption—a series of well-defined steps that can be followed as a procedure. An alternative, less common term is ''encipherment''. To encipher or encode i ...
has negligible cryptographic security, so the Pearson hash function is not cryptographically strong, but it is useful for implementing
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
s or as a data integrity check code, for which purposes it offers these benefits: * It is extremely simple. * It executes quickly on resource-limited processors. * There is no simple class of inputs for which
collision In physics, a collision is any event in which two or more bodies exert forces on each other in a relatively short time. Although the most common use of the word ''collision'' refers to incidents in which two or more objects collide with great for ...
s (identical outputs) are especially likely. * Given a small, privileged set of inputs (e.g.,
reserved word In a programming language, a reserved word (sometimes known as a reserved identifier) is a word that cannot be used by a programmer as an identifier, such as the name of a variable, function, or label – it is "reserved from use". In brief, an '' ...
s for a
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a
perfect hash function In computer science, a perfect hash function for a set is a hash function that maps distinct elements in to a set of integers, with no hash collision, collisions. In mathematical terms, it is an injective function. Perfect hash functions may ...
. * Two input strings differing by exactly one character never collide. E.g., applying the algorithm on the strings ABC and AEC will never produce the same value. One of its drawbacks when compared with other hashing algorithms designed for
8-bit processor In computer architecture, 8-bit integers or other data units are those that are 8 bits wide (1 octet). Also, 8-bit central processing unit (CPU) and arithmetic logic unit (ALU) architectures are those that are based on registers or data buses ...
s is the suggested 256 byte lookup table, which can be prohibitively large for a small
microcontroller A microcontroller (MC, uC, or μC) or microcontroller unit (MCU) is a small computer on a single integrated circuit. A microcontroller contains one or more CPUs (processor cores) along with memory and programmable input/output peripherals. Pro ...
with a program memory size on the order of hundreds of bytes. A workaround to this is to use a simple permutation function instead of a table stored in program memory. However, using a too simple function, such as T = 255-i, partly defeats the usability as a hash function as
anagram An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example, the word ''anagram'' itself can be rearranged into the phrase "nag a ram"; which ...
s will result in the same hash value; using a too complex function, on the other hand, will affect speed negatively. Using a function rather than a table also allows extending the block size. Such functions naturally have to be
bijective In mathematics, a bijection, bijective function, or one-to-one correspondence is a function between two sets such that each element of the second set (the codomain) is the image of exactly one element of the first set (the domain). Equival ...
, like their table variants. The algorithm can be described by the following
pseudocode In computer science, pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actio ...
, which computes the hash of message ''C'' using the permutation table ''T'': algorithm pearson hashing is h := 0 for each c in C loop h := T h xor c end loop return h The hash variable () may be initialized differently, e.g. to the length of the data () modulo 256.


Example implementations


C#, 8-bit

public class PearsonHashing { public static byte Hash(string input) { byte[] T = { /* Permutation of 0-255 */ }; byte hash = 0; byte[] bytes = Encoding.UTF8.GetBytes(input); foreach (byte b in bytes) { hash = T[hash ^ b]; } return hash; } }


See also

* Non-cryptographic hash functions


References

Error detection and correction Hash function (non-cryptographic) Articles with example pseudocode