HOME

TheInfoList



OR:

The Rabin fingerprinting scheme is a method for implementing
fingerprint A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surfa ...
s using
polynomial In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An ex ...
s over a
finite field In mathematics, a finite field or Galois field (so-named in honor of Évariste Galois) is a field that contains a finite number of elements. As with any field, a finite field is a set on which the operations of multiplication, addition, subt ...
. It was proposed by
Michael O. Rabin Michael Oser Rabin ( he, מִיכָאֵל עוזר רַבִּין; born September 1, 1931) is an Israeli mathematician and computer scientist and a recipient of the Turing Award. Biography Early life and education Rabin was born in 1931 in ...
.


Scheme

Given an ''n''-bit message ''m''0,...,''m''n-1, we view it as a polynomial of degree ''n''-1 over the
finite field In mathematics, a finite field or Galois field (so-named in honor of Évariste Galois) is a field that contains a finite number of elements. As with any field, a finite field is a set on which the operations of multiplication, addition, subt ...
GF(2) (also denoted \mathbb F_2, or \mathbb Z/2\mathbb Z) is the finite field of two elements (GF is the initialism of ''Galois field'', another name for finite fields). Notations and \mathbb Z_2 may be encountered although they can be confused with ...
. : f(x) = m_0 + m_1 x + \ldots + m_ x^ We then pick a random
irreducible polynomial In mathematics, an irreducible polynomial is, roughly speaking, a polynomial that cannot be factored into the product of two non-constant polynomials. The property of irreducibility depends on the nature of the coefficients that are accepted f ...
of degree ''k'' over GF(2), and we define the fingerprint of the message ''m'' to be the remainder r(x) after division of f(x) by p(x) over GF(2) which can be viewed as a polynomial of degree or as a ''k''-bit number.


Applications

Many implementations of the
Rabin–Karp algorithm In computer science, the Rabin–Karp algorithm or Karp–Rabin algorithm is a string-searching algorithm created by that uses hashing to find an exact match of a pattern string in a text. It uses a rolling hash to quickly filter out position ...
internally use Rabin fingerprints. The ''Low Bandwidth Network Filesystem'' (LBFS) from MIT uses Rabin fingerprints to implement variable size shift-resistant blocks.Athicha Muthitacharoen, Benjie Chen, and
David Mazières David (; , "beloved one") (traditional spelling), , ''Dāwūd''; grc-koi, Δαυΐδ, Dauíd; la, Davidus, David; gez , ዳዊት, ''Dawit''; xcl, Դաւիթ, ''Dawitʿ''; cu, Давíдъ, ''Davidŭ''; possibly meaning "beloved one". w ...

"A Low-bandwidth Network File System"
/ref> The basic idea is that the filesystem computes the
cryptographic hash A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output ...
of each block in a file. To save on transfers between the client and server, they compare their checksums and only transfer blocks whose checksums differ. But one problem with this scheme is that a single insertion at the beginning of the file will cause every checksum to change if fixed-sized (e.g. 4 KB) blocks are used. So the idea is to select blocks not based on a specific offset but rather by some property of the block contents. LBFS does this by sliding a 48 byte window over the file and computing the Rabin fingerprint of each window. When the low 13 bits of the fingerprint are zero LBFS calls those 48 bytes a breakpoint and ends the current block and begins a new one. Since the output of Rabin fingerprints are pseudo-random the probability of any given 48 bytes being a breakpoint is 2^ (1 in 8192). This has the effect of shift-resistant variable size blocks. ''Any''
hash function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
could be used to divide a long file into blocks (as long as a
cryptographic hash function A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output ...
is then used to find the checksum of each block): but the Rabin fingerprint is an efficient rolling hash, since the computation of the Rabin fingerprint of region ''B'' can reuse some of the computation of the Rabin fingerprint of region ''A'' when regions ''A'' and ''B'' overlap. Note that this is a problem similar to that faced by
rsync rsync is a utility for efficiently transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. It is commonly found on Unix-like operatin ...
.


See also

* W-shingling * Rolling hash


References


External links

* * * Ross N. Williams (1993).
A painless guide to CRC Error detection algorithms
. Theory of cryptography Fingerprinting algorithms {{crypto-stub