Rolling Checksum
   HOME
*





Rolling Checksum
A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input. A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a moving average function can be computed much more quickly than other low-pass filters. One of the main applications is the Rabin–Karp string search algorithm, which uses the rolling hash described below. Another popular application is the rsync program, which uses a checksum based on Mark Adler's adler-32 as its rolling hash. Low Bandwidth Network Filesystem (LBFS) uses a Rabin fingerprint as its rolling hash. FastCDC (Fast Content-Defined Chunking) uses a compute-efficient Gear fingerprint as its rolling hash. At best, rolling hash values are pairwise independentDaniel Lemire, Owen ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Hash Function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually used to index a fixed-size table called a ''hash table''. Use of a hash function to index a hash table is called ''hashing'' or ''scatter storage addressing''. Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a computationally and storage space-efficient form of data access that avoids the non-constant access time of ordered and unordered lists and structured trees, and the often exponential storage requirements of direct access of state spaces of large or variable-length keys. Use o ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Finite Field
In mathematics, a finite field or Galois field (so-named in honor of Évariste Galois) is a field that contains a finite number of elements. As with any field, a finite field is a set on which the operations of multiplication, addition, subtraction and division are defined and satisfy certain basic rules. The most common examples of finite fields are given by the integers mod when is a prime number. The ''order'' of a finite field is its number of elements, which is either a prime number or a prime power. For every prime number and every positive integer there are fields of order p^k, all of which are isomorphic. Finite fields are fundamental in a number of areas of mathematics and computer science, including number theory, algebraic geometry, Galois theory, finite geometry, cryptography and coding theory. Properties A finite field is a finite set which is a field; this means that multiplication, addition, subtraction and division (excluding division by zero ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


MinHash
In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how Similarity measure, similar two sets are. The scheme was invented by , and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results.. It has also been applied in large-scale Cluster analysis, clustering problems, such as document clustering, clustering documents by the similarity of their sets of words.. Jaccard similarity and minimum hash values The Jaccard index, Jaccard similarity coefficient is a commonly used indicator of the similarity between two sets. Let be a set and and be subsets of , then the Jaccard index is defined to be the ratio of the number of elements of their intersection (set theory), intersection and the number of elements of their union (set theory), union: : J(A,B) = . This value is 0 when the two sets are Disjoint sets, disjoint, 1 when ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Free-software
Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, not price; all users are legally free to do what they want with their copies of a free software (including profiting from them) regardless of how much is paid to obtain the program.Selling Free Software
(gnu.org)
Computer programs are deemed "free" if they give end-users (not just the developer) ultimate control over the software and, subsequently, over their devices. The right to study and modify a computer program entails that

Attic (backup Software)
Borg is deduplicating backup software for various Unix-like operating systems. Originally it called Attic. History Attic development began in 2010 and was accepted to Debian in August 2013. Attic is available from pip and notably part of Debian, Ubuntu, Arch and Slackware. In 2015, Attic was forked as "Borg" to support a "more open, faster paced development", according to its developers. Many issues in Attic have been fixed in this fork, but backward compatibility with the original program has been lost (a non-reversible upgrade process exists). Borg 1.0.0 was released on 5 March 2016, Borg 1.1.0 was released on 7 October 2017. As of 2018, Borg is under active development by many contributors, while Attic is not being developed.Stable releases are available from various Linux distributions such as Debian, Ubuntu, Fedora, OpenSUSE and others, from the ports collection of various BSD derivatives and from brew for macOS. The project provides pre-built binaries for Linux, F ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Exclusive Or
Exclusive or or exclusive disjunction is a logical operation that is true if and only if its arguments differ (one is true, the other is false). It is symbolized by the prefix operator J and by the infix operators XOR ( or ), EOR, EXOR, , , , , , and . The negation of XOR is the logical biconditional, which yields true if and only if the two inputs are the same. It gains the name "exclusive or" because the meaning of "or" is ambiguous when both operands are true; the exclusive or operator ''excludes'' that case. This is sometimes thought of as "one or the other but not both". This could be written as "A or B, but not, A and B". Since it is associative, it may be considered to be an ''n''-ary operator which is true if and only if an odd number of arguments are true. That is, ''a'' XOR ''b'' XOR ... may be treated as XOR(''a'',''b'',...). Truth table The truth table of A XOR B shows that it outputs true whenever the inputs differ: Equivalences, elimination, and introdu ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Circular Shift
In combinatorial mathematics, a circular shift is the operation of rearranging the entries in a tuple, either by moving the final entry to the first position, while shifting all other entries to the next position, or by performing the inverse operation. A circular shift is a special kind of cyclic permutation, which in turn is a special kind of permutation. Formally, a circular shift is a permutation σ of the ''n'' entries in the tuple such that either :\sigma(i)\equiv (i+1) modulo ''n'', for all entries ''i'' = 1, ..., ''n'' or :\sigma(i)\equiv (i-1) modulo ''n'', for all entries ''i'' = 1, ..., ''n''. The result of repeatedly applying circular shifts to a given tuple are also called the circular shifts of the tuple. For example, repeatedly applying circular shifts to the four-tuple (''a'', ''b'', ''c'', ''d'') successively gives * (''d'', ''a'', ''b'', ''c''), * (''c'', ''d'', ''a'', ''b''), * (''b'', ''c'', ''d'', ''a''), * (''a'', ''b'', ''c'', ''d'') (the original four-tupl ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Hash Table
In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', also called a ''hash code'', into an array of ''buckets'' or ''slots'', from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash '' collisions'' where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way. In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key–value pa ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Tabulation Hashing
In computer science, tabulation hashing is a method for constructing universal families of hash functions by combining table lookup with exclusive or operations. It was first studied in the form of Zobrist hashing for computer games; later work by Carter and Wegman extended this method to arbitrary fixed-length keys. Generalizations of tabulation hashing have also been developed that can handle variable-length keys such as text strings. Despite its simplicity, tabulation hashing has strong theoretical properties that distinguish it from some other hash functions. In particular, it is 3-independent: every 3-tuple of keys is equally likely to be mapped to any 3-tuple of hash values. However, it is not 4-independent. More sophisticated but slower variants of tabulation hashing extend the method to higher degrees of independence. Because of its high degree of independence, tabulation hashing is usable with hashing methods that require a high-quality hash function, including hopsco ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Barrel Shifter
A barrel shifter is a digital circuit that can shift a data word by a specified number of bits without the use of any sequential logic, only pure combinational logic, i.e. it inherently provides a binary operation. It can however in theory also be used to implement unary operations, such as logical shift left, in cases where limited by a fixed amount (e.g. for address generation unit). One way to implement a barrel shifter is as a sequence of multiplexers where the output of one multiplexer is connected to the input of the next multiplexer in a way that depends on the shift distance. A barrel shifter is often used to shift and rotate n-bits in modern microprocessors, typically within a single clock cycle. For example, take a four-bit barrel shifter, with inputs A, B, C and D. The shifter can cycle the order of the bits ''ABCD'' as ''DABC'', ''CDAB'', or ''BCDA''; in this case, no bits are lost. That is, it can shift all of the outputs up to three positions to the right (and th ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


CRC32
A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data. Blocks of data entering these systems get a short ''check value'' attached, based on the remainder of a polynomial division of their contents. On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken against data corruption. CRCs can be used for error correction (see bitfilters). CRCs are so called because the ''check'' (data verification) value is a ''redundancy'' (it expands the message without adding information) and the algorithm is based on ''cyclic'' codes. CRCs are popular because they are simple to implement in binary hardware, easy to analyze mathematically, and particularly good at detecting common errors caused by noise in transmission channels. Because the check value has a fixed length, the function that generates it is occasionally us ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


GF(2)
(also denoted \mathbb F_2, or \mathbb Z/2\mathbb Z) is the finite field of two elements (GF is the initialism of ''Galois field'', another name for finite fields). Notations and \mathbb Z_2 may be encountered although they can be confused with the notation of -adic integers. is the field with the smallest possible number of elements, and is unique if the additive identity and the multiplicative identity are denoted respectively and , as usual. The elements of may be identified with the two possible values of a bit and to the boolean values ''true'' and ''false''. It follows that is fundamental and ubiquitous in computer science and its logical foundations. Definition GF(2) is the unique field with two elements with its additive and multiplicative identities respectively denoted and . Its addition is defined as the usual addition of integers but modulo 2 and corresponds to the table below: If the elements of GF(2) are seen as boolean values, then the addition ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]