Background
Hash collisions can be unavoidable depending on the number of objects in a set and whether or not the bit string they are mapped to is long enough in length. When there is a set of ''n'' objects, if ''n'' is greater than , ''R'', , which in this case ''R'' is the range of the hash value, the probability that there will be a hash collision is 1, meaning it is guaranteed to occur. Another reason hash collisions are likely at some point in time stems from the idea of the birthday paradox in mathematics. This problem looks at the probability of a set of two randomly chosen people having the same birthday out of ''n'' number of people. This idea has led to what has been called theProbability of occurrence
Hash collisions can occur by chance and can be intentionally created for many hash algorithms. The probability of a hash collision thus depends on the size of the algorithm, the distribution of hash values, and whether or not it is both mathematically known and computationally feasible to create specific collisions. Take into account the following hash algorithms – CRC-32, MD5, and SHA-1. These are common hash algorithms with varying levels of collision risk.CRC-32
CRC-32 poses the highest risk for hash collisions. This hash function is generally not recommended for use. If a hub were to contain 77,163 hash values, the chance of a hash collision occurring is 50%, which is extremely high compared to other methods.MD5
MD5 is the most commonly used and when compared to the other two hash functions, it represents the middle ground in terms of hash collision risk. In order to get a 50% chance of a hash collision occurring, there would have to be over 5.06 billion records in the hubSHA-1
SHA-1 offers the lowest risk for hash collisions. For a SHA-1 function to have a 50% chance of a hash collision occurring, there would have to be 1.42 × 10 records in the hub. Note, the number of records mentioned in these examples would have to be in the ''same'' hub. Having a hub with a smaller number of records could decrease the probability of a hash collision in all of these hash functions, although there will always be a minor risk present, which is inevitable, unless collision resolution techniques are used.Collision resolution
Since hash collisions are inevitable, hash tables have mechanisms of dealing with them, known as collision resolutions. Two of the most common strategies are open addressing and separate chaining. The cache-conscious collision resolution is another strategy that has been discussed in the past for string hash tables.Open addressing
Cells in the hash table are assigned one of three states in this method – occupied, empty, or deleted. If a hash collision occurs, the table will be probed to move the record to an alternate cell that is stated as empty. There are different types of probing that take place when a hash collision happens and this method is implemented. Some types of probing are linear probing, double hashing, andSeparate chaining
This strategy allows more than one record to be "chained" to the cells of a hash table. If two records are being directed to the same cell, both would go into that cell as a linked list. This efficiently prevents a hash collision from occurring since records with the same hash values can go into the same cell, but it has its disadvantages. Keeping track of so many lists is difficult and can cause whatever tool that is being used to become very slow. Separate chaining is also known as open hashing.Cache-conscious collision resolution
Although much less used than the previous two, has proposed the cache-conscious collision resolution method in 2005. It is a similar idea to the separate chaining methods, although it does not technically involve the chained lists. In this case, instead of chained lists, the hash values are represented in a contiguous list of items. This is better suited for string hash tables and the use for numeric values is still unknown.See also
* List of hash functions * Universal one-way hash function * Cryptography * Universal hashing * Perfect hash functionReferences
Hashing {{DEFAULTSORT:Hash_Collision