UUHash is a hash algorithm employed by clients on the FastTrack network. It is employed for its ability to hash very large files in a very short period of time, even on older computers. However, this is achieved by only hashing a fraction of the file. This weakness makes it trivial to create a hash collision, allowing large sections to be completely altered without altering the checksum.
This method is used by Kazaa. The weakness of UUHash is exploited by anti-P2P agencies to corrupt downloads.
How it works
The UUHash is a 160-bit string that is usually Base64-encoded for presentation. It is a concatenation of an MD5 hash and a CRC32 sum of selected chunks of the file (sig2dat source code, file ''sig2dat.c'', function ''GetHashWin32'', retrieved 2014-08-20).
The first 307,200 bytes (300 KiB, one "chunk size") of the file are MD5-hashed (fewer if the file is shorter). The 32-bit little-endian integer value ''smallhash'' is initialized to 0.
If the file is strictly larger than one chunk size, a series of chunks at file offsets of 2^n MiB (n ≥ 0) and one chunk right at the end of the file are hashed using a CRC32 (polynomial 0xEDB88320 reversed, 0x04C11DB7 normal). The last chunk of the power-of-two series ends strictly ''more'' than one chunk size before the end of the file, i.e. there is always at least one unread byte between the last two chunks (if there are that many chunks). (BitCollider 0.4.0 implemented this unfaithfully.) The end-of-file chunk may be shorter than one chunk size; it starts at or after one chunk size into the file. The CRC is initialized using ''smallhash'' and stored into ''smallhash''.
So, for example:
:offset 0 MiB, 300 KiB hashed with MD5
:offset 1 MiB, 300 KiB hashed with CRC32
:offset 2 MiB, 300 KiB hashed...
:offset 4 MiB, 300 KiB hashed...
:offset 8 MiB, 300 KiB hashed...
:...
:last 300 KiB of file hashed with CRC32
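The sampling pattern described so far can be sketched in Python. This is a minimal illustration based only on the description above; the function name `sampled_ranges` and the tuple layout are hypothetical and not taken from any FastTrack client.

```python
CHUNK = 307_200   # 300 KiB, one "chunk size"
MIB = 1 << 20     # 1 MiB


def sampled_ranges(size):
    """Return (algorithm, start, end) tuples for the bytes that are read.

    `end` is exclusive. Hypothetical helper following the textual description.
    """
    # The first chunk (or the whole file, if shorter) goes to MD5.
    ranges = [("MD5", 0, min(size, CHUNK))]
    if size > CHUNK:
        offset = MIB
        # A power-of-two chunk is only used if it ends strictly more than
        # one chunk size before the end of the file.
        while offset + 2 * CHUNK < size:
            ranges.append(("CRC32", offset, offset + CHUNK))
            offset <<= 1
        # The end-of-file chunk starts at or after one chunk size into the file.
        ranges.append(("CRC32", max(size - CHUNK, CHUNK), size))
    return ranges


if __name__ == "__main__":
    for algorithm, start, end in sampled_ranges(5 * MIB):
        print(f"{algorithm:5} bytes {start}..{end - 1}")
```

Every byte outside the returned ranges is never read, which is why such bytes can be changed without affecting the hash.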
Finally, the bitwise complement of ''smallhash'' (still zero for files up to 300 KiB) is XORed together with the file size in bytes. The 160-bit UUHash is now the concatenation of the 128-bit MD5 hash and the final 32-bit ''smallhash'' value.
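The complete procedure can be put together as the following Python sketch. It follows the steps above and has not been checked against a real client. The function name `uuhash` is hypothetical. Two further assumptions: `zlib.crc32` is used as the CRC32 (it uses the same polynomial and accepts a running value as its second argument), and the file size is truncated to 32 bits before the XOR.

```python
import hashlib
import io
import struct
import zlib

CHUNK = 307_200   # 300 KiB, one "chunk size"
MIB = 1 << 20     # 1 MiB


def uuhash(stream, size):
    """Return a 20-byte UUHash for a seekable binary stream of `size` bytes."""
    stream.seek(0)
    md5_part = hashlib.md5(stream.read(CHUNK)).digest()

    smallhash = 0
    if size > CHUNK:
        offset = MIB
        # Power-of-two chunks must end more than one chunk size before EOF.
        while offset + 2 * CHUNK < size:
            stream.seek(offset)
            smallhash = zlib.crc32(stream.read(CHUNK), smallhash)
            offset <<= 1
        # End-of-file chunk; may be shorter than one chunk size.
        stream.seek(max(size - CHUNK, CHUNK))
        smallhash = zlib.crc32(stream.read(CHUNK), smallhash)

    # Complement, then XOR with the file size (assumed truncated to 32 bits).
    smallhash = (~smallhash & 0xFFFFFFFF) ^ (size & 0xFFFFFFFF)
    # smallhash is appended as a little-endian 32-bit integer.
    return md5_part + struct.pack("<I", smallhash)


if __name__ == "__main__":
    original = bytearray(5 * MIB)
    altered = bytearray(original)
    altered[700_000] = 0xFF  # lies between the MD5 chunk and the 1 MiB chunk
    same = uuhash(io.BytesIO(original), len(original)) == uuhash(
        io.BytesIO(altered), len(altered)
    )
    print("hashes equal despite altered byte:", same)
```

The usage example changes a byte that lies in an unread region, which illustrates the collision weakness described in the introduction.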
Test Vectors
Given are hashes (base64 and hex) for strings of various lengths containing only 0x00 or 0xFF bytes:
Notice that all strings that have a complete MD5 chunk have the same 128-bit prefix. For files that have the same number of chunks, the CRC part differs only because of the included file length (this holds only because all chunks are identical). For files up to 300 KiB, the file length can be extracted from the last four bytes of the hash, since ''smallhash'' is ~0 before the file size is XORed in.
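The length recovery for short files can be illustrated with a small Python sketch. The helper name `length_from_short_hash` is hypothetical, and the little-endian byte order of the last four bytes is taken from the description of ''smallhash'' above.

```python
import struct

CHUNK = 307_200  # 300 KiB, one "chunk size"


def length_from_short_hash(digest):
    """Recover the file length from the UUHash of a file of at most 300 KiB."""
    if len(digest) != 20:
        raise ValueError("a UUHash is 20 bytes long")
    (tail,) = struct.unpack("<I", digest[16:])
    # For short files the last four bytes are ~0 XOR length.
    length = tail ^ 0xFFFFFFFF
    if length > CHUNK:
        raise ValueError("digest does not look like a short-file hash")
    return length


if __name__ == "__main__":
    # Placeholder MD5 part (16 zero bytes) followed by ~0 XOR 1234.
    example = bytes(16) + struct.pack("<I", 0xFFFFFFFF ^ 1234)
    print(length_from_short_hash(example))
```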
Sig2Dat
The name UUHash derives from the sig2dat utility, which creates URIs referencing files on Kazaa. These URIs are of the form:
sig2dat://, File: surprise.mp3, Length:5845871Bytes, UUHash:=1LDYkHDl65OprVz37xN1VSo9b00=
Leaving aside the fact that this URI format is not RFC-compliant, UUHash refers to the Base64 encoding of the hash and not the hash itself.
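As an illustration, the Base64 field from the example URI can be decoded back into the two parts of the hash. This Python sketch assumes that the leading `=` in the field is a prefix marker and not part of the Base64 text; the function name `parse_uuhash_field` is hypothetical.

```python
import base64
import struct


def parse_uuhash_field(field):
    """Split a Base64 UUHash field into (md5_bytes, smallhash_int)."""
    # Assumption: a leading '=' is a marker, not Base64 data.
    text = field.lstrip("=")
    raw = base64.b64decode(text)
    if len(raw) != 20:
        raise ValueError("expected 20 bytes (160 bits) after decoding")
    md5_part = raw[:16]
    # The last four bytes hold smallhash as a little-endian integer.
    (smallhash,) = struct.unpack("<I", raw[16:])
    return md5_part, smallhash


if __name__ == "__main__":
    md5_part, smallhash = parse_uuhash_field("=1LDYkHDl65OprVz37xN1VSo9b00=")
    print("MD5 part: ", md5_part.hex())
    print("smallhash:", f"{smallhash:08x}")
```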