MurmurHash is a non-
cryptographic
Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
hash function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
suitable for general hash-based lookup.
[
]
It was created by Austin Appleby in 2008 and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants,
all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.
Unlike
cryptographic hash function
A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography:
* the probability of a particular n-bit output ...
s, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.
Variants
MurmurHash3
The current version is MurmurHash3,
which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher—a hash function test suite.
MurmurHash2
MurmurHash2 yields a 32-bit or 64-bit value. It came in multiple variants, including some that allowed incremental hashing and aligned or neutral versions.
* MurmurHash2 (32-bit, x86) - The original version; contains a flaw that weakens collision in some cases.
* MurmurHash2A (32-bit, x86) - A fixed variant using
Merkle–Damgård construction
In cryptography, the Merkle–Damgård construction or Merkle–Damgård hash function is a method of building collision-resistant cryptographic hash functions from collision-resistant one-way compression functions. Goldwasser, S. and Bellare, M ...
. Slightly slower.
* CMurmurHash2A (32-bit, x86) - MurmurHash2A but works incrementally.
* MurmurHashNeutral2 (32-bit, x86) - Slower, but endian and alignment neutral.
* MurmurHashAligned2 (32-bit, x86) - Slower, but does aligned reads (safer on some platforms).
* MurmurHash64A (64-bit, x64) - The original 64-bit version. Optimized for 64-bit arithmetic.
* MurmurHash64B (64-bit, x86) - A 64-bit version optimized for 32-bit platforms. It is not a true 64-bit hash due to insufficient mixing of the stripes.
The person who originally found the flaw in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160.
MurmurHash1
The original MurmurHash was created as an attempt to make a faster function than
Lookup3. Although successful, it hadn't been tested thoroughly and wasn't capable of providing 64-bit hashes as in Lookup3. It had a rather elegant design, that would be later built upon in MurmurHash2, combining a multiplicative hash (similar to
Fowler–Noll–Vo hash function) with a
Xorshift
Xorshift random number generators, also called shift-register generators, are a class of pseudorandom number generators that were invented by George Marsaglia. They are a subset of linear-feedback shift registers (LFSRs) which allow a particularl ...
.
Implementations
The canonical implementation is in
C++, but there are efficient ports for a variety of popular languages, including
Python,
C,
Go,
C#,
DLua Perl
Perl is a family of two High-level programming language, high-level, General-purpose programming language, general-purpose, Interpreter (computing), interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it ...
,
Ruby
A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum (aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapp ...
,
Rust
Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH), ...
,
PHP
PHP is a General-purpose programming language, general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementati ...
,
Common Lisp
Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fr ...
,
Haskell
Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
,
Elm
Elms are deciduous and semi-deciduous trees comprising the flowering plant genus ''Ulmus'' in the plant family Ulmaceae. They are distributed over most of the Northern Hemisphere, inhabiting the temperate and tropical-montane regions of Nor ...
,
Clojure
Clojure (, like ''closure'') is a dynamic and functional dialect of the Lisp programming language on the Java platform. Like other Lisp dialects, Clojure treats code as data and has a Lisp macro system. The current development process is ...
,
Scala,
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
,
Erlang,
Swift
Swift or SWIFT most commonly refers to:
* SWIFT, an international organization facilitating transactions between banks
** SWIFT code
* Swift (programming language)
* Swift (bird), a family of birds
It may also refer to:
Organizations
* SWIFT ...
,
Object Pascal
Object Pascal is an extension to the programming language Pascal that provides object-oriented programming (OOP) features such as classes and methods.
The language was originally developed by Apple Computer as ''Clascal'' for the Lisa Work ...
,
Kotlin, and
JavaScript
JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
.
It has been adopted into a number of open-source projects, most notably
libstdc++ (ver 4.6),
nginx
Nginx (pronounced "engine x" ) is a web server that can also be used as a reverse proxy, load balancer, mail proxy and HTTP cache. The software was created by Igor Sysoev and publicly released in 2004. Nginx is free and open-source software ...
(ver 1.0.1),
Rubinius, libmemcached (the
C driver for
Memcached
Memcached (pronounced variously ''mem-cash-dee'' or ''mem-cashed'') is a general-purpose distributed memory caching, memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and Object (computer science) ...
),
npm (nodejs package manager),
maatkit
Maatkit was a toolkit for development and administration of open-source databases. Most of Maatkit’s functionality was designed for MySQL
MySQL () is an open-source relational database management system (RDBMS). Its name is a combination ...
,
Hadoop
Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage ...
,
Kyoto Cabinet,
Cassandra
Cassandra or Kassandra (; Ancient Greek: Κασσάνδρα, , also , and sometimes referred to as Alexandra) in Greek mythology was a Trojan priestess dedicated to the god Apollo and fated by him to utter true prophecies but never to be believe ...
,
Solr
Solr (pronounced "solar") is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features a ...
,
vowpal wabbit,
Elasticsearch
Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is dual ...
,
Guava
Guava () is a common tropical fruit cultivated in many tropical and subtropical regions. The common guava '' Psidium guajava'' (lemon guava, apple guava) is a small tree in the myrtle family (Myrtaceae), native to Mexico, Central America, ...
,
Kafka
Franz Kafka (3 July 1883 – 3 June 1924) was a German-speaking Bohemian novelist and short-story writer, widely regarded as one of the major figures of 20th-century literature. His work fuses elements of realism and the fantastic. It typ ...
an
RedHat Virtual Data Optimizer (VDO)
Vulnerabilities
Hash functions can be vulnerable to
collision attack
In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e. a hash collision. This is in contrast to a preimage attack where a specific target hash value is specified.
There are roug ...
s, where a user can choose input data in such a way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and
Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called
HashDoS
In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e. a hash collision. This is in contrast to a preimage attack where a specific target hash value is specified.
There are roughl ...
attacks.
With the use of
differential cryptanalysis
Differential cryptanalysis is a general form of cryptanalysis applicable primarily to block ciphers, but also to stream ciphers and cryptographic hash functions. In the broadest sense, it is the study of how differences in information input can af ...
they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend to use their own
SipHash instead.
Algorithm
algorithm Murmur3_32 is
''// Note: In this version, all arithmetic is performed with unsigned 32-bit integers.''
''// In the case of overflow, the result is reduced modulo .''
input: ''key'', ''len'', ''seed''
c1 ← 0xcc9e2d51
c2 ← 0x1b873593
r1 ← 15
r2 ← 13
m ← 5
n ← 0xe6546b64
hash ← ''seed''
for each fourByteChunk of ''key'' do
k ← fourByteChunk
k ← k × c1
k ← k ROL r1
k ← k × c2
hash ← hash XOR k
hash ← hash ROL r2
hash ← (hash × m) + n
with any remainingBytesInKey do
remainingBytes ← SwapToLittleEndian(remainingBytesInKey)
''// Note: Endian swapping is only necessary on big-endian machines.''
''// The purpose is to place the meaningful digits towards the low end of the value,''
''// so that these digits have the greatest potential to affect the low range digits''
''// in the subsequent multiplication. Consider that locating the meaningful digits''
''// in the high range would produce a greater effect upon the high digits of the''
''// multiplication, and notably, that such high digits are likely to be discarded''
''// by the modulo arithmetic under overflow. We don't want that.''
remainingBytes ← remainingBytes × c1
remainingBytes ← remainingBytes ROL r1
remainingBytes ← remainingBytes × c2
hash ← hash XOR remainingBytes
hash ← hash XOR ''len''
hash ← hash XOR (hash >> 16)
hash ← hash × 0x85ebca6b
hash ← hash XOR (hash >> 13)
hash ← hash × 0xc2b2ae35
hash ← hash XOR (hash >> 16)
A sample C implementation follows (for little-endian CPUs):
static inline uint32_t murmur_32_scramble(uint32_t k)
uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed)
See also
*
Non-cryptographic hash functions
References
External links
Official SMHasher site
{{DEFAULTSORT:Murmurhash
Hash function (non-cryptographic)
Articles with example pseudocode
Articles with example C code