
Nilsimsa is an anti-spam focused locality-sensitive hashing algorithm originally proposed by the cmeclax remailer operator in 2001 and then reviewed by Ernesto Damiani et al. in their 2004 paper "An Open Digest-based Technique for Spam Detection". The goal of Nilsimsa is to generate a hash digest of an email message such that the digests of two similar messages are similar to each other. Unlike cryptographic hash functions such as SHA-1 or MD5, Nilsimsa is designed so that a small modification to a document does not substantially change the resulting hash of the document.

The paper suggests that Nilsimsa satisfies three requirements:
# The digest identifying each message should not vary significantly (sic) for changes that can be produced automatically.
# The encoding must be robust against intentional attacks.
# The encoding should support an extremely low risk of false positives.

Subsequent testing on a range of file types found that the Nilsimsa hash has a significantly higher false positive rate than other similarity digest schemes such as TLSH, Ssdeep and Sdhash.

Jesse Kornblum took Nilsimsa similarity matching into consideration when developing fuzzy hashing in 2006, which used the algorithms of spamsum by Andrew Tridgell (2002). Several implementations of Nilsimsa exist as open-source software.
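
The contrast between a locality-sensitive digest and a cryptographic hash can be illustrated with a small Python sketch. This is not the Nilsimsa algorithm itself: the trigram bucketing via CRC-32, the 256-bit digest width, and the names `toy_digest`, `matching_bits` and `sha1_int` are all assumptions chosen for illustration. It only shows the general idea that similar messages should yield digests that agree in most bit positions.

```python
import hashlib
import zlib

DIGEST_BITS = 256  # assumed digest width for this sketch


def toy_digest(text: str) -> int:
    """Return a DIGEST_BITS-bit locality-sensitive digest as an integer.

    Illustrative only: this is NOT the real Nilsimsa algorithm.
    """
    data = text.encode("utf-8")
    counts = [0] * DIGEST_BITS
    # Slide over every 3-byte window and count it in one of the buckets.
    for i in range(len(data) - 2):
        bucket = zlib.crc32(data[i:i + 3]) % DIGEST_BITS
        counts[bucket] += 1
    # A bit is set when its bucket is fuller than the average bucket.
    threshold = sum(counts) / DIGEST_BITS
    digest = 0
    for bucket, count in enumerate(counts):
        if count > threshold:
            digest |= 1 << bucket
    return digest


def matching_bits(a: int, b: int, width: int = DIGEST_BITS) -> int:
    """Count the bit positions (out of `width`) where a and b agree."""
    return width - bin(a ^ b).count("1")


def sha1_int(text: str) -> int:
    """Return the 160-bit SHA-1 digest of text as an integer."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


if __name__ == "__main__":
    original = "Buy cheap watches today, limited offer, click the link below now"
    altered = "Buy cheap watches today, limited offer, click the link below n0w"

    toy_same = matching_bits(toy_digest(original), toy_digest(altered))
    sha_same = matching_bits(sha1_int(original), sha1_int(altered), width=160)
    print(f"toy digest: {toy_same} of {DIGEST_BITS} bits agree")
    print(f"SHA-1:      {sha_same} of 160 bits agree")
```

In this sketch, a same-length substitution of a single character removes at most three trigrams and adds at most three, so no more than six of the 256 bits can change. The SHA-1 digests of the same two messages, by contrast, typically agree on only about half of their 160 bits, which makes them unsuitable for similarity matching.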

