Nilsimsa Hash
   HOME





Nilsimsa Hash
Nilsimsa is an anti-spam focused locality-sensitive hashing algorithm originally proposed the cmeclax remailer operator in 2001 and then reviewed by Ernesto Damiani et al. in their 2004 paper titled, "An Open Digest-based Technique for Spam Detection". The goal of Nilsimsa is to generate a hash digest of an email message such that the digests of two similar messages are similar to each other. In comparison with cryptographic hash functions such as SHA-1 or MD5, making a small modification to a document does not substantially change the resulting hash of the document. The paper suggests that the Nilsimsa satisfies three requirements: # The digest identifying each message should not vary significantly (sic) for changes that can be produced automatically. # The encoding must be robust against intentional attacks. # The encoding should support an extremely low risk of false positives. Subsequent testing on a range of file types identified the Nilsimsa hash as having a significantly hi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Anti-spam Techniques
Various anti-spam techniques are used to prevent email spam (unsolicited bulk email). No technique is a complete solution to the spam problem, and each has trade-offs between incorrectly rejecting legitimate email (false positives) as opposed to not rejecting all spam email ( false negatives) – and the associated costs in time, effort, and cost of wrongfully obstructing good mail. Anti-spam techniques can be broken into four broad categories: those that require actions by individuals, those that can be automated by email administrators, those that can be automated by email senders and those employed by researchers and law enforcement officials. End-user techniques There are a number of techniques that individuals can use to restrict the availability of their email addresses, with the goal of reducing their chance of receiving spam. Discretion Sharing an email address only among a limited group of correspondents is one way to limit the chance that the address will be "harve ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Locality-sensitive Hashing
In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items. Hashing-based approximate nearest-neighbor search algorithms generally use one of two main categories of hashing methods: either data-independent methods, such as locality-sensitive hashing (LSH); or data-dependent methods, such as locality-preserving hashing (LPH). Locality-preserving hashin ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Ernesto Damiani
Ernesto Damiani is a professor of computer science at the University of Milan, where he leads the ''Architectures Research'' (SESAR) Lab. He is the Senior Director of the Artificial Intelligence and Intelligent Systems Institute at Khalifa University, in the UAE. He holds visiting positions at Tokyo Denki University, Université de Bourgogne. Damiani received an honorary doctorate from Institut National des Sciences Appliquées de Lyon, France (2017). His research spans security, Big Data and knowledge processing, where he has published over 400 peer-reviewed articles and books. He is a Senior Member of the IEEE The Institute of Electrical and Electronics Engineers (IEEE) is an American 501(c)(3) organization, 501(c)(3) public charity professional organization for electrical engineering, electronics engineering, and other related disciplines. The IEEE ... and a Distinguished Scientist of ACM. Since 2018, he is the President of the Italian Inter-University Consortium for Inf ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Cryptographic Hash Function
A cryptographic hash function (CHF) is a hash algorithm (a map (mathematics), map of an arbitrary binary string to a binary string with a fixed size of n bits) that has special properties desirable for a cryptography, cryptographic application: * the probability of a particular n-bit output result (hash value) for a random input string ("message") is 2^ (as for any good hash), so the hash value can be used as a representative of the message; * finding an input string that matches a given hash value (a ''pre-image'') is infeasible, ''assuming all input strings are equally likely.'' The ''resistance'' to such search is quantified as security strength: a cryptographic hash with n bits of hash value is expected to have a ''preimage resistance'' strength of n bits, unless the space of possible input values is significantly smaller than 2^ (a practical example can be found in ); * a ''second preimage'' resistance strength, with the same expectations, refers to a similar problem of f ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

SHA-1
In cryptography, SHA-1 (Secure Hash Algorithm 1) is a hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecimal digits. It was designed by the United States National Security Agency, and is a U.S. Federal Information Processing Standard. The algorithm has been cryptographically broken but is still widely used. Since 2005, SHA-1 has not been considered secure against well-funded opponents; as of 2010 many organizations have recommended its replacement. NIST formally deprecated use of SHA-1 in 2011 and disallowed its use for digital signatures in 2013, and declared that it should be phased out by 2030. , chosen-prefix attacks against SHA-1 are practical. As such, it is recommended to remove SHA-1 from products as soon as possible and instead use SHA-2 or SHA-3. Replacing SHA-1 is urgent where it is used for digital signatures. All major web browser vendors ceased acceptance of SHA-1 SSL certifi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Andrew Tridgell
Andrew "Tridge" Tridgell (born 28 February 1967) is an Australian computer programmer. He is the author of and a contributor to the Samba (software), Samba file server, and co-inventor of the rsync algorithm. He has analysed complex proprietary protocols and algorithms, to allow compatible free and open source software implementations. Projects Tridgell was a major developer of the Samba software, analyzing the Server Message Block protocol used for Workgroup (computer networking), workgroup and network file sharing by Microsoft Windows products. He developed thhierarchical memory allocator, originally as part of Samba. For his PhD dissertation, he co-developed rsync, including the rsync algorithm, a highly efficient file transfer and File synchronization, synchronisation tool. He was also the original author of rzip, which uses a similar algorithm to rsync. He developed spamsum, based on locality-sensitive hashing algorithms. He is the author of KnightCap, a reinforcement l ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Open-source Software
Open-source software (OSS) is Software, computer software that is released under a Open-source license, license in which the copyright holder grants users the rights to use, study, change, and Software distribution, distribute the software and its source code to anyone and for any purpose. Open-source software may be developed in a collaborative, public manner. Open-source software is a prominent example of open collaboration, meaning any capable user is able to online collaboration, participate online in development, making the number of possible contributors indefinite. The ability to examine the code facilitates public trust in the software. Open-source software development can bring in diverse perspectives beyond those of a single company. A 2024 estimate of the value of open-source software to firms is $8.8 trillion, as firms would need to spend 3.5 times the amount they currently do without the use of open source software. Open-source code can be used for studying and a ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]