HOME

TheInfoList



OR:

Convergent encryption, also known as content hash keying, is a
cryptosystem In cryptography, a cryptosystem is a suite of cryptographic algorithms needed to implement a particular security service, such as confidentiality (encryption). Typically, a cryptosystem consists of three algorithms: one for key generation, one f ...
that produces identical
ciphertext In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext ...
from identical
plaintext In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted. Overview With the advent of com ...
files. This has applications in
cloud computing Cloud computing is the on-demand availability of computer system resources, especially data storage ( cloud storage) and computing power, without direct active management by the user. Large clouds often have functions distributed over m ...
to remove duplicate files from storage without the provider having access to the encryption keys. The combination of deduplication and convergent encryption was described in a backup system patent filed by
Stac Electronics Stac Electronics, originally incorporated as State of the Art Consulting and later shortened to Stac, Inc., was a technology company founded in 1983. It is known primarily for its Lempel–Ziv–Stac lossless compression algorithm and Stacker dis ...
in 1995. This combination has been used by Farsite,
Permabit Permabit Technology Corporation was a private supplier of Data Reduction solutions to the Computer Data Storage industry. On 31 July 2017 it was announced that Red Hat had acquired the assets and technology of Permabit Technology Corporation. Pe ...
,
Freenet Freenet is a peer-to-peer platform for censorship-resistant, anonymous communication. It uses a decentralized distributed data store to keep and deliver information, and has a suite of free software for publishing and communicating on the Web ...
,
MojoNation Mnet is software to run a peer-to-peer distributed data store for file sharing purpose. It aimed at substituting free sharing of digital resources, typical of P2P networks, with a market regulated by its own currency. Mnet is a fork of the softw ...
,
GNUnet GNUnet is a software framework for decentralized, peer-to-peer networking and an official GNU package. The framework offers link encryption, peer discovery, resource allocation, communication over many transports (such as TCP, UDP, HTTP, ...
, flud, and the Tahoe Least-Authority File Store. The system gained additional visibility in 2011 when cloud storage provider Bitcasa announced they were using convergent encryption to enable de-duplication of data in their
cloud storage Cloud storage is a model of computer data storage in which the digital data is stored in logical pools, said to be on "the cloud". The physical storage spans multiple servers (sometimes in multiple locations), and the physical environment is ty ...
service.


Overview

# The system computes a
cryptographic hash A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output ...
of the plaintext in question. # The system then encrypts the plaintext by using the hash as a key. # Finally, the hash itself is stored, encrypted with a key chosen by the user.


Known Attacks

Convergent encryption is open to a "confirmation of a file attack" in which an attacker can effectively confirm whether a target possesses a certain file by encrypting an unencrypted, or
plain-text In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limit ...
, version and then simply comparing the output with files possessed by the target. This attack poses a problem for a user storing information that is non-unique, i.e. also either publicly available or already held by the adversary - for example:
banned books This is an index of lists of banned books, which contain books that have been banned or censored by religious authority or government. By country * Book censorship in Canada * Book censorship in China * List of books banned in India * Boo ...
or files that cause
copyright infringement Copyright infringement (at times referred to as piracy) is the use of works protected by copyright without permission for a usage where such permission is required, thereby infringing certain exclusive rights granted to the copyright holder, ...
. An argument could be made that a confirmation of a file attack is rendered less effective by adding a unique piece of data such as a few random characters to the plain text before encryption; this causes the uploaded file to be unique and therefore results in a unique encrypted file. However, some implementations of convergent encryption where the plain-text is broken down into blocks based on file content, and each block then independently convergently encrypted may inadvertently defeat attempts at making the file unique by adding bytes at the beginning or end. Even more alarming than the confirmation attack is the "learn the remaining information attack" described by Drew Perttula in 2008.
tahoe-lafs.org (2008-08-20). Retrieved on 2013-09-05. This type of attack applies to the encryption of files that are only slight variations of a public document. For example, if the defender encrypts a bank form including a ten digit bank account number, an attacker that is aware of generic bank form format may extract defender's bank account number by producing bank forms for all possible bank account numbers, encrypt them and then by comparing those encryptions with defender's encrypted file deduce the bank account number. Note that this attack can be extended to attack a large number of targets at once (all spelling variations of a target bank customer in the example above, or even all potential bank customers), and the presence of this problem extends to any type of form document: tax returns, financial documents, healthcare forms, employment forms, etc. Also note that there is no known method for decreasing the severity of this attack -- adding a few random bytes to files as they are stored does not help, since those bytes can likewise be attacked with the "learn the remaining information" approach. The only effective approach to mitigating this attack is to encrypt the contents of files with a non-convergent secret before storing (negating any benefit from convergent encryption), or to simply not use convergent encryption in the first place.


See also

*
Salt (cryptography) In cryptography, a salt is random data that is used as an additional input to a one-way function that hashes data, a password or passphrase. Salts are used to safeguard passwords in storage. Historically, only the output from an invocation ...
*
Deterministic encryption A deterministic encryption scheme (as opposed to a probabilistic encryption scheme) is a cryptosystem which always produces the same ciphertext for a given plaintext and key, even over separate executions of the encryption algorithm. Examples of ...


References

{{Reflist, 30em Cryptography