Locally Decodable Code
   HOME

TheInfoList



OR:

A locally decodable code (LDC) is an
error-correcting code In computing, telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels. The centra ...
that allows a single bit of the original message to be decoded
with high probability In mathematics, an event that occurs with high probability (often shortened to w.h.p. or WHP) is one whose probability depends on a certain number ''n'' and goes to 1 as ''n'' goes to infinity, i.e. the probability of the event occurring can be m ...
by only examining (or querying) a small number of bits of a possibly corrupted codeword.Sergey Yekhanin. New locally decodable codes and private information retrieval schemes
Technical Report ECCC TR06-127
2006.
This property could be useful, say, in a context where information is being transmitted over a noisy channel, and only a small subset of the data is required at a particular time and there is no need to decode the entire message at once. Locally decodable codes are not a subset of locally testable codes, though there is some overlap between the two. Codewords are generated from the original message using an algorithm that introduces a certain amount of redundancy into the codeword; thus, the codeword is always longer than the original message. This redundancy is distributed across the codeword and allows the original message to be recovered with good probability even in the presence of errors. The more redundant the codeword, the more resilient it is against errors, and the fewer queries required to recover a bit of the original message.


Overview

More formally, a (q, \delta, \epsilon)-locally decodable code encodes an n-bit message x to an N-bit codeword C(x) such that any bit x_i of the message can be recovered with probability 1 - \epsilon by using a randomized decoding algorithm that queries only q bits of the codeword C(x), even if up to \delta N locations of the codeword have been corrupted. Furthermore, a perfectly smooth local decoder is a decoder such that, in addition to always generating the correct output given access to an uncorrupted codeword, for every j \in /math> and i \in /math> the j^ query to recover the i^ bit is uniform over /math>. (The notation /math> denotes the set \). Informally, this means that the set of queries required to decode any given bit are uniformly distributed over the codeword. Local list decoders are another interesting subset of local decoders. List decoding is useful when a codeword is corrupted in more than \delta/2 places, where \delta is the minimum
Hamming distance In information theory, the Hamming distance between two String (computer science), strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number ...
between two codewords. In this case, it is no longer possible to identify exactly which original message has been encoded, since there could be multiple codewords within \delta distance of the corrupted codeword. However, given a radius \epsilon, it is possible to identify the set of messages that encode to codewords that are within \epsilon of the corrupted codeword. An upper bound on the size of the set of messages can be determined by \delta and \epsilon. Locally decodable codes can also be concatenated, where a message is encoded first using one scheme, and the resulting codeword is encoded again using a different scheme. (Note that, in this context,
concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalizations of concatenati ...
is the term used by scholars to refer to what is usually called
composition Composition or Compositions may refer to: Arts and literature *Composition (dance), practice and teaching of choreography * Composition (language), in literature and rhetoric, producing a work in spoken tradition and written discourse, to include ...
; see ). This might be useful if, for example, the first code has some desirable properties with respect to rate, but it has some undesirable property, such as producing a codeword over a non-binary alphabet. The second code can then transform the result of the first encoding over a non-binary alphabet to a binary alphabet. The final encoding is still locally decodable, and requires additional steps to decode both layers of encoding.


Length of codeword and query complexity

The rate of a code refers to the ratio between its message length and codeword length: \frac, and the number of queries required to recover 1 bit of the message is called the query complexity of a code. The rate of a code is inversely related to the query complexity, but the exact shape of this tradeoff is a major
open problem In science and mathematics, an open problem or an open question is a known problem which can be accurately stated, and which is assumed to have an objective and verifiable solution, but which has not yet been solved (i.e., no solution for it is kno ...
. It is known that there are no LDCs that query the codeword in only one position, and that the optimal codeword size for query complexity 2 is exponential in the size of the original message. However, there are no known tight lower bounds for codes with query complexity greater than 2. Approaching the tradeoff from the side of codeword length, the only known codes with codeword length proportional to message length have query complexity k^\epsilon for \epsilon > 0 There are also codes in between, that have codewords polynomial in the size of the original message and polylogarithmic query complexity.


Applications

Locally decodable codes have applications to data transmission and storage, complexity theory, data structures, derandomization, theory of fault tolerant computation, and private information retrieval schemes.


Data transmission and storage

Locally decodable codes are especially useful for data transmission over noisy channels. The
Hadamard code The Hadamard code is an error-correcting code named after the French mathematician Jacques Hadamard that is used for error detection and correction when transmitting messages over very noisy or unreliable channels. In 1971, the code was used ...
(a special case of Reed Muller codes) was used in 1971 by
Mariner 9 Mariner 9 (Mariner Mars '71 / Mariner-I) was a robotic spacecraft that contributed greatly to the exploration of Mars and was part of the NASA Mariner program. Mariner 9 was launched toward Mars on May 30, 1971, from Spaceport Florida Launch Comp ...
to transmit pictures of Mars back to Earth. It was chosen over a 5-repeat code (where each bit is repeated 5 times) because, for roughly the same number of bits transmitted per pixel, it had a higher capacity for error correction. (The Hadamard code falls under the general umbrella of
forward error correction In computing, telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels. The centra ...
, and just happens to be locally decodable; the actual algorithm used to decode the transmission from Mars was a generic error-correction scheme.) LDCs are also useful for data storage, where the medium may become partially corrupted over time, or the reading device is subject to errors. In both cases, an LDC will allow for the recovery of information despite errors, provided that there are relatively few. In addition, LDCs do not require that the entire original message be decoded; a user can decode a specific portion of the original message without needing to decode the entire thing.


Complexity theory

One of the applications of locally decodable codes in complexity theory is hardness amplification. Using LDCs with polynomial codeword length and polylogarithmic query complexity, one can take a function L: \^n \rightarrow \ that is hard to solve on worst case inputs and design a function L': \^N \rightarrow \ that is hard to compute on average case inputs. Consider L limited to only length t inputs. Then we can see L as a binary string of length 2^t, where each bit is L(x) for each x \in \^t. We can use a polynomial length locally decodable code C with polylogarithmic query complexity that tolerates some constant fraction of errors to encode the string that represents L to create a new string of length 2^ = 2^. We think of this new string as defining a new problem L' on length t' inputs. If L' is easy to solve on average, that is, we can solve L' correctly on a large fraction 1 - \epsilon of inputs, then by the properties of the LDC used to encode it, we can use L' to probabilistically compute L on all inputs. Thus, a solution to L' for most inputs would allow us to solve L on all inputs, contradicting our assumption that L is hard on worst case inputs.


Private information retrieval schemes

A
private information retrieval In cryptography, a private information retrieval (PIR) protocol is a protocol that allows a user to retrieve an item from a server in possession of a database without revealing which item is retrieved. PIR is a weaker version of 1-out-of-''n'' obl ...
scheme allows a user to retrieve an item from a server in possession of a database without revealing which item is retrieved. One common way of ensuring privacy is to have k separate, non-communicating servers, each with a copy of the database. Given an appropriate scheme, the user can make queries to each server that individually do not reveal which bit the user is looking for, but which together provide enough information that the user can determine the particular bit of interest in the database. One can easily see that locally decodable codes have applications in this setting. A general procedure to produce a k-server private information scheme from a perfectly smooth k-query locally decodable code is as follows: Let C be a perfectly smooth LDC that encodes n-bit messages to N-bit codewords. As a preprocessing step, each of the k servers S_1,\ldots ,S_k encodes the n-bit database x with the code C, so each server now stores the N-bit codeword C(x). A user interested in obtaining the i^ bit of x randomly generates a set of k queries q_1,\ldots q_k such that x_i can be computed from C(x)_, \ldots C(x)_ using the local decoding algorithm A for C. The user sends each query to a different server, and each server responds with the bit requested. The user then uses A to compute x_i from the responses. Because the decoding algorithm is perfectly smooth, each query q_j is uniformly distributed over the codeword; thus, no individual server can gain any information about the user's intentions, so the protocol is private as long as the servers do not communicate.


Examples


The Hadamard code

The
Hadamard Jacques Salomon Hadamard (; 8 December 1865 – 17 October 1963) was a French mathematician who made major contributions in number theory, complex analysis, differential geometry, and partial differential equations. Biography The son of a tea ...
(or Walsh-Hadamard) code is an example of a simple locally decodable code that maps a string of length k to a codeword of length 2^k. The codeword for a string x \in \^k is constructed as follows: for every a_j\in\^k, the j^ bit of the codeword is equal to x \odot a_j, where x \odot y = \sum\limits_^ x_y_i (mod 2). It is easy to see that every codeword has a
Hamming distance In information theory, the Hamming distance between two String (computer science), strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number ...
of \frac from every other codeword. The local decoding algorithm has query complexity 2, and the entire original message can be decoded with good probability if the codeword is corrupted in less than \frac of its bits. For \rho < \frac, if the codeword is corrupted in a \rho fraction of places, a local decoding algorithm can recover the i^ bit of the original message with probability 1 - 2\rho. Proof: Given a codeword H and an index i, the algorithm to recover the i^ bit of the original message x works as follows: Let e^j refer to the vector in \^k that has 1 in the j^ position and 0s elsewhere. For y \in \^k, f(y) denotes the single bit in H that corresponds to x \odot y. The algorithm chooses a random vector y \in \^k and the vector y' = y \oplus e^i (where \oplus denotes
bitwise XOR In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic opera ...
). The algorithm outputs f(y) \oplus f(y') (mod 2). Correctness: By linearity, (x \odot y) \oplus (x \odot y') = (x \odot y) \oplus (x \odot (y \oplus e^i)) = (x \odot y) \oplus (x \odot y) \oplus (x \odot e^i) = x \odot e^i But (x \odot e^i) = x_i, so we just need to show that f(y) = x \odot y and f(y') = x \odot y' with good probability. Since y and y' are uniformly distributed (even though they are dependent), the
union bound In probability theory, Boole's inequality, also known as the union bound, says that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the indiv ...
implies that f(y) = x \odot y and f(y') = x \odot y' with probability at least 1 - 2\rho. Note: to amplify the probability of success, one can repeat the procedure with different random vectors and take the majority answer.


The Reed–Muller code

The main idea behind local decoding of Reed-Muller codes is
polynomial interpolation In numerical analysis, polynomial interpolation is the interpolation of a given data set by the polynomial of lowest possible degree that passes through the points in the dataset. Given a set of data points (x_0,y_0), \ldots, (x_n,y_n), with no ...
. The key concept behind a Reed-Muller code is a multivariate polynomial of degree d on l variables. The message is treated as the evaluation of a polynomial at a set of predefined points. To encode these values, a polynomial is extrapolated from them, and the codeword is the evaluation of that polynomial on all possible points. At a high level, to decode a point of this polynomial, the decoding algorithm chooses a set S of points on a line that passes through the point of interest x. It then queries the codeword for the evaluation of the polynomial on points in S and interpolates that polynomial. Then it is simple to evaluate the polynomial at the point that will yield x. This roundabout way of evaluating x is useful because (a) the algorithm can be repeated using different lines through the same point to improve the probability of correctness, and (b) the queries are uniformly distributed over the codeword. More formally, let \mathbb be a
finite field In mathematics, a finite field or Galois field (so-named in honor of Évariste Galois) is a field (mathematics), field that contains a finite number of Element (mathematics), elements. As with any field, a finite field is a Set (mathematics), s ...
, and let l, d be numbers with d < , \mathbb, . The Reed-Muller code with parameters \mathbb, l, d is the function RM : \mathbb^ \rightarrow \mathbb^ that maps every l-variable polynomial P over \mathbb of total degree d to the values of P on all the inputs in \mathbb^l. That is, the input is a polynomial of the form P(x_1, \ldots, x_l) = \sum\limits_c_x_1^x_2^\cdots x_l^ specified by the interpolation of the \binom values of the predefined points and the output is the sequence \ for every x_1, \ldots, x_l \in \mathbb. To recover the value of a degree d polynomial at a point w \in \mathbb^n, the local decoder shoots a random
affine Affine may describe any of various topics concerned with connections or affinities. It may refer to: * Affine, a Affinity_(law)#Terminology, relative by marriage in law and anthropology * Affine cipher, a special case of the more general substi ...
line through w. Then it picks d + 1 points on that line, which it uses to interpolate the polynomial and then evaluate it at the point where the result is w. To do so, the algorithm picks a vector v \in \mathbb^n uniformly at random and considers the line L = \ through w. The algorithm picks an arbitrary subset S of \mathbb, where , S, = d+1, and queries coordinates of the codeword that correspond to points w+\lambda v for all \lambda \in S and obtains values \. Then it uses polynomial interpolation to recover the unique univariate polynomial h with degree less than or equal to d such that h(\lambda) = e_ for all \lambda \in S. Then, to get the value of w, it just evaluates h(0). To recover a single value of the original message, one chooses w to be one of the points that defines the polynomial. Each individual query is distributed uniformly at random over the codeword. Thus, if the codeword is corrupted in at most a \delta fraction of locations, by the union bound, the probability that the algorithm samples only uncorrupted coordinates (and thus correctly recovers the bit) is at least 1 - (d+1)\delta. For other decoding algorithms, see.


See also

*
Private information retrieval In cryptography, a private information retrieval (PIR) protocol is a protocol that allows a user to retrieve an item from a server in possession of a database without revealing which item is retrieved. PIR is a weaker version of 1-out-of-''n'' obl ...
*
Linear cryptanalysis In cryptography, linear cryptanalysis is a general form of cryptanalysis based on finding affine Affine may describe any of various topics concerned with connections or affinities. It may refer to: * Affine, a Affinity_(law)#Terminology, relat ...


References

{{reflist Error detection and correction