In coding theory, block codes are a large and important family of error-correcting codes that encode data in blocks.
There is a vast number of examples for block codes, many of which have a wide range of practical applications. The abstract definition of block codes is conceptually useful because it allows coding theorists, mathematicians, and computer scientists to study the limitations of ''all'' block codes in a unified way.
Such limitations often take the form of ''bounds'' that relate different parameters of the block code to each other, such as its rate and its ability to detect and correct errors.
Examples of block codes are Reed–Solomon codes, Hamming codes, Hadamard codes, expander codes, Golay codes, and Reed–Muller codes. These examples also belong to the class of linear codes, and hence they are called linear block codes. More particularly, these codes are known as algebraic block codes, or cyclic block codes, because they can be generated using Boolean polynomials.
Algebraic block codes are typically hard-decoded using algebraic decoders.
The term ''block code'' may also refer to any error-correcting code that acts on a block of <math>k</math> bits of input data to produce <math>n</math> bits of output data <math>(n,k)</math>. Consequently, the block coder is a ''memoryless'' device. Under this definition codes such as turbo codes, terminated convolutional codes, and other iteratively decodable codes (turbo-like codes) would also be considered block codes. A non-terminated convolutional encoder would be an example of a non-block (unframed) code, which has ''memory'' and is instead classified as a ''tree code''.
This article deals with "algebraic block codes".
The block code and its parameters
Error-correcting codes are used to reliably transmit digital data over unreliable communication channels subject to channel noise.
When a sender wants to transmit a possibly very long data stream using a block code, the sender breaks the stream up into pieces of some fixed size. Each such piece is called a ''message'', and the procedure given by the block code encodes each message individually into a codeword, also called a ''block'' in the context of block codes. The sender then transmits all blocks to the receiver, who can in turn use some decoding mechanism to (hopefully) recover the original messages from the possibly corrupted received blocks.
The performance and success of the overall transmission depends on the parameters of the channel and the block code.
Formally, a block code is an injective mapping
: <math>C : \Sigma^k \to \Sigma^n</math>.
Here, <math>\Sigma</math> is a finite and nonempty set and <math>k</math> and <math>n</math> are integers. The meaning and significance of these three parameters and other parameters related to the code are described below.
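As a toy illustration of this definition, the following sketch (with illustrative names that are not part of any standard library) realizes the binary triple-repetition code as an injective mapping with <math>k=1</math> and <math>n=3</math>:

```python
# A minimal sketch of a block code as an injective mapping C : Sigma^k -> Sigma^n,
# using the binary triple-repetition code (k = 1, n = 3) as a toy example.

SIGMA = ("0", "1")  # the alphabet, here binary

def encode(message: str) -> str:
    """Map a k-symbol message to an n-symbol codeword (here k = 1, n = 3)."""
    assert len(message) == 1 and message in SIGMA
    return message * 3  # repeat the symbol three times

# Injectivity: distinct messages map to distinct codewords.
codewords = {m: encode(m) for m in SIGMA}
assert len(set(codewords.values())) == len(SIGMA)
```

Any real block code replaces the repetition rule with a more efficient encoding, but the shape of the map is the same.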
The alphabet Σ
The data stream to be encoded is modeled as a string over some alphabet <math>\Sigma</math>. The size <math>|\Sigma|</math> of the alphabet is often written as <math>q</math>. If <math>q=2</math>, then the block code is called a ''binary'' block code. In many applications it is useful to consider <math>q</math> to be a prime power, and to identify <math>\Sigma</math> with the finite field <math>\mathbb{F}_q</math>.
The message length ''k''
Messages are elements <math>m</math> of <math>\Sigma^k</math>, that is, strings of length <math>k</math>. Hence the number <math>k</math> is called the message length or dimension of a block code.
The block length ''n''
The block length <math>n</math> of a block code is the number of symbols in a block. Hence, the elements <math>c</math> of <math>\Sigma^n</math> are strings of length <math>n</math> and correspond to blocks that may be received by the receiver. Hence they are also called received words. If <math>c=C(m)</math> for some message <math>m</math>, then <math>c</math> is called the codeword of <math>m</math>.
The rate ''R''
The rate of a block code is defined as the ratio between its message length and its block length:
: <math>R=\frac{k}{n}</math>.
A large rate means that the amount of actual message per transmitted block is high. In this sense, the rate measures the transmission speed and the quantity <math>1-R</math> measures the overhead that occurs due to the encoding with the block code. It is a simple information-theoretical fact that the rate cannot exceed <math>1</math> since data cannot in general be losslessly compressed. Formally, this follows from the fact that the code <math>C</math> is an injective map.
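As a concrete worked instance, the Hamming(7,4) code discussed below has message length <math>k=4</math> and block length <math>n=7</math>, so

```latex
R = \frac{k}{n} = \frac{4}{7} \approx 0.571 ,
\qquad
1 - R = \frac{3}{7} \approx 0.429 .
```

That is, roughly 43% of every transmitted block is parity overhead rather than message.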
The distance ''d''
The distance or minimum distance <math>d</math> of a block code is the minimum number of positions in which any two distinct codewords differ, and the relative distance <math>\delta</math> is the fraction <math>d/n</math>.
Formally, for received words <math>c_1,c_2\in\Sigma^n</math>, let <math>\Delta(c_1,c_2)</math> denote the Hamming distance between <math>c_1</math> and <math>c_2</math>, that is, the number of positions in which <math>c_1</math> and <math>c_2</math> differ.
Then the minimum distance <math>d</math> of the code <math>C</math> is defined as
: <math>d := \min_{m_1,m_2\in\Sigma^k \atop m_1\neq m_2} \Delta[C(m_1),C(m_2)]</math>.
Since any code has to be injective, any two codewords will disagree in at least one position, so the distance of any code is at least <math>1</math>. Besides, the distance equals the minimum weight for linear block codes because:
: <math>\min_{m_1,m_2\in\Sigma^k \atop m_1\neq m_2} \Delta[C(m_1),C(m_2)] = \min_{m_1,m_2\in\Sigma^k \atop m_1\neq m_2} \Delta[\mathbf{0},C(m_1)+C(m_2)] = \min_{m\in\Sigma^k \atop m\neq\mathbf{0}} w[C(m)] = w_\min</math>.
A larger distance allows for more error correction and detection.
For example, if we only consider errors that may change symbols of the sent codeword but never erase or add them, then the number of errors is the number of positions in which the sent codeword and the received word differ.
A code with distance <math>d</math> allows the receiver to detect up to <math>d-1</math> transmission errors since changing <math>d-1</math> positions of a codeword can never accidentally yield another codeword. Furthermore, if no more than <math>(d-1)/2</math> transmission errors occur, the receiver can uniquely decode the received word to a codeword. This is because every received word has at most one codeword at distance <math>(d-1)/2</math>. If more than <math>(d-1)/2</math> transmission errors occur, the receiver cannot uniquely decode the received word in general as there might be several possible codewords. One way for the receiver to cope with this situation is to use list decoding, in which the decoder outputs a list of all codewords in a certain radius.
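The two quantities just defined can be computed directly from an explicit list of codewords. The sketch below (illustrative helper names) computes the Hamming distance and, from it, the minimum distance of a small code:

```python
# Compute the Hamming distance between two words and the minimum distance
# of a code given as an explicit list of codewords.
from itertools import combinations

def hamming_distance(c1: str, c2: str) -> int:
    """Number of positions in which two equal-length words differ."""
    assert len(c1) == len(c2)
    return sum(a != b for a, b in zip(c1, c2))

def minimum_distance(codewords: list[str]) -> int:
    """Minimum pairwise Hamming distance over all distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

# The binary triple-repetition code {000, 111} has distance 3:
print(minimum_distance(["000", "111"]))  # -> 3
```

Brute-force enumeration like this is only feasible for tiny codes; for structured families the distance is derived analytically.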
Popular notation
The notation <math>(n,k,d)_q</math> describes a block code over an alphabet <math>\Sigma</math> of size <math>q</math>, with a block length <math>n</math>, message length <math>k</math>, and distance <math>d</math>.
If the block code is a linear block code, then the square brackets in the notation <math>[n,k,d]_q</math> are used to represent that fact.
For binary codes with <math>q=2</math>, the index is sometimes dropped.
For maximum distance separable codes, the distance is always <math>d=n-k+1</math>, but sometimes the precise distance is not known, non-trivial to prove or state, or not needed. In such cases, the <math>d</math>-component may be missing.
Sometimes, especially for non-block codes, the notation <math>(n,M,d)_q</math> is used for codes that contain <math>M</math> codewords of length <math>n</math>. For block codes with messages of length <math>k</math> over an alphabet of size <math>q</math>, this number would be <math>M=q^k</math>.
Examples
As mentioned above, there is a vast number of error-correcting codes that are actually block codes.
The first error-correcting code was the Hamming(7,4) code, developed by Richard W. Hamming in 1950. This code transforms a message consisting of 4 bits into a codeword of 7 bits by adding 3 parity bits. Hence this code is a block code. It turns out that it is also a linear code and that it has distance 3. In the shorthand notation above, this means that the Hamming(7,4) code is a <math>[7,4,3]_2</math> code.
Reed–Solomon codes are a family of <math>[n,k,d]_q</math> codes with <math>d=n-k+1</math> and <math>q</math> being a prime power. Rank codes are a family of <math>[n,k,d]_q</math> codes with <math>d\leq n-k+1</math>. Hadamard codes are a family of <math>[n,k,d]_2</math> codes with <math>n=2^{k-1}</math> and <math>d=2^{k-2}</math>.
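The Hamming(7,4) encoding can be sketched in a few lines. The sketch below uses one standard bit layout (parity bits at positions 1, 2, and 4); the function name is illustrative:

```python
# A minimal sketch of a Hamming(7,4) encoder with parity bits p1, p2, p3
# placed at codeword positions 1, 2, and 4 (one standard convention).
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword of the [7,4,3] Hamming code."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

print(hamming74_encode([0, 0, 0, 0]))  # -> [0, 0, 0, 0, 0, 0, 0]
print(hamming74_encode([1, 1, 1, 1]))  # -> [1, 1, 1, 1, 1, 1, 1]
```

Because the code is linear, its distance equals its minimum nonzero codeword weight, which for this construction is 3, matching the <math>[7,4,3]_2</math> parameters above.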
Error detection and correction properties
A codeword <math>c\in\Sigma^n</math> could be considered as a point in the <math>n</math>-dimensional space <math>\Sigma^n</math> and the code <math>\mathcal{C}</math> is the subset of <math>\Sigma^n</math>. A code <math>\mathcal{C}</math> has distance <math>d</math> means that for every codeword <math>c\in\mathcal{C}</math>, there is no other codeword in the ''Hamming ball'' centered at <math>c</math> with radius <math>d-1</math>, which is defined as the collection of <math>n</math>-dimensional words whose Hamming distance to <math>c</math> is no more than <math>d-1</math>. Similarly, <math>\mathcal{C}</math> with (minimum) distance <math>d</math> has the following properties:
* <math>\mathcal{C}</math> can detect <math>d-1</math> errors: because a codeword <math>c</math> is the only codeword in the Hamming ball centered at itself with radius <math>d-1</math>, no error pattern of <math>d-1</math> or fewer errors can change one codeword into another. When the receiver detects that the received vector is not a codeword of <math>\mathcal{C}</math>, the errors are detected (but with no guarantee of correction).
* <math>\mathcal{C}</math> can correct <math>\textstyle\left\lfloor{d-1 \over 2}\right\rfloor</math> errors: because a codeword <math>c</math> is the only codeword in the Hamming ball centered at itself with radius <math>d-1</math>, the Hamming balls of radius <math>\textstyle\left\lfloor{d-1 \over 2}\right\rfloor</math> centered at two different codewords do not overlap. Therefore, if we consider error correction as finding the codeword closest to the received word <math>y</math>, then as long as the number of errors is no more than <math>\textstyle\left\lfloor{d-1 \over 2}\right\rfloor</math>, there is exactly one codeword in the Hamming ball centered at <math>y</math> with radius <math>\textstyle\left\lfloor{d-1 \over 2}\right\rfloor</math>, so all errors can be corrected.
* In order to decode in the presence of more than <math>(d-1)/2</math> errors, list decoding or maximum likelihood decoding can be used.
* <math>\mathcal{C}</math> can correct <math>d-1</math> erasures. By ''erasure'' it is meant that the position of the erased symbol is known. Correction can be achieved by <math>q</math>-passing decoding: in the <math>i</math>-th pass the erased positions are filled with the <math>i</math>-th symbol of the alphabet and error correction is carried out. There must be at least one pass in which the number of errors is no more than <math>\textstyle\left\lfloor{d-1 \over 2}\right\rfloor</math>, and therefore the erasures can be corrected.
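The detection and correction properties listed above can be demonstrated with the binary 5-repetition code, which has <math>n=d=5</math> and therefore detects up to 4 errors and corrects up to 2 by nearest-codeword decoding (names below are illustrative):

```python
# Detection and correction with the binary 5-repetition code (n = 5, d = 5):
# it detects up to d-1 = 4 errors and corrects up to floor((d-1)/2) = 2.
CODEWORDS = ["00000", "11111"]

def detect(received: str) -> bool:
    """An error is detected whenever the received word is not a codeword."""
    return received not in CODEWORDS

def correct(received: str) -> str:
    """Decode to the nearest codeword (unique if at most 2 errors occurred)."""
    dist = lambda a, b: sum(x != y for x, y in zip(a, b))
    return min(CODEWORDS, key=lambda c: dist(received, c))

print(detect("00100"))   # -> True  (one flipped bit is detected)
print(correct("00101"))  # -> 00000 (two flipped bits are corrected)
```

With 3 or more flipped bits, nearest-codeword decoding silently returns the wrong codeword, illustrating why the correction radius is exactly <math>\textstyle\left\lfloor{d-1 \over 2}\right\rfloor</math>.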
Lower and upper bounds of block codes
Family of codes
<math>C=\{C_i\}_{i\geq 1}</math> is called a ''family of codes'', where <math>C_i</math> is an <math>(n_i,k_i,d_i)_q</math> code with monotonically increasing <math>n_i</math>.
The rate of a family of codes is defined as
: <math>R(C)=\lim_{i\to\infty}{k_i \over n_i}</math>.
The relative distance of a family of codes is defined as
: <math>\delta(C)=\lim_{i\to\infty}{d_i \over n_i}</math>.
To explore the relationship between <math>R(C)</math> and <math>\delta(C)</math>, a set of lower and upper bounds of block codes are known.
Hamming bound
: <math>R\le 1-{1 \over n}\cdot\log_q\left[\operatorname{Vol}_q\left(\left\lfloor{d-1 \over 2}\right\rfloor,n\right)\right]</math>, where <math>\operatorname{Vol}_q(r,n)</math> denotes the number of words in a Hamming ball of radius <math>r</math> in <math>\Sigma^n</math>.
Singleton bound
The Singleton bound is that the sum of the rate and the relative distance of a block code cannot be much larger than 1:
: <math>R+\delta\le 1+{1 \over n}</math>.
In other words, every block code satisfies the inequality <math>k+d\le n+1</math>.
Reed–Solomon codes are non-trivial examples of codes that satisfy the Singleton bound with equality.
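The inequality form of the bound is easy to check numerically. The sketch below verifies it for two parameter triples; the <math>(255,223)</math> Reed–Solomon parameters are a commonly cited example and are used here only for illustration:

```python
# Check the Singleton bound k + d <= n + 1 for a few (n, k, d) triples.
def satisfies_singleton(n: int, k: int, d: int) -> bool:
    return k + d <= n + 1

# Hamming(7,4) has distance 3: 4 + 3 <= 8 holds with slack.
print(satisfies_singleton(7, 4, 3))       # -> True
# A Reed-Solomon code with n = 255, k = 223 meets the bound with equality,
# since d = n - k + 1 = 33.
print(satisfies_singleton(255, 223, 33))  # -> True
```

Codes meeting the bound with equality are exactly the maximum distance separable codes mentioned earlier.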
Plotkin bound
For <math>q=2</math>, <math>R+2\delta\le 1</math>. In other words, <math>k+2d\le n+2</math>.
For the general case, the following Plotkin bounds hold for any <math>C\subseteq\mathbb{F}_q^n</math> with distance <math>d</math>:
# If <math>d=\left(1-{1 \over q}\right)n</math>, then <math>|C|\le 2qn</math>.
# If <math>d>\left(1-{1 \over q}\right)n</math>, then <math>|C|\le{qd \over qd-(q-1)n}</math>.
For any <math>q</math>-ary code with relative distance <math>\delta</math>, <math>R\le 1-\left({q \over q-1}\right)\delta+o(1)</math>.
Gilbert–Varshamov bound
: <math>R\ge 1-H_q(\delta)-\epsilon</math>, where <math>0\le\delta\le 1-{1 \over q}</math> and <math>0\le\epsilon\le 1-H_q(\delta)</math>, and
: <math>H_q(x) = x\cdot\log_q(q-1)-x\cdot\log_q{x}-(1-x)\cdot\log_q{(1-x)}</math>
is the <math>q</math>-ary entropy function.
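The <math>q</math>-ary entropy function is straightforward to evaluate numerically. The sketch below implements it for <math>0\le x<1</math> (with <math>H_q(0)=0</math> by convention; the function name is illustrative):

```python
# The q-ary entropy function H_q(x), as used in the Gilbert-Varshamov bound.
from math import log

def entropy_q(x: float, q: int) -> float:
    """H_q(x) = x*log_q(q-1) - x*log_q(x) - (1-x)*log_q(1-x), for 0 <= x < 1."""
    if x == 0:
        return 0.0
    logq = lambda y: log(y) / log(q)
    return x * logq(q - 1) - x * logq(x) - (1 - x) * logq(1 - x)

# For q = 2 this is the binary entropy function, maximal at x = 1/2:
print(round(entropy_q(0.5, 2), 6))  # -> 1.0
```

For <math>q=2</math> and <math>\delta=0.5-\epsilon</math> the bound's right-hand side approaches 0, reflecting that binary codes with relative distance near 1/2 must have vanishing rate.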
Johnson bound
Define <math>J_q(\delta) \equiv \left(1-{1 \over q}\right)\left(1-\sqrt{1-{q\delta \over q-1}}\right)</math>.
Let <math>J_q(n,d,e)</math> be the maximum number of codewords in a Hamming ball of radius <math>e</math> for any code <math>C\subseteq\mathbb{F}_q^n</math> of distance <math>d</math>.
Then we have the ''Johnson bound'': <math>J_q(n,d,e)\le qdn</math>, if <math>{e \over n}\le{q-1 \over q}\left(1-\sqrt{1-{q \over q-1}\cdot{d \over n}}\right)=J_q\left({d \over n}\right)</math>.
Elias–Bassalygo bound
: <math>R={\log_q|C| \over n}\le 1-H_q\left(J_q(\delta)\right)+o(1)</math>.
Sphere packings and lattices
Block codes are tied to the sphere packing problem, which has received some attention over the years. In two dimensions, it is easy to visualize. Take a bunch of pennies flat on the table and push them together. The result is a hexagon pattern like a bee's nest. But block codes rely on more dimensions which cannot easily be visualized. The powerful Golay code used in deep-space communications uses 24 dimensions. If used as a binary code (which it usually is), the dimensions refer to the length of the codeword as defined above.
The theory of coding uses the ''N''-dimensional sphere model: for example, how many pennies can be packed into a circle on a tabletop, or in three dimensions, how many marbles can be packed into a globe. Other considerations enter the choice of a code. For example, hexagon packing into the constraint of a rectangular box will leave empty space at the corners. As the dimensions get larger, the percentage of empty space grows smaller. But at certain dimensions, the packing uses all the space and these codes are the so-called perfect codes. There are very few of these codes.
Another property is the number of neighbors a single codeword may have. Again, consider pennies as an example. First we pack the pennies in a rectangular grid. Each penny will have 4 near neighbors (and 4 at the corners which are farther away). In a hexagon, each penny will have 6 near neighbors. Respectively, in three and four dimensions, the maximum packing is given by the 12-face and 24-cell with 12 and 24 neighbors, respectively. When we increase the dimensions, the number of near neighbors increases very rapidly. In general, the value is given by the kissing numbers.
The result is that the number of ways for noise to make the receiver choose
a neighbor (hence an error) grows as well. This is a fundamental limitation
of block codes, and indeed all codes. It may be harder to cause an error to
a single neighbor, but the number of neighbors can be large enough so the
total error probability actually suffers.
See also
* Channel capacity
* Shannon–Hartley theorem
* Noisy channel
* List decoding
* Sphere packing
References
* {{cite book , author=S. Lin , author2=D. J. Jr. Costello , title= Error Control Coding: Fundamentals and Applications , publisher=Prentice-Hall , year=1983 , isbn=0-13-283796-X }}
External links
* Charan Langton (2001), ''Coding Concepts and Block Coding''