In communications and information processing, code is a system of rules to convert information (such as a letter, word, sound, image, or gesture) into another form, sometimes shortened or secret, for communication through a communication channel or storage in a storage medium. An early example is the invention of language, which enabled a person, through speech, to communicate what they thought, saw, heard, or felt to others. But speech limits the range of communication to the distance a voice can carry and limits the audience to those present when the speech is uttered. The invention of writing, which converted spoken language into visual symbols, extended the range of communication across space and time.
The process of encoding converts information from a
source into symbols for communication or storage. Decoding is the reverse process, converting code symbols back into a form that the recipient understands, such as English, Spanish, etc.
One reason for coding is to enable communication in places where ordinary plain language, spoken or written, is difficult or impossible. For example, in semaphore, the configuration of flags held by a signaler or the arms of a semaphore tower encodes parts of the message, typically individual letters and numbers. Another person standing a great distance away can interpret the flags and reproduce the words sent.
Theory
In information theory and computer science, a code is usually considered as an algorithm that uniquely represents symbols from some source alphabet by ''encoded'' strings, which may be in some other target alphabet. An extension of the code for representing sequences of symbols over the source alphabet is obtained by concatenating the encoded strings.
Before giving a mathematically precise definition, here is a brief example. The mapping
: $C = \{\, a \mapsto 0,\ b \mapsto 01,\ c \mapsto 011 \,\}$
is a code, whose source alphabet is the set $\{a, b, c\}$ and whose target alphabet is the set $\{0, 1\}$. Using the extension of the code, the encoded string 0011001 can be grouped into codewords as 0 011 0 01, and these in turn can be decoded to the sequence of source symbols ''acab''.
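The decoding step in this example can be checked mechanically. The following is a minimal sketch, assuming the mapping a → 0, b → 01, c → 011, which is what the grouping 0 011 0 01 → ''acab'' implies. The function names are illustrative only, and the exhaustive search is chosen for clarity rather than efficiency.

```python
# Assumed code table, inferred from the grouping 0 011 0 01 -> "acab".
CODE = {"a": "0", "b": "01", "c": "011"}


def encode(message, code):
    """Extension of the code: concatenate the codeword of every source symbol."""
    return "".join(code[symbol] for symbol in message)


def decodings(encoded, code):
    """Return every source string whose encoding equals `encoded`.

    Tries each codeword as the next piece and recurses on the remainder,
    so the result lists all valid groupings (exponential in the worst case).
    """
    if not encoded:
        return [""]
    results = []
    for symbol, word in code.items():
        if encoded.startswith(word):
            for rest in decodings(encoded[len(word):], code):
                results.append(symbol + rest)
    return results


print(encode("acab", CODE))        # 0011001
print(decodings("0011001", CODE))  # ['acab']
```

Only one grouping survives, which is what it means for this particular string to decode unambiguously.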
Using terms from formal language theory, the precise mathematical definition of this concept is as follows: let $S$ and $T$ be two finite sets, called the source and target alphabets, respectively. A code $C : S \to T^*$ is a total function mapping each symbol from $S$ to a sequence of symbols over $T$. The extension $C'$ of $C$ is a homomorphism of $S^*$ into $T^*$, which naturally maps each sequence of source symbols to a sequence of target symbols.
Variable-length codes
In this section, we consider codes that encode each source (clear text) character by a code word from some dictionary, and concatenation of such code words gives us an encoded string. Variable-length codes are especially useful when clear text characters have different probabilities; see also entropy encoding.
A ''prefix code'' is a code with the "prefix property": there is no valid code word in the system that is a prefix (start) of any other valid code word in the set.
Huffman coding is the best-known algorithm for deriving prefix codes. Prefix codes are widely referred to as "Huffman codes" even when the code was not produced by a Huffman algorithm. Other examples of prefix codes are telephone country codes, the country and publisher parts of ISBNs, and the Secondary Synchronization Codes used in the UMTS WCDMA 3G Wireless Standard.
Kraft's inequality characterizes the sets of codeword lengths that are possible in a prefix code. Every uniquely decodable code, not necessarily a prefix code, must also satisfy Kraft's inequality (the Kraft–McMillan theorem).
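Both properties are easy to test on small examples. The sketch below checks the prefix property directly and computes the left-hand side of Kraft's inequality, the sum of $r^{-\ell}$ over all codeword lengths $\ell$ for an alphabet of size $r$, which must not exceed 1. The three sample codeword sets and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def is_prefix_code(codewords):
    """True if no codeword is a proper prefix of another codeword."""
    words = set(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)


def kraft_sum(codewords, radix=2):
    """Sum of radix**(-length) over all codewords, kept exact with Fraction."""
    return sum(Fraction(1, radix ** len(word)) for word in set(codewords))


prefix_free = ["0", "10", "110"]   # a prefix code
not_prefix = ["0", "01", "011"]    # uniquely decodable, but not prefix-free
ambiguous = ["0", "1", "01"]       # "01" can be read as one codeword or two

for words in (prefix_free, not_prefix, ambiguous):
    print(words, is_prefix_code(words), kraft_sum(words))
```

The first two sets have a Kraft sum of 7/8, while the third reaches 5/4, so no uniquely decodable binary code can have codeword lengths 1, 1 and 2.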
Error-correcting codes
Codes may also be used to represent data in a way more resistant to errors in transmission or storage. This so-called error-correcting code works by including carefully crafted redundancy with the stored (or transmitted) data. Examples include Hamming codes, Reed–Solomon, Reed–Muller, Walsh–Hadamard, Bose–Chaudhuri–Hochquenghem, Turbo, Golay, algebraic geometry codes, low-density parity-check codes, and space–time codes.
Error-detecting codes can be optimised to detect ''burst errors'' or ''random errors''.
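As an illustration of "carefully crafted redundancy", the following sketch implements the Hamming(7,4) code, one member of the Hamming family listed above. It adds three parity bits to four data bits and can correct any single flipped bit. The bit layout used here (parity bits at positions 1, 2 and 4) is the conventional textbook arrangement and is an assumption of this example, as are the function names.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(received):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


codeword = hamming74_encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[4] ^= 1  # flip one bit "in transit"
print(codeword)                     # [0, 1, 1, 0, 0, 1, 1]
print(hamming74_decode(corrupted))  # [1, 0, 1, 1]
```

Two or more flipped bits in the same codeword are beyond what this code can repair; the decoder would then silently return wrong data.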
Examples
Codes in communication used for brevity
A cable code replaces words (e.g. ''ship'' or ''invoice'') with shorter words, allowing the same information to be sent with fewer
characters, more quickly, and less expensively.
Codes can be used for brevity. When telegraph messages were the state of the art in rapid long-distance communication, elaborate systems of commercial codes that encoded complete phrases into single words (commonly five-letter groups) were developed, so that telegraphers became conversant with such "words" as ''BYOXO'' ("Are you trying to weasel out of our deal?"), ''LIOUY'' ("Why do you not answer my question?"), ''BMULD'' ("You're a skunk!"), or ''AYYLU'' ("Not clearly coded, repeat more clearly.").
Code words were chosen for various reasons: length, pronounceability, etc. Meanings were chosen to fit perceived needs: commercial negotiations, military terms for military codes, diplomatic terms for diplomatic codes, any and all of the preceding for espionage codes. Codebooks and codebook publishers proliferated, including one run as a front for the American Black Chamber run by Herbert Yardley between the First and Second World Wars. The purpose of most of these codes was to save on cable costs. The use of data coding for data compression predates the computer era; an early example is the telegraph Morse code, where more frequently used characters have shorter representations. Techniques such as Huffman coding are now used by computer-based algorithms to compress large data files into a more compact form for storage or transmission.
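The idea that frequent characters should get shorter representations can be sketched with a small Huffman coder. This is a minimal illustration built on Python's standard `heapq` and `collections.Counter`, not a production compressor; the function name and the sample text are assumptions. It repeatedly merges the two least frequent subtrees and prepends one bit to every codeword inside them.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code in which frequent symbols get shorter codewords."""
    frequencies = Counter(text)
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A single distinct symbol still needs a non-empty codeword.
        return {next(iter(frequencies)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        weight1, _, left = heapq.heappop(heap)
        weight2, _, right = heapq.heappop(heap)
        merged = {symbol: "0" + word for symbol, word in left.items()}
        merged.update({symbol: "1" + word for symbol, word in right.items()})
        heapq.heappush(heap, (weight1 + weight2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "abracadabra"
code = huffman_code(text)
encoded = "".join(code[symbol] for symbol in text)
print(code)
print(len(encoded), "bits instead of", 8 * len(text))
```

For this sample text the most frequent letter, ''a'', receives a one-bit codeword and the whole string needs 23 bits. The exact bit patterns depend on how ties are broken, but the total length does not.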
Character encodings
Character encodings are representations of textual data. A given character encoding may be associated with a specific character set (the collection of characters which it can represent), though some character sets have multiple character encodings and vice versa. Character encodings may be broadly grouped according to the number of bytes required to represent a single character: there are single-byte encodings,
multibyte (also called wide) encodings, and
variable-width (also called variable-length) encodings. The earliest character encodings were single-byte, the best-known example of which is
ASCII
. ASCII remains in use today, for example in
HTTP headers. However, single-byte encodings cannot model character sets with more than 256 characters. Scripts that require large character sets such as
Chinese, Japanese and Korean must be represented with multibyte encodings. Early multibyte encodings were fixed-length, meaning that although each character was represented by more than one byte, all characters used the same number of bytes ("word length"), making them suitable for decoding with a lookup table. The final group, variable-width encodings, is a subset of multibyte encodings. These use more complex encoding and decoding logic to efficiently represent large character sets while keeping the representations of more commonly used characters shorter or maintaining backward compatibility properties. This group includes
UTF-8, an encoding of the
Unicode
character set; UTF-8 is the most common encoding of text media on the Internet.
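The variable-width behaviour of UTF-8 and its backward compatibility with ASCII can be observed with Python's built-in `str.encode`. The sample characters below are chosen for illustration, and the helper name `utf8_length` is an assumption of this sketch.

```python
def utf8_length(character):
    """Number of bytes UTF-8 uses for a single character."""
    return len(character.encode("utf-8"))


samples = [
    "\u0041",      # LATIN CAPITAL LETTER A
    "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
    "\u4e2d",      # a CJK ideograph
    "\U0001F600",  # an emoji outside the Basic Multilingual Plane
]

for character in samples:
    print(f"U+{ord(character):04X}", utf8_length(character), "byte(s)")

# ASCII text is byte-for-byte identical in UTF-8.
print("HTTP".encode("ascii") == "HTTP".encode("utf-8"))  # True
```

The four samples take 1, 2, 3 and 4 bytes respectively, which is the trade-off described above: common characters stay short while the full character set remains representable.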
Genetic code
Biological organisms contain genetic material that is used to control their function and development. This is DNA, which contains units named genes from which messenger RNA is derived. This in turn produces proteins through a genetic code in which a series of triplets (codons) of four possible nucleotides can be translated into one of twenty possible amino acids. A sequence of codons results in a corresponding sequence of amino acids that form a protein molecule; a type of codon called a stop codon signals the end of the sequence.
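Translation can be viewed as table lookup over fixed-length codewords. The sketch below uses only a small excerpt of the standard codon table (the full table has 64 entries), and the function name and sample sequences are illustrative assumptions. Reading stops at the first stop codon.

```python
# Excerpt of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}


def translate(mrna):
    """Translate an mRNA string codon by codon until a stop codon appears.

    Raises KeyError for codons missing from the excerpt above; a trailing
    fragment shorter than three nucleotides is ignored.
    """
    amino_acids = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid is None:
            break
        amino_acids.append(amino_acid)
    return amino_acids


print(translate("AUGUUUGGCUAAGCU"))  # ['Met', 'Phe', 'Gly']
```

The final codon GCU is never read because the stop codon UAA ends the sequence first.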
Gödel code
In mathematics, a Gödel code is the basis for the proof of Gödel's incompleteness theorem. Here, the idea is to map mathematical notation to a natural number (using a Gödel numbering).
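One common way to build such a numbering is with prime powers: assign each symbol a positive integer and encode a sequence as $2^{c_1} \cdot 3^{c_2} \cdot 5^{c_3} \cdots$, which unique factorization makes reversible. The sketch below follows that scheme. The symbol table is hypothetical and chosen only for illustration; it is not the assignment used in Gödel's proof.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short sequences)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def godel_number(codes):
    """Encode a sequence of positive integers as a product of prime powers."""
    number = 1
    for prime, code in zip(primes(), codes):
        number *= prime ** code
    return number


def godel_decode(number):
    """Recover the sequence by reading off the exponent of each prime."""
    codes = []
    for prime in primes():
        if number == 1:
            break
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        codes.append(exponent)
    return codes


# Hypothetical symbol table for a tiny arithmetic language.
SYMBOLS = {"0": 1, "S": 2, "=": 3, "+": 4}

formula = "0=0"
number = godel_number([SYMBOLS[ch] for ch in formula])
print(number)                # 270, i.e. 2**1 * 3**3 * 5**1
print(godel_decode(number))  # [1, 3, 1]
```

Symbol codes must be at least 1 in this scheme; a zero exponent at the end of a sequence would be indistinguishable from a shorter sequence.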
Other
There are codes using colors, such as traffic lights, the color code used to mark the nominal value of electrical resistors, and the color code of trash cans devoted to specific types of garbage (paper, glass, organic, etc.).
In marketing, coupon codes can be used for a financial discount or rebate when purchasing a product from a (usually internet) retailer.
In military environments, specific sounds with the cornet are used for different purposes: to mark some moments of the day, to command the infantry on the battlefield, etc.
Communication systems for sensory impairments, such as sign language for deaf people and braille for blind people, are based on movement or tactile codes.
Musical scores are the most common way to encode music.
Specific games have their own code systems to record the matches, e.g.
chess notation.
Cryptography
In the history of cryptography, codes were once common for ensuring the confidentiality of communications, although ciphers are now used instead.
Secret codes intended to obscure the real messages, ranging from serious (mainly espionage in military, diplomacy, business, etc.) to trivial (romance, games), can be any kind of imaginative encoding: flowers, game cards, clothes, fans, hats, melodies, birds, etc., in which the sole requirement is the pre-agreement on the meaning by both the sender and the receiver.
Other examples
Other examples of encoding include:
*Encoding (in cognition) - a basic perceptual process of interpreting incoming stimuli; technically speaking, it is a complex, multi-stage process of converting relatively objective sensory input (e.g., light, sound) into a subjectively meaningful experience.
*A content format - a specific encoding format for converting a specific type of data to information.
*Text encoding uses a markup language to tag the structure and other features of a text to facilitate processing by computers. (See also Text Encoding Initiative.)
*Semantics encoding of formal language A in formal language B is a method of representing all terms (e.g. programs or descriptions) of language A using language B.
*Data compression transforms a signal into a code optimized for transmission or storage, generally done with a codec.
*Neural encoding - the way in which information is represented in neurons.
*Memory encoding - the process of converting sensations into memories.
*Television encoding: NTSC, PAL and SECAM
Other examples of decoding include:
*Decoding (computer science)
*Decoding methods, methods in communication theory for decoding codewords sent over a noisy channel
*Digital signal processing, the study of signals in a digital representation and the processing methods of these signals
*Digital-to-analog converter, the use of analog circuits for decoding operations
*Word decoding, the use of phonics to decipher print patterns and translate them into the sounds of language
Codes and acronyms
Acronyms and abbreviations can be considered codes, and in a sense, all languages and writing systems are codes for human thought.
International Air Transport Association airport codes are three-letter codes used to designate airports and used for
bag tags.
Station codes are similarly used on railways but are usually national, so the same code can be used for different stations if they are in different countries.
Occasionally, a code word achieves an independent existence (and meaning) while the original equivalent phrase is forgotten or at least no longer has the precise meaning attributed to the code word. For example, '30' was widely used in journalism to mean "end of story", and has been used in other contexts to signify "the end".
See also
*ADDML
*Asemic writing
*Cipher
*Code (semiotics)
*Cultural code
*Equipment codes
*Quantum error correction
*Semiotics
*Universal language
Further reading
* ''Codes and Abbreviations for the Use of the International Telecommunication Services'' (2nd ed.). Geneva, Switzerland: International Telecommunication Union. 1963. OCLC 13677884.