This is a list of some binary codes that are (or have been) used to represent text as a sequence of binary digits "0" and "1". Fixed-width binary codes use a set number of bits to represent each character in the text, while in variable-width binary codes, the number of bits may vary from character to character. Computers and communication systems use these codes to store and transmit text.
Five-bit binary codes
Several different five-bit codes were used for early punched tape systems.
Five bits per character only allows for 32 different characters, so many of the five-bit codes used two sets of characters per value referred to as FIGS (figures) and LTRS (letters), and reserved two characters to switch between these sets. This effectively allowed the use of 60 characters.
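The shift mechanism described above can be illustrated with a minimal sketch. The table below is a tiny illustrative excerpt and should not be read as a faithful table for any particular five-bit code; the names `LTRS`, `FIGS`, `TABLE`, and `decode`, and the bit patterns assigned to them, are assumptions made for this example.

```python
# Illustrative shift codes and table excerpt (assumed values, not a full code table).
LTRS, FIGS = 0b11111, 0b11011

# Each five-bit value maps to (letters-set character, figures-set character).
TABLE = {
    0b00001: ("E", "3"),
    0b00011: ("A", "-"),
    0b10000: ("T", "5"),
}


def decode(codes):
    """Decode five-bit values, tracking the current shift state."""
    shift = 0  # 0 = letters set, 1 = figures set
    out = []
    for code in codes:
        if code == LTRS:
            shift = 0
        elif code == FIGS:
            shift = 1
        else:
            out.append(TABLE[code][shift])
    return "".join(out)


# Two of the 32 values are reserved for shifting, leaving 30 values in each set.
usable_characters = 2 * (32 - 2)

message = [0b00011, 0b10000, FIGS, 0b00001, 0b10000, LTRS, 0b00001]
print(decode(message), usable_characters)
```

The same five-bit value yields a different character depending on which shift code was seen most recently, which is why a lost or corrupted shift code garbles everything up to the next shift.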
Standard five-bit codes are:
* International Telegraph Alphabet No. 1 (ITA1) – Also commonly referred to as Baudot code
* International Telegraph Alphabet No. 2 (ITA2) – Also commonly referred to as Murray code
* American Teletypewriter code (USTTY) – A variant of ITA2 used in the USA
* DIN 66006 – Developed for the presentation of ALGOL/ALCOR programs on paper tape and punch cards
The following early computer systems each used its own five-bit code:
* J. Lyons and Co. LEO (Lyons Electronic Office)
* English Electric DEUCE
* University of Illinois at Urbana-Champaign ILLIAC
* ZEBRA
* EMI 1100
* Ferranti Mercury, Pegasus, and Orion systems
The steganographic code commonly known as Bacon's cipher uses groups of five binary-valued elements to represent letters of the alphabet.
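As a rough illustration, the sketch below assigns each letter a group of five binary digits. It assumes the later 26-letter variant of the cipher, in which every letter has its own group, rather than Bacon's original 24-letter alphabet, and it writes the two element values as "0" and "1". The function name `bacon_encode` is hypothetical.

```python
def bacon_encode(text):
    """Map each letter A-Z to a five-digit binary group (26-letter variant, assumed)."""
    groups = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            # The letter's position in the alphabet, written as five binary digits.
            groups.append(format(ord(ch) - ord("A"), "05b"))
    return " ".join(groups)


print(bacon_encode("AB Z"))
```

In actual steganographic use the two element values would be hidden in some two-state property of a cover text, such as two typefaces, rather than written out as digits.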
Six-bit binary codes
Six bits per character allows 64 distinct characters to be represented.
Examples of six-bit binary codes are:
* International Telegraph Alphabet No. 4 (ITA4)
* Six-bit BCD (Binary Coded Decimal), used by early mainframe computers.
* Six-bit ASCII subset of the primitive seven-bit ASCII
* Braille – Braille characters are represented using six dot positions, arranged in a rectangle. Each position may contain a raised dot or not, so Braille can be considered to be a six-bit binary code.
See also: Six-bit character codes
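The view of Braille as a six-bit code can be sketched as follows. The example assumes the conventional dot numbering (dots 1 to 3 down the left column, 4 to 6 down the right) and relies on the Unicode Braille Patterns block, which starts at U+2800 and assigns bit n-1 to dot n. The function names are hypothetical.

```python
def braille_bits(dots):
    """Return the six-bit value for a cell, given the set of raised dots (1-6)."""
    value = 0
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError("six-dot Braille uses dots 1 to 6 only")
        value |= 1 << (dot - 1)  # dot n sets bit n-1
    return value


def braille_char(dots):
    """Return the Unicode Braille pattern character for the given raised dots."""
    return chr(0x2800 + braille_bits(dots))


# Letters a, b, c are dot 1, dots 1-2, and dots 1-4 respectively.
for letter, dots in (("a", {1}), ("b", {1, 2}), ("c", {1, 4})):
    print(letter, format(braille_bits(dots), "06b"), braille_char(dots))
```

With six positions there are 64 possible cells, including the blank cell, which matches the 64 characters available to any six-bit code.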
Seven-bit binary codes
Examples of seven-bit binary codes are:
* International Telegraph Alphabet No. 3 (ITA3) – derived from the Moore ARQ code, and also known as the RCA
* ASCII – The ubiquitous ASCII code was originally defined as a seven-bit character set. The ASCII article provides a detailed set of equivalent standards and variants. In addition, there are various extensions of ASCII to eight bits (see Eight-bit binary codes)
* CCIR 476 – Extends ITA2 from 5 to 7 bits, using the extra 2 bits as check digits
* International Telegraph Alphabet No. 4 (ITA4)
Eight-bit binary codes
* Extended ASCII – A number of standards extend ASCII to eight bits by adding a further 128 characters, such as:
** HP Roman
** ISO/IEC 8859
** Mac OS Roman
** Windows-1252
* EBCDIC – Used in early IBM computers and current IBM i and System z systems.
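Because these eight-bit codes assign the upper 128 values differently, the same byte can decode to different characters. The sketch below uses codecs that ship with Python (`cp1252`, `mac_roman`, and `cp037`); `cp037` is taken here as one representative EBCDIC code page among several, which is an assumption of the example.

```python
# The same byte value interpreted under two ASCII extensions.
byte = bytes([0x80])
as_windows_1252 = byte.decode("cp1252")
as_mac_os_roman = byte.decode("mac_roman")
print(as_windows_1252, as_mac_os_roman)

# The same letter encoded under ASCII and under one EBCDIC code page.
ascii_a = "A".encode("ascii")
ebcdic_a = "A".encode("cp037")
print(ascii_a.hex(), ebcdic_a.hex())
```

The lower 128 values agree across the ASCII extensions, so the disagreement appears only for bytes with the high bit set. EBCDIC is not an ASCII extension, so even basic letters differ.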
10-bit binary codes
* AUTOSPEC – Also known as Bauer code. AUTOSPEC repeats a five-bit character twice, but if the character has odd parity, the repetition is inverted.
* Decabit – A datagram of electronic pulses which are transmitted commonly through power lines. Decabit is mainly used in Germany and other European countries.
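The AUTOSPEC rule stated above can be sketched as a small encoder. The sketch assumes the original five bits are sent first and the repetition second, which is one reading of the description; the function name `autospec_encode` is hypothetical.

```python
def autospec_encode(char5):
    """Build a ten-bit word from a five-bit character using the AUTOSPEC rule."""
    if not 0 <= char5 < 32:
        raise ValueError("expected a five-bit value")
    odd_parity = bin(char5).count("1") % 2 == 1
    # The repetition is inverted when the character has an odd number of 1 bits.
    repetition = char5 ^ 0b11111 if odd_parity else char5
    # Assumed layout: original character in the high five bits, repetition in the low five.
    return (char5 << 5) | repetition


print(format(autospec_encode(0b00001), "010b"))  # odd parity: repetition inverted
print(format(autospec_encode(0b00011), "010b"))  # even parity: plain repetition
```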
16-bit binary codes
* UCS-2 – An obsolete encoding capable of representing the basic multilingual plane of Unicode
32-bit binary codes
* UTF-32/UCS-4 – A four-bytes-per-character representation of Unicode
Variable-length binary codes
* UTF-8 – Encodes characters in a way that is mostly compatible with ASCII but can also encode the full repertoire of Unicode characters with sequences of up to four 8-bit bytes.
* UTF-16 – Extends UCS-2 to cover the whole of Unicode with sequences of one or two 16-bit elements
* GB 18030 – A full-Unicode variable-length code designed for compatibility with older Chinese multibyte encodings
* Huffman coding – A technique for expressing more common characters using shorter bit strings than are used for less common characters
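The difference between the variable-length Unicode encodings above and the fixed-width UTF-32 can be seen by counting encoded bytes per character. This sketch uses Python's built-in codecs; the big-endian variants are chosen only so that no byte order mark is added, and the helper name `encoded_lengths` is hypothetical.

```python
def encoded_lengths(ch):
    """Return the byte length of one character in UTF-8, UTF-16, and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )


# An ASCII letter, a Latin-1 letter, a BMP symbol, and a character outside the BMP.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 ranges from one to four bytes, UTF-16 uses one or two 16-bit units (two or four bytes), and UTF-32 always uses four bytes.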
Data compression systems such as Lempel–Ziv–Welch can compress arbitrary binary data. They are therefore not binary codes themselves but may be applied to binary codes to reduce storage needs.
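Huffman coding, listed above, is by contrast a way of constructing a binary code for a given text. The following is a minimal sketch of the usual greedy construction, not a production encoder: it repeatedly merges the two least frequent groups of characters, prefixing "0" to one group's codes and "1" to the other's. The function name `huffman_codes` is hypothetical, and ties are broken by insertion order, so other valid code tables exist for the same input.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return a dict mapping each character of text to its bit string."""
    if not text:
        return {}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {ch: ""}) for ch, n in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct character still needs one bit per occurrence.
        return {ch: "0" for ch in heap[0][2]}
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


codes = huffman_codes("aaaabbc")
print({ch: codes[ch] for ch in sorted(codes)})
```

In this example the most frequent character receives a one-bit code and the two rarer characters receive two-bit codes, and no code is a prefix of another.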
Other
* Morse code is a variable-length telegraphy code, which traditionally uses a series of long and short pulses to encode characters. It relies on gaps between the pulses to provide separation between letters and words, as the letter codes do not have the
"prefix property". This means that Morse code is not necessarily a binary system, but in a sense may be a ternary system, with a 10 for a "dit" or a "dot", a 1110 for a dash, and a 00 for a single unit of separation. Morse code can be represented as a binary stream by allowing each bit to represent one unit of time. Thus a "dit" or "dot" is represented as a 1 bit, while a "dah" or "dash" is represented as three consecutive 1 bits. Spaces between symbols, letters, and words are represented as one, three, or seven consecutive 0 bits. For example, "NO U" in Morse Code is "— . — — — . . —", which could be represented in binary as "1110100011101110111000000010101110". If, however, Morse code is represented as a ternary system, "NO U" would be represented as "1110, 10, 00, 1110, 1110, 1110, 00, 00, 00, 10, 10, 1110".
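The "NO U" example can be reproduced with a short sketch. It follows the ternary convention described above, in which each dit or dah token already carries one trailing unit of silence, so a letter gap adds one "00" token and a word gap adds three. The table covers only the three letters needed here, and the names `MORSE`, `to_tokens`, and `to_bits` are hypothetical.

```python
# Morse patterns for the letters used in the example only.
MORSE = {"N": "-.", "O": "---", "U": "..-"}

DIT, DAH, SEP = "10", "1110", "00"


def to_tokens(message):
    """Convert a message to the list of ternary tokens (dit, dah, separator)."""
    tokens = []
    for word_index, word in enumerate(message.split(" ")):
        if word_index:
            tokens += [SEP] * 3  # word gap: 1 trailing unit + 6 = 7 units of silence
        for letter_index, letter in enumerate(word):
            if letter_index:
                tokens.append(SEP)  # letter gap: 1 trailing unit + 2 = 3 units
            for symbol in MORSE[letter]:
                tokens.append(DIT if symbol == "." else DAH)
    return tokens


def to_bits(message):
    """Convert a message to a bit string with one bit per unit of time."""
    return "".join(to_tokens(message))


print(", ".join(to_tokens("NO U")))
print(to_bits("NO U"))
```

Concatenating the tokens gives the binary stream directly, including the single 0 that follows the final dash.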
See also
* List of computer character sets