''8-bit clean'' is an attribute of
computer system
A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These prog ...
s, communication channels, and other devices and software, that handle
8-bit
In computer architecture, 8-bit integers or other data units are those that are 8 bits wide (1 octet). Also, 8-bit central processing unit (CPU) and arithmetic logic unit (ALU) architectures are those that are based on registers or data buses of ...
character encoding
Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values tha ...
s correctly. Such encoding include the
ISO 8859
ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12 ...
series and the
UTF-8
UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''.
UTF-8 is capable of ...
encoding of
Unicode
Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...
.
History
Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters, e.g.,
ETX, as
control characters
In computing and telecommunication, a control character or non-printing character (NPC) is a code point (a number) in a character set, that does not represent a written symbol. They are used as in-band signaling to cause effects other than the ...
. Other assumed a stream of seven-bit characters, with values between 0 and 127; for example, the
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
standard used only seven bits per character, avoiding an 8-bit representation
in order to save on data transmission costs. On computers and data links using
8-bit bytes this left the top
bit
The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented a ...
of each
byte
The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
free for use as a
parity,
flag bit, or meta data control bit. 7-bit systems and data links are unable to directly handle more complex character codes which are commonplace in non-
English-speaking countries with larger
alphabet
An alphabet is a standardized set of basic written graphemes (called letters) that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character represents a s ...
s.
Binary file
A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document fi ...
s of
octets cannot be transmitted through 7-bit data channels directly. To work around this,
binary-to-text encoding
A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the channel does not allow binary dat ...
s have been devised which use only 7-bit
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
characters. Some of these encodings are
uuencoding uuencoding is a form of binary-to-text encoding that originated in the Unix programs uuencode and uudecode written by Mary Ann Horton at UC Berkeley in 1980, for encoding binary data for transmission in email systems.
The name "uuencoding" is deriv ...
,
Ascii85,
SREC,
BinHex
BinHex, originally short for "binary-to-hexadecimal", is a binary-to-text encoding system that was used on the classic Mac OS for sending binary files through e-mail. Originally a hexadecimal encoding, subsequent versions of BinHex are more similar ...
,
kermit and
MIME
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Messa ...
's
Base64
In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits.
Common to all bina ...
.
EBCDIC
Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding s ...
-based systems cannot handle all characters used in UUencoded data. However, the base64 encoding does not have this problem.
SMTP and NNTP 8-bit cleanness
Historically, various media were used to transfer messages, some of them only supporting 7-bit data, so an 8-bit message had high chances to be
garbled during transmission in the 20th century. But some implementations really did not care about formal discouraging of 8-bit data and allowed high bit set bytes to pass through. Such implementations are said to be 8-bit clean. In general, a communications protocol is said to be 8-bit clean if it correctly passes through the high bit of each byte in the communication process.
Many early
communications protocol
A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synchro ...
standards, such as (for
SMTP
The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typic ...
), (for
NNTP
The Network News Transfer Protocol (NNTP) is an application protocol used for transporting Usenet news articles (''netnews'') between news servers, and for reading/posting articles by the end user client applications. Brian Kantor of the Univ ...
) and , were designed to work over such "7-bit" communication links. They specifically require the use of ASCII character set "transmitted as an 8-bit byte with the high-order bit cleared to zero" and some of these explicitly restrict ''all'' data to 7-bit characters.
For the first few decades of email networks (1971 to the early 1990s), most email messages were
plain text
In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects ( floating-point numbers, images, etc.). It may also include a limi ...
in the 7-bit US-ASCII character set.
The definition of SMTP, like its predecessor , limits Internet Mail to lines (1000 characters or less) of 7-bit US-ASCII characters.
Later the format of email messages was re-defined in order to support messages that are not entirely US-ASCII text (text messages in character sets other than US-ASCII, and non-text messages, such as audio and images).
[
]
specifies "NNTP operates over any reliable bi-directional 8-bit-wide data stream channel." and changes the character set for commands to UTF-8. However, still limits the character set to ASCII, including and MIME encoding of non-ASCII data.
The Internet community generally adds features by ''extension'', allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be "broken" and insisting that all software worldwide be upgraded to the latest standard. In the mid-1990s, people objected to "just send 8 bits (to SMTP servers)", perhaps because of a perception that "just send 8 bits" is an implicit declaration that
ISO 8859-1
ISO/IEC 8859-1:1998, ''Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1 ...
become the new "standard encoding", forcing everyone in the world to use the same
character set
Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values tha ...
. Instead, the recommended way to take advantage of 8-bit-clean links between machines is to use the ESMTP ()
8BITMIME
The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typical ...
extension for message bodies and the SMTP
SMTPUTF8 extension for message headers. Despite this, some
mail transfer agent
The mail or post is a system for physically transporting postcards, letter (message), letters, and parcel (package), parcels. A postal service can be private or public, though many governments place restrictions on private systems. Since the mid ...
s, notably
Exim and
qmail
qmail is a mail transfer agent (MTA) that runs on Unix. It was written, starting December 1995, by Daniel J. Bernstein as a more secure replacement for the popular Sendmail program. Originally license-free software, qmail's source code ...
, relay mail to servers that do not advertise 8BITMIME without performing the conversion to 7-bit MIME (typically
quoted-printable
Quoted-Printable, or QP encoding, is a binary-to-text encoding system using printable ASCII characters (alphanumeric and the equals sign =) to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. Hi ...
, "Q-P conversion") required by . This "just-send-8" attitude does not in fact cause problems in practice, since virtually all modern email servers are 8-bit clean.
See also
*
32-bit clean
Historically, the classic Mac OS used a form of memory management that has fallen out of favor in modern systems. Criticism of this approach was one of the key areas addressed by the change to .
The original problem for the engineers of the Maci ...
*
*
References
{{Reflist
Character encoding