Quoted Printable

Quoted-Printable, or QP encoding, is a binary-to-text encoding system using printable ASCII characters (alphanumeric and the equals sign =) to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. Historically, because of the wide range of systems and protocols that could be used to transfer messages, e-mail was often assumed to be non-8-bit-clean; however, modern SMTP servers are in most cases 8-bit clean and support the 8BITMIME extension. QP can also be used with data that contains non-permitted octets or line lengths exceeding SMTP limits. It is defined as a MIME content transfer encoding for use in e-mail. QP works by using the equals sign = as an escape character. It also limits line length to 76 characters, as some software has limits on line length.


Introduction

MIME defines mechanisms for sending other kinds of information in e-mail, including text in languages other than English, using character encodings other than ASCII. However, these encodings often use byte values outside the ASCII range, so they need to be encoded further before they are suitable for use in a non-8-bit-clean environment. Quoted-Printable encoding is one method used for mapping arbitrary bytes into sequences of ASCII characters. Quoted-Printable is therefore not a character encoding scheme itself, but a data coding layer to be used under some byte-oriented character encoding. QP encoding is reversible, meaning the original bytes, and hence the non-ASCII characters they represent, can be identically recovered.

Quoted-Printable and Base64 are the two MIME content transfer encodings, if the trivial "7bit", "8bit" and "binary" encodings are not counted. If the text to be encoded does not contain many non-ASCII characters, then Quoted-Printable results in a fairly readable and compact encoded result. On the other hand, if the input has many 8-bit characters, then Quoted-Printable becomes both unreadable and extremely inefficient, since every such byte takes three characters. Base64 is not human-readable, but has a uniform overhead for all data and is the more sensible choice for binary formats or text in a script other than the Latin script.
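The trade-off between the two encodings can be checked with a short Python sketch. It uses the standard-library quopri and base64 modules; the helper name encoded_sizes and the two sample strings are illustrative assumptions, not part of any specification.

```python
import base64
import quopri


def encoded_sizes(data: bytes) -> tuple[int, int]:
    """Return (Quoted-Printable size, Base64 size) in bytes for the input."""
    qp = quopri.encodestring(data)      # Quoted-Printable, with soft line breaks
    b64 = base64.encodebytes(data)      # Base64, wrapped into lines
    return len(qp), len(b64)


# Hypothetical samples: mostly-ASCII text versus text in a non-Latin script.
mostly_ascii = "The menu is plain English; only the word café has an accent.".encode("utf-8")
non_latin = "Привет, мир! Это текст на русском языке.".encode("utf-8")

for label, data in [("mostly ASCII", mostly_ascii), ("non-Latin", non_latin)]:
    qp_size, b64_size = encoded_sizes(data)
    print(f"{label}: raw={len(data)} qp={qp_size} base64={b64_size}")
```

For the mostly-ASCII sample the Quoted-Printable output should be only slightly larger than the input, while for the Cyrillic sample it should be considerably larger than the Base64 output.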


Quoted-printable encoding

Any 8-bit byte value may be encoded with 3 characters: an = followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, an ASCII form feed character (decimal value 12) can be represented by =0C, and an ASCII equals sign (decimal value 61) must be represented by =3D. Every byte that is not a printable ASCII character or part of an end-of-line sequence must be encoded in this fashion, and so must the = character itself.

All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except = (decimal 61, hexadecimal 3D, hence =3D).

ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters would appear at the end of the encoded line. In that case, they must be escaped as =09 (tab) or =20 (space), or be followed by a = (soft line break) as the last character of the encoded line. The latter is valid because it prevents the tab or space from being the last character of the encoded line.

If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values, neither directly nor as escaped = sequences. Conversely, if byte values 13 and 10 have meanings other than end of line (in media types, for example), then they must be encoded as =0D and =0A respectively. [Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, November 1996, RFC 2045, section 6.7 "Quoted-Printable Content-Transfer-Encoding", part "(4) (Line Breaks)"; retrieved March 18, 2013.]

Lines of Quoted-Printable encoded data must not be longer than 76 characters, not counting the trailing CR LF. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an = at the end of an encoded line, and does not appear as a line break in the decoded text. These soft line breaks also allow encoding text without line breaks (or containing very long lines) for an environment where line size is limited, such as the 1000 characters per line limit of some SMTP software, as allowed by RFC 2821.

A slightly modified version of Quoted-Printable is used in message headers; see the Encoded-Word section of MIME.
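The rules above can be sketched as a small encoder. This is a minimal illustration under stated assumptions, not a reference implementation: the names qp_encode and _encode_line are hypothetical, the input is assumed to be text whose meaningful line breaks are LF or CR LF, and the header variant (Encoded-Word) is not covered.

```python
MAX_LINE = 76  # maximum encoded line length, not counting the CR LF


def _encode_line(line: bytes) -> list[str]:
    """Encode one logical line (no line break inside) into physical lines."""
    tokens = []
    for i, byte in enumerate(line):
        is_last = i == len(line) - 1
        if 33 <= byte <= 126 and byte != 61:
            tokens.append(chr(byte))        # printable ASCII other than "="
        elif byte in (9, 32) and not is_last:
            tokens.append(chr(byte))        # tab or space, not at end of line
        else:
            tokens.append(f"={byte:02X}")   # everything else becomes =XX

    physical, current = [], ""
    for token in tokens:
        # Reserve one column for the "=" of a soft line break, and never
        # split a three-character =XX escape across two lines.
        if len(current) + len(token) > MAX_LINE - 1:
            physical.append(current + "=")
            current = ""
        current += token
    physical.append(current)
    return physical


def qp_encode(data: bytes) -> bytes:
    """Quoted-Printable encode text; assumes LF or CR LF marks line breaks."""
    logical_lines = data.replace(b"\r\n", b"\n").split(b"\n")
    out = []
    for line in logical_lines:
        out.extend(_encode_line(line))
    # Meaningful line breaks are always emitted as CR LF.
    return "\r\n".join(out).encode("ascii")


print(qp_encode("café = coffee ".encode("utf-8")).decode("ascii"))
```

For real use, Python's standard-library quopri module provides encodestring and decodestring; the sketch only makes the individual rules visible.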


Example

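The following example is a French text (encoded in UTF-8), with a high frequency of letters with diacritical marks (such as the é). The encoded lines are shown indented by four spaces; the second line begins with a space that is part of the data.

    J'interdis aux marchands de vanter trop leurs marchandises. Car ils se font=
     vite p=C3=A9dagogues et t'enseignent comme but ce qui n'est par essence qu=
    'un moyen, et te trompant ainsi sur la route =C3=A0 suivre les voil=C3=A0 b=
    ient=C3=B4t qui te d=C3=A9gradent, car si leur musique est vulgaire ils te =
    fabriquent pour te la vendre une =C3=A2me vulgaire.

    =E2=80=94=E2=80=89Antoine de Saint-Exup=C3=A9ry, Citadelle (1948)

This encodes the following quotation:

    J'interdis aux marchands de vanter trop leurs marchandises. Car ils se font vite pédagogues et t'enseignent comme but ce qui n'est par essence qu'un moyen, et te trompant ainsi sur la route à suivre les voilà bientôt qui te dégradent, car si leur musique est vulgaire ils te fabriquent pour te la vendre une âme vulgaire.

    — Antoine de Saint-Exupéry, Citadelle (1948)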
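The encoded lines of this example can be decoded with Python's standard-library quopri module, as a quick check that the soft line breaks disappear and the =XX escapes turn back into UTF-8 bytes. The variable names are illustrative, and the blank line before the attribution is an assumption about the original layout.

```python
import quopri

# The encoded example, one physical line per literal, with CR LF line endings.
encoded = (
    b"J'interdis aux marchands de vanter trop leurs marchandises. Car ils se font=\r\n"
    b" vite p=C3=A9dagogues et t'enseignent comme but ce qui n'est par essence qu=\r\n"
    b"'un moyen, et te trompant ainsi sur la route =C3=A0 suivre les voil=C3=A0 b=\r\n"
    b"ient=C3=B4t qui te d=C3=A9gradent, car si leur musique est vulgaire ils te =\r\n"
    b"fabriquent pour te la vendre une =C3=A2me vulgaire.\r\n"
    b"\r\n"
    b"=E2=80=94=E2=80=89Antoine de Saint-Exup=C3=A9ry, Citadelle (1948)"
)

# No encoded line may exceed 76 characters (the CR LF is not counted).
longest = max(len(line) for line in encoded.split(b"\r\n"))
print("longest encoded line:", longest)

# Decoding yields raw bytes; interpreting them as UTF-8 is a separate step.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)
```

Quoted-Printable only restores the bytes; the final decode("utf-8") step is what turns them back into characters, which reflects its role as a layer beneath the character encoding.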


See also

* Percent-encoding (data encoding in URLs, mostly used for text)
* Numeric character reference (text encoding in SGML, HTML, XML)
* Rich Text Format, section "Character encoding" (a component of text encoding)


External links

* (obsolete)
* RFC 2045 (MIME)
* RFC 2045, section 6.7: Quoted-Printable Content-Transfer-Encoding