The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.
C0 codes are the range 0x00–0x1F, and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 0x80–0x9F, and the default C1 set was originally defined in ECMA-48 (later harmonized with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used.
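A minimal Python sketch of the two ranges just described: it labels a code point as C0, C1, DEL or other. The function name `classify` and its labels are illustrative assumptions rather than terminology from ISO 646 or ECMA-48; DEL (0x7F) is reported separately because it lies outside both ranges.

```python
def classify(code_point: int) -> str:
    """Label a code point by the control-code range it falls in (illustrative names)."""
    if 0x00 <= code_point <= 0x1F:
        return "C0"   # 32 codes: 0x00-0x1F
    if code_point == 0x7F:
        return "DEL"  # a control, but outside the C0 range
    if 0x80 <= code_point <= 0x9F:
        return "C1"   # 32 codes: 0x80-0x9F
    return "other"


if __name__ == "__main__":
    for cp in (0x07, 0x41, 0x7F, 0x85, 0xE9):
        print(f"U+{cp:04X}: {classify(cp)}")
```

Note that the C1 range is exactly the C0 range with the high bit (0x80) set.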
C0 controls
ASCII defines 32 control characters, plus the DEL character. This large number of codes was desirable at the time, as multi-byte controls would have required implementing a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.
Only a few codes have maintained their use: BEL, ESC, and the ''format effector'' (FEn) characters BS, TAB, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL serving as the C string terminator. Some data transfer protocols, such as ANPA-1312, Kermit, and XMODEM, do make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions, and some file formats and tools use the "Information Separators" (ISn), such as the Unix info format and Python's string method.
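For illustration, Python's built-in `str.splitlines` is one string method that treats some information separators as line boundaries: it splits on FS (0x1C), GS (0x1D) and RS (0x1E), but not on US (0x1F). The sample record and variable names below are made up; this is a sketch of the behaviour, not a claim about any particular file format.

```python
# The four ASCII information separators.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

record = f"alpha{FS}beta{GS}gamma{RS}delta{US}epsilon"

# str.splitlines treats FS, GS and RS as line boundaries, but not US.
parts = record.splitlines()
print(parts)

# To break on unit separators too, split each piece explicitly.
units = [field for part in parts for field in part.split(US)]
print(units)
```

The same method also splits on the format effectors VT and FF, so it should not be used when those characters are meant to survive as data.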
The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).
Unicode provides Control Pictures that can replace C0 control characters to make them visible on screen. However, caret notation is used more often.
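Both visible forms can be computed arithmetically. In the common caret-notation convention, flipping bit 0x40 maps 0x00–0x1F onto `^@` through `^_` and DEL onto `^?`; the Control Pictures block places the symbol for code *c* at U+2400 + *c*, with SYMBOL FOR DELETE at U+2421. The helper names `caret`, `control_picture` and `visualize` are hypothetical, chosen for this sketch.

```python
def caret(code: int) -> str:
    """Caret notation for a C0 control or DEL, e.g. 0x0A -> '^J'."""
    if not (0x00 <= code <= 0x1F or code == 0x7F):
        raise ValueError(f"not a C0 control or DEL: {code:#04x}")
    # XOR with 0x40 maps 0x00-0x1F to '@'..'_' and 0x7F to '?'.
    return "^" + chr(code ^ 0x40)


def control_picture(code: int) -> str:
    """Unicode Control Pictures symbol for a C0 control or DEL."""
    if 0x00 <= code <= 0x1F:
        return chr(0x2400 + code)  # U+2400 SYMBOL FOR NULL onward
    if code == 0x7F:
        return "\u2421"            # SYMBOL FOR DELETE
    raise ValueError(f"not a C0 control or DEL: {code:#04x}")


def visualize(text: str, style=caret) -> str:
    """Replace C0 controls and DEL with a visible form; keep other characters."""
    return "".join(
        style(ord(ch)) if ord(ch) <= 0x1F or ord(ch) == 0x7F else ch
        for ch in text
    )


print(visualize("ring\x07\r\n"))                                # ring^G^M^J
print(ascii(visualize("ring\x07\r\n", style=control_picture)))  # 'ring\u2407\u240d\u240a'
```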
C1 controls
In 1973, ECMA-35 and ISO 2022 attempted to define a method so that an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and ''vice versa''.
In a 7-bit environment, the Shift Out () would change the meaning of the 96 bytes through (i.e. all but the C0 control codes) to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range through could not be printed in a 7-bit environment, so it was decided that no alternative character set could use them, and that these codes should be additional control codes, which became known as the C1 control codes. To allow a 7-bit environment to use these new controls, the sequences ESC @ through ESC _ were to be considered equivalent.
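The equivalence is a fixed offset: ESC followed by a byte *F* in 0x40–0x5F stands for the C1 code *F* + 0x40, so ESC [ (0x5B) corresponds to 0x9B and ESC E (0x45) to 0x85. A minimal sketch of converting in both directions, assuming the 8-bit input really uses 0x80–0x9F as C1 controls; the function names are hypothetical.

```python
ESC = 0x1B


def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite each 8-bit C1 control (0x80-0x9F) as ESC followed by (byte - 0x40)."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += bytes((ESC, b - 0x40))  # e.g. 0x9B -> ESC '['
        else:
            out.append(b)
    return bytes(out)


def c1_to_8bit(data: bytes) -> bytes:
    """Collapse each ESC + (0x40-0x5F) pair back into a single C1 byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data) and 0x40 <= data[i + 1] <= 0x5F:
            out.append(data[i + 1] + 0x40)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


print(c1_to_7bit(b"\x9b1mbold"))  # b'\x1b[1mbold'
```

The 7-bit form is the safer one to emit, since it survives channels that strip or reinterpret the high bit.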
The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters.
The first C1 control code set to be registered for use with ISO 2022 was DIN 31626, a specialised set for bibliographic use which was registered in 1979. The more common general-use ISO/IEC 6429 set was registered in 1983, although the ECMA-48 specification upon which it was based had first been published in 1976 and JIS X 0211 (formerly JIS C 6323). Symbolic names defined by and early drafts of ISO 10646, but not in ISO/IEC 6429 (, and ) are also used.
Except for and in EUC-JP text, and in text transcoded from EBCDIC, the 8-bit forms of these codes were almost never used. , and are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Nowadays, if these codes are encountered, it is far more likely that they are intended to be printing characters from that position of Windows-1252 or Mac OS Roman.
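Two small illustrations in Python. EUC-JP, for instance, uses byte 0x8E (SS2 in ISO/IEC 6429) as a genuine 8-bit prefix for half-width katakana. Conversely, when text in 0x80–0x9F turns up after decoding as ISO 8859-1, a common repair heuristic is to re-read those characters as Windows-1252. The helper `reinterpret_c1` is a hypothetical sketch of that heuristic, assuming the text was mis-decoded as Latin-1; bytes unassigned in Python's `cp1252` codec are left untouched.

```python
# EUC-JP: SS2 (0x8E) introduces a JIS X 0201 half-width katakana byte.
halfwidth_a = "\uff71"                  # HALFWIDTH KATAKANA LETTER A
encoded = halfwidth_a.encode("euc_jp")
print(encoded)                          # b'\x8e\xb1'


def reinterpret_c1(text: str) -> str:
    """Re-read stray U+0080-U+009F characters as Windows-1252 printing characters."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                out.append(bytes([ord(ch)]).decode("cp1252"))
            except UnicodeDecodeError:  # e.g. 0x81, unassigned in cp1252
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


mis_decoded = b"\x93quoted\x94 \x80 5".decode("latin-1")
print(ascii(reinterpret_c1(mis_decoded)))  # '\u201cquoted\u201d \u20ac 5'
```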
Except for , Unicode does not provide a "control picture" for any of these. There is no well-known variation of caret notation for them either.
Other control code sets
The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change the C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence and the above C1 set chosen with the sequence .
Several official and unofficial alternatives have been defined, but this mechanism is now largely obsolete. Most alternatives were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard makes ESC, SP and DEL "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to the standard. It also specifies that if a C0 set includes transmission control (TCn) codes, they must be encoded at their ASCII locations and cannot be put in a C1 set, and that any new transmission controls must be in a C1 set.
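In ISO/IEC 2022 (ECMA-35), a C0 set is designated by ESC 2/1 *F* (ESC !) and a C1 set by ESC 2/2 *F* (ESC "), where *F* is a single final byte identifying the registered set. The sketch below only scans a byte stream for such designations and reports the final byte; it does not know which registered set each final byte selects, and the final bytes in the sample input are arbitrary examples. The function name is hypothetical, and the accepted final-byte range (0x30–0x7E) is an assumption of this sketch.

```python
import re
from typing import List, Tuple

# ESC, then 0x21 (C0-designate) or 0x22 (C1-designate), then one final byte.
DESIGNATION = re.compile(rb"\x1b([\x21\x22])([\x30-\x7e])")


def find_control_designations(data: bytes) -> List[Tuple[str, str]]:
    """Return (set kind, final byte) for each C0/C1 designation sequence found."""
    kinds = {0x21: "C0", 0x22: "C1"}
    return [
        (kinds[m.group(1)[0]], chr(m.group(2)[0]))
        for m in DESIGNATION.finditer(data)
    ]


print(find_control_designations(b'abc\x1b!@def\x1b"Cxyz'))
```

Graphic-set designations such as ESC ( *F* use different intermediate bytes and are deliberately ignored.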
Alternative C0 character sets
* ANPA-1312, a text markup language used for news transmission, replaces several C0 control characters.
* IPTC 7901, the newer international version of the above, has its own variations.
* Videotex has a completely different set.
* Teletext also defines a set similar to Videotex.
* T.61/T.51 and others replaced EM and GS with SS2 and SS3, so that these functions could be used in a 7-bit environment without resorting to escape sequences.
* Some sets replaced FS with SS2 (same as ANPA-1312).
* The now-withdrawn JIS C 6225, designated JIS X 0207 in later sources, replaced FS with CEX ("Control Extension"), which introduces control sequences for vertical text behaviour, for superscripts and subscripts, and for transmitting custom character graphics.
Alternative C1 character sets
* A specialised C1 control code set is registered for bibliographic use (including string collation), such as by MARC-8.
* Various specialised C1 control code sets are registered for use by Videotex formats.
* The Stratus VOS operating system uses a C1 set called the ''NLS control set''. It includes SS1 (Single-Shift 1) through SS15 (Single-Shift 15) controls, used to invoke individual characters from pre-defined supplementary character sets, in a similar manner to the single-shift mechanism of ISO/IEC 2022. The only single-shift controls defined by ISO/IEC 2022 are SS2 and SS3; these are retained in the VOS set at their original code points and function the same way.
* EBCDIC defines up to 29 additional control codes besides those present in ASCII. When EBCDIC is translated to Unicode (or to ISO 8859), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA). The New Line (NL) control does translate to the ISO/IEC 6429 (though it is often swapped with LF, following the Unix line-ending convention), but the remainder of the control codes do not correspond. For example, the EBCDIC control and the ECMA-48 control are both used to begin a superscript or end a subscript, but are not mapped to one another. Extended-ASCII-mapped EBCDIC can therefore be regarded as having its own C1 set, although it is not registered with the ISO-IR registry for ISO/IEC 2022.
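Python ships an EBCDIC codec, `cp037` (EBCDIC code page 37), whose mapping table sends EBCDIC NL (0x15) to U+0085 (NEL) and EBCDIC LF (0x25) to U+000A (LF). The sketch below decodes with it and optionally swaps the two, imitating the converters that follow the Unix convention. The helper name and the `swap_nel_lf` flag are hypothetical and not part of the codec.

```python
EBCDIC_NL = b"\x15"  # EBCDIC New Line
EBCDIC_LF = b"\x25"  # EBCDIC Line Feed

print(repr(EBCDIC_NL.decode("cp037")))  # '\x85'
print(repr(EBCDIC_LF.decode("cp037")))  # '\n'

# Exchange NEL (U+0085) and LF (U+000A) in one pass.
SWAP_NEL_LF = {0x85: 0x0A, 0x0A: 0x85}


def ebcdic_to_text(data: bytes, swap_nel_lf: bool = False) -> str:
    """Decode cp037 bytes, optionally swapping NEL and LF afterwards."""
    text = data.decode("cp037")
    return text.translate(SWAP_NEL_LF) if swap_nel_lf else text


record = "HI".encode("cp037") + EBCDIC_NL + "YO".encode("cp037") + EBCDIC_LF
print(repr(ebcdic_to_text(record, swap_nel_lf=True)))
```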
Unicode
Unicode reserves the 65 code points described above for compatibility with the C0 and C1 control codes, giving them the general category (control). These are:
* (C0 controls) and (DEL) assigned to the C0 Controls and Basic Latin block, and
* (C1 controls) assigned to the C1 Controls and Latin-1 Supplement block.
Unicode only specifies semantics for the C0 format controls HT, LF, VT, FF, and CR (BS is not included); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL.
The rest of the codes are transparent to Unicode and their meanings are left to higher-level protocols, with ISO/IEC 6429 suggested as a default.
Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and the zero-width joiner and non-joiner for controlling ligature use. However, these are given the general category (format) rather than (control).
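The category split can be checked with Python's standard `unicodedata` module, which reports control characters as `Cc` and format characters such as the joiners and bidirectional marks as `Cf`. A short sketch; scanning every code point takes a moment but confirms the count of 65.

```python
import unicodedata

# Every code point whose general category is Cc (control).
controls = [cp for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc"]

print(len(controls))  # 65
print(f"U+{controls[0]:04X}..U+{controls[31]:04X}, U+{controls[32]:04X}, "
      f"U+{controls[33]:04X}..U+{controls[-1]:04X}")

# Joining and bidirectional controls are format characters (Cf), not Cc.
for ch in ("\u200c", "\u200d", "\u200e", "\u2066"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```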
See also
* Control Pictures - Unicode graphical representation characters for the C0 control codes
* ANSI escape code
External links
* The Unicode Standard
** C0 Controls and Basic Latin
** C1 Controls and Latin-1 Supplement
** Control Pictures
** The Unicode Standard, Version 6.1.0, Chapter 16: Special Areas and Format Characters
* ATIS Telecom Glossary 2007
* ''De litteris regentibus C1 quaestiones septem'' or ''Are C1 characters legal in XHTML 1.0?''
* W3C I18N FAQ: HTML, XHTML, XML and Control Codes
* International register of coded character sets to be used with escape sequences