The null character (also null terminator) is a control character with the value zero. It is present in many character sets, including those defined by the Baudot and ITA2 codes, ISO/IEC 646 (or ASCII), the C0 control code, the Universal Coded Character Set (or Unicode), and EBCDIC. It is available in nearly all mainstream programming languages. It is often abbreviated as NUL (or NULL, though in some contexts that term is used for the null pointer). In 8-bit codes, it is known as a null byte.
The original meaning of this character was like NOP: when sent to a printer or a terminal, it has no effect (some terminals, however, incorrectly display it as a space). When electromechanical teleprinters were used as computer output devices, one or more null characters were sent at the end of each printed line to allow time for the mechanism to return to the first printing position on the next line. On punched tape, the character is represented with no holes at all, so a new unpunched tape is initially filled with null characters, and text could often be inserted at a reserved space of null characters by punching the new characters into the tape over the nulls.
Today the character has much more significance in the programming language C and its derivatives and in many data formats, where it serves as a reserved character used to signify the end of a string, often called a null-terminated string. This allows the string to be any length with only the overhead of one byte; the alternative of storing a count requires either a string length limit of 255 (when the count occupies a single byte) or an overhead of more than one byte (other advantages and disadvantages are described in the null-terminated string article).
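The trade-off between a terminator and a stored count can be sketched in a few lines of Python. The helper names `to_c_string`, `c_strlen` and `to_counted_string` are hypothetical and exist only for this illustration; they model the two byte layouts rather than any particular library.

```python
def to_c_string(text: bytes) -> bytes:
    """Append the terminating null byte; the payload itself may not contain one."""
    if b"\x00" in text:
        raise ValueError("a null-terminated string cannot contain a null byte")
    return text + b"\x00"


def c_strlen(buffer: bytes) -> int:
    """Count bytes up to (not including) the first null, as C's strlen does."""
    return buffer.index(b"\x00")


def to_counted_string(text: bytes) -> bytes:
    """Prefix a one-byte length, which caps the payload at 255 bytes."""
    if len(text) > 255:
        raise ValueError("a one-byte count cannot describe more than 255 bytes")
    return bytes([len(text)]) + text


short = b"Hi"
long_text = b"x" * 300

print(to_c_string(short))                # b'Hi\x00'
print(to_counted_string(short))          # b'\x02Hi'
print(c_strlen(to_c_string(long_text)))  # 300
```

Both layouts cost one extra byte for a short string, but only the terminated form accepts the 300-byte payload without widening the count field. The price is that the terminated form cannot carry a null byte as data.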
Representation
The null character is often represented as the escape sequence \0 in source code string literals or character constants. [Kernighan and Ritchie, ''C'', p. 38] In many languages (such as C, which introduced this notation), this is not a separate escape sequence, but an octal escape sequence with a single octal digit 0; as a consequence, \0 must not be followed by any of the digits 0 through 7; otherwise it is interpreted as the start of a longer octal escape sequence. Other escape sequences found in various languages are \000, \x00, \z, or \u0000. A null character can be placed in a URL with the percent code %00.
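The octal pitfall is easy to reproduce. Python is used here only as a convenient illustration, since its string literals follow the same rule of up to three octal digits after the backslash; the variable names are arbitrary.

```python
from urllib.parse import unquote_to_bytes

nul = "\0"                   # one character, code point 0
octal_newline = "\012"       # NOT a null followed by "12": octal 012 is 10, a line feed
nul_then_digits = "\0" "12"  # separate literals: escapes are resolved before joining

print(len(octal_newline), ord(octal_newline))  # 1 10
print(len(nul_then_digits))                    # 3

# Other spellings of the same character
print("\000" == "\x00" == "\u0000" == nul)     # True

# Percent-decoding %00 yields a zero byte
print(unquote_to_bytes("a%00b"))               # b'a\x00b'
```

Writing the escape with all three octal digits (\000) or in hexadecimal (\x00) avoids the ambiguity when digits follow.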
The ability to represent a null character does not always mean the resulting string will be correctly interpreted, as many programs will consider the null to be the end of the string. Thus the ability to type it (in case of unchecked user input) creates a vulnerability known as null byte injection and can lead to security exploits. [Null Byte Injection, WASC Threat Classification, Null Byte Attack section]
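A minimal sketch of how such an injection can slip past a validator follows. It assumes a hypothetical upload policy that accepts only names ending in ".jpg"; the function `as_seen_by_c_api` merely models a C-style consumer that stops at the first null and is not a real API.

```python
ALLOWED_SUFFIX = ".jpg"  # hypothetical upload policy, for illustration only


def naive_is_allowed(filename: str) -> bool:
    """Flawed check: looks only at the end of the full string."""
    return filename.endswith(ALLOWED_SUFFIX)


def as_seen_by_c_api(filename: str) -> str:
    """Model a null-terminated consumer: everything after the first null is ignored."""
    return filename.split("\0", 1)[0]


def strict_is_allowed(filename: str) -> bool:
    """Reject embedded nulls before applying the suffix check."""
    return "\0" not in filename and filename.endswith(ALLOWED_SUFFIX)


malicious = "shell.php\0.jpg"

print(naive_is_allowed(malicious))   # True
print(as_seen_by_c_api(malicious))   # shell.php
print(strict_is_allowed(malicious))  # False
```

The validator and the consumer disagree about where the string ends, which is the core of the attack. Rejecting input that contains a null byte before any other check removes the disagreement.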
In caret notation the null character is ^@. On some keyboards, one can enter a null character by holding down Ctrl and pressing @ (on US layouts just Ctrl+2 will often work, there being no need for Shift to get the @ sign).
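Caret notation follows from a simple bit operation: each ASCII control code is paired with the printable character whose code differs only in bit 6 (0x40). The sketch below uses hypothetical helper names and covers only the ASCII control range.

```python
def caret_notation(code: int) -> str:
    """Return the caret form of an ASCII control code (0-31 or 127)."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError("not an ASCII control code")
    return "^" + chr(code ^ 0x40)


def from_caret_notation(text: str) -> int:
    """Invert caret_notation: '^@' -> 0, '^J' -> 10, '^?' -> 127."""
    if len(text) != 2 or text[0] != "^":
        raise ValueError("expected a caret followed by one character")
    return ord(text[1]) ^ 0x40


print(caret_notation(0))          # ^@
print(caret_notation(10))         # ^J
print(from_caret_notation("^@"))  # 0
```

Since "@" has the code 0x40, flipping that bit gives zero, which is why the null character is written ^@.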
In documentation, the null character is sometimes represented as a single-em-width symbol containing the letters "NUL". In Unicode, there is a character with a corresponding glyph for visual representation of the null character, symbol for null, U+2400 (␀), not to be confused with the actual null character, U+0000.
The hexadecimal notation for null is 00, and decoding the Base64 string AA== also yields the null character.
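Both representations can be checked with the Python standard library. The Base64 string is written here in its padded form, AA==, which is what the standard decoder expects for a single byte.

```python
import base64

nul = b"\x00"

print(nul.hex())                   # 00
print(bytes.fromhex("00") == nul)  # True
print(base64.b64decode("AA=="))    # b'\x00'
print(base64.b64encode(nul))       # b'AA=='
```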
In the Windows operating system, the symbol for null (U+2400) can be typed in some GUI applications, such as WordPad, by typing 2400 followed immediately by the Alt+X key combination.
Encoding
In all modern character sets, the null character has a code point value of zero. In most encodings, this is translated to a single code unit with a zero value. For instance, in UTF-8 it is a single zero byte. However, in Modified UTF-8 the null character is encoded as two bytes: 0xC0, 0x80. This allows the byte with the value of zero, which is now not used for any character, to be used as a string terminator.
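The null handling of Modified UTF-8 can be sketched as a byte substitution on top of standard UTF-8. This is a partial illustration only: the function names are hypothetical, and the sketch deliberately ignores the other difference of Modified UTF-8 (supplementary characters encoded as surrogate pairs), so it is accurate only for text in the Basic Multilingual Plane.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Encode as UTF-8, then replace each zero byte with the pair C0 80.

    In standard UTF-8 a zero byte can only come from U+0000, so a plain
    byte substitution is safe.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_modified_utf8(data: bytes) -> str:
    """Reverse the substitution; C0 never occurs in valid standard UTF-8."""
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


sample = "a\0b"

print(sample.encode("utf-8"))        # b'a\x00b'
print(encode_modified_utf8(sample))  # b'a\xc0\x80b'
print(decode_modified_utf8(encode_modified_utf8(sample)) == sample)  # True
```

The encoded form contains no zero byte, so a zero can be appended as a terminator without ambiguity. A strict UTF-8 decoder rejects the C0 80 pair as an overlong encoding, which is why the reverse substitution has to happen before decoding.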
References
External links
* Null Byte Injection, WASC Threat Classification, Null Byte Attack section
* Poison Null Byte Introduction Introduction to Nullify 9
* Byte Attack
* Apple null byte injection QR code vulnerability