The null character is a
control character
In computing and telecommunications, a control character or non-printing character (NPC) is a code point in a character encoding, character set that does not represent a written Character (computing), character or symbol. They are used as in-ba ...
with the value
zero
0 (zero) is a number representing an empty quantity. Adding (or subtracting) 0 to any number leaves that number unchanged; in mathematical terminology, 0 is the additive identity of the integers, rational numbers, real numbers, and compl ...
. Many
character set
Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical values that make up a c ...
s include a
code point
A code point, codepoint or code position is a particular position in a Table (database), table, where the position has been assigned a meaning. The table may be one dimensional (a column), two dimensional (like cells in a spreadsheet), three dime ...
for a null character including
Unicode
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
(
Universal Coded Character Set
The Universal Coded Character Set (UCS, Unicode) is a standard set of character (computing), characters defined by the international standard International Organization for Standardization, ISO/International Electrotechnical Commission, IEC  ...
),
ASCII
ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
(
ISO/IEC 646
ISO/IEC 646 ''Information technology — ISO 7-bit coded character set for information interchange'', is an International Organization for Standardization, ISO/International Electrotechnical Commission, IEC standard in the ...
),
Baudot,
ITA2
The Baudot code () is an early character encoding for telegraphy invented by Émile Baudot in the 1870s. It was the predecessor to the International Telegraph Alphabet No. 2 (ITA2), the most common teleprinter code in use before ASCII. Each Chara ...
codes, the
C0 control code, and
EBCDIC
Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight- bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding si ...
. In modern character sets, the null character has a code point value of zero which is generally translated to a single code unit with a zero value. For instance, in
UTF-8
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode Transformation Format 8-bit''. Almost every webpage is transmitted as UTF-8.
UTF-8 supports all 1,112,0 ...
, it is a single, zero byte. However, in
Modified UTF-8 the null character is encoded as two bytes : . This allows the byte with the value of zero, which is not used for any character, to be used as a string terminator.
Originally, its meaning was like
NOP when sent to a
printer or a
terminal, it had no effect (although some terminals incorrectly displayed it as
space
Space is a three-dimensional continuum containing positions and directions. In classical physics, physical space is often conceived in three linear dimensions. Modern physicists usually consider it, with time, to be part of a boundless ...
). When electromechanical
teleprinter
A teleprinter (teletypewriter, teletype or TTY) is an electromechanical device that can be used to send and receive typed messages through various communications channels, in both point-to-point (telecommunications), point-to-point and point- ...
s were used as computer output devices, one or more null characters were sent at the end of each printed line to allow time for the mechanism to return to the first printing position on the next line. On
punched tape
file:PaperTapes-5and8Hole.jpg, Five- and eight-hole wide punched paper tape
file:Harwell-dekatron-witch-10.jpg, Paper tape reader on the Harwell computer with a small piece of five-hole tape connected in a circle – creating a physical program ...
, the character is represented with no holes at all, so a new unpunched tape is initially filled with null characters, and often text could be inserted at a reserved space of null characters by punching the new characters into the tape over the nulls.
A
null-terminated string is a commonly used data structure in the
C programming language, its many derivative languages and other programming contexts that uses a null character to indicate the end of a
string. This design allows a string to be any length at the cost of only one extra character of memory. The common competing design for a string stores the length of the string as an
integer
An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
data type
In computer science and computer programming, a data type (or simply type) is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these ...
, but this limits the size of the string to the range of the integer (for example, 255 for a byte).
For
byte
The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable un ...
storage, the null character can be called a null byte.
Representation
Since the null character is not a
printable character representing it requires special notation in
source code
In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer.
Since a computer, at base, only ...
.
In a
string literal
string literal or anonymous string is a literal for a string value in the source code of a computer program. Modern programming languages commonly use a quoted sequence of characters, formally "bracketed delimiters", as in x = "foo", where , "foo ...
, the null character is often represented as the
escape sequence
In computer science, an escape sequence is a combination of characters that has a meaning other than the literal characters contained therein; it is marked by one or more preceding (and possibly terminating) characters.
Examples
* In C and ma ...
\0
(for example,
"abc\0def"
). Similar notation is often used for a character literal (i.e.
'\0'
) although that is often equivalent to the numeric literal for zero (
0
).
[Kernighan and Ritchie, ''C'', p. 38: "The character constant '\0' represents the character with value zero, the null character. '\0' is often written instead of 0 to emphasize the character nature of some expression, but the numeric value is just 0."] In many languages (
such as C, which introduced this notation), this is not a separate escape sequence, but an octal escape sequence with a single
octal
Octal (base 8) is a numeral system with eight as the base.
In the decimal system, each place is a power of ten. For example:
: \mathbf_ = \mathbf \times 10^1 + \mathbf \times 10^0
In the octal system, each place is a power of eight. For ex ...
digit 0; as a consequence,
\0
must not be followed by any of the digits
0
through
7
; otherwise it is interpreted as the start of a longer octal escape sequence. Other escape sequences that are found in use in various languages are
\000
,
\x00
,
\z
, or
\u0000
.
A null character can be placed in a
URL with the
percent code %00
.
The ability to represent a null character does not always mean the resulting string will be correctly interpreted, as many programs will consider the null to be the end of the string. Thus, the ability to type it (in case of
unchecked user input) creates a
vulnerability
Vulnerability refers to "the quality or state of being exposed to the possibility of being attacked or harmed, either physically or emotionally." The understanding of social and environmental vulnerability, as a methodological approach, involves ...
known as null byte injection and can lead to security exploits.
Null Byte Injection
WASC Threat Classification Null Byte Attack section.
In software documentation, the null character is often represented with the text NUL (or NULL although that may mean the null pointer
In computing, a null pointer (sometimes shortened to nullptr or null) or null reference is a value saved for indicating that the Pointer (computer programming), pointer or reference (computer science), reference does not refer to a valid Object (c ...
). In Unicode
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
, there is a character for this: .
In caret notation
Caret notation is a notation for control characters in ASCII. The notation assigns to control-code 1, sequentially through the alphabet to assigned to control-code 26 (0x1A). For the control-codes outside of the range 1–26, the ...
the null character is ^@
. On some keyboards, one can enter a null character by holding down and pressing (on US layouts just will often work, there being no need for to get the @ sign).
References
External links
Null Byte Injection
WASC Threat Classification Null Byte Attack section
Poison Null Byte Introduction
Introduction to Null Byte Attack
* Apple
An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...
br>null byte injection
QR codebr>vulnerability
{{DEFAULTSORT:Null Character
Control characters
Computer security exploits