NULL
is a reserved word used to identify this marker.
A null should not be confused with a value of 0. A null indicates a lack of a value, which is not the same as a zero value. For example, consider the question "How many books does Adam own?" The answer may be "zero" (we ''know'' that he owns ''none'') or "null" (we ''do not know'' how many he owns). In a database table, the column reporting this answer would start with no value (marked by null), and it would not be updated with the value zero until it is ascertained that Adam owns no books.
In SQL, null is a marker, not a value. This usage is quite different from most programming languages, where a null value of a reference means it is not pointing to any object.
History
E. F. Codd mentioned nulls as a method of representing missing data in the relational model in a 1975 paper in the ''FDT Bulletin of ACM- SIGMOD''. Codd's paper that is most commonly cited with the semantics of Null (as adopted in SQL) is his 1979 paper in the '' ACM Transactions on Database Systems'', in which he also introduced his Relational Model/Tasmania, although much of the other proposals from the latter paper have remained obscure. Section 2.3 of his 1979 paper details the semantics of Null propagation in arithmetic operations as well as comparisons employing a ternary (three-valued) logic when comparing to nulls; it also details the treatment of Nulls on other set operations (the latter issue still controversial today). In database theory circles, the original proposal of Codd (1975, 1979) is now referred to as "Codd tables". Codd later reinforced his requirement that all RDBMSs support Null to indicate missing data in a 1985 two-part article published in '' Computerworld'' magazine. The 1986 SQL standard basically adopted Codd's proposal after an implementation prototype in IBM System R. Although Don Chamberlin recognized nulls (alongside duplicate rows) as one of the most controversial features of SQL, he defended the design of Nulls in SQL invoking the pragmatic arguments that it was the least expensive form of system support for missing information, saving the programmer from many duplicative application-level checks (see semipredicate problem) while at the same time providing the database designer with the option not to use Nulls if they so desire; for example, to avoid well-known anomalies (discussed in the semantics section of this article). Chamberlin also argued that besides providing some missing-value functionality, practical experience with Nulls also led to other language features that rely on Nulls, like certain grouping constructs and outer joins. Finally, he argued that in practice Nulls also end up being used as a quick way to patch an existingChallenges
Null has been the focus of controversy and a source of debate because of its associated three-valued logic (3VL), special requirements for its use in SQL joins, and the special handling required by aggregate functions and SQL grouping operators. Computer science professor Ron van der Meyden summarized the various issues as: "The inconsistencies in the SQL standard mean that it is not possible to ascribe any intuitive logical semantics to the treatment of nulls in SQL."Ron van der Meyden,Null propagation
Arithmetic operations
Because Null is not a data value, but a marker for an absent value, using mathematical operators on Null gives an unknown result, which is represented by Null.. In the following example, multiplying 10 by Null results in Null:String concatenation
String, ,
string concatenation operator.
Comparisons with NULL and the three-valued logic (3VL)
Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the undefined value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown. The logical result of the expression below, which compares the value 10 to Null, is Unknown:Effect of Unknown in WHERE clauses
SQL three-valued logic is encountered in Data Manipulation Language (DML) in comparison predicates of DML statements and queries. TheWHERE
clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by INSERT
, UPDATE
, or DELETE
DML statements, and are discarded by SELECT
queries. Interpreting Unknown and False as the same logical result is a common error encountered while dealing with Nulls. The following simple example demonstrates this fallacy:
SELECT
statement to summarily discard every row. (However, in practice, some SQL tools will retrieve rows using a comparison with Null.)
Null-specific and 3VL-specific comparison predicates
Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides for two special Null-specific comparison predicates. TheIS NULL
and IS NOT NULL
predicates (which use a postfix syntax) test whether data is, or is not, Null.
The SQL standard contains the optional feature F571 "Truth value tests" that introduces three additional logical unary operators (six in fact, if we count their negation, which is part of their syntax), also using postfix notation. They have the following truth tables:
The F571 feature is orthogonal to the presence of the Boolean datatype in SQL (discussed later in this article) and, despite syntactic similarities, F571 does not introduce Boolean or three-valued literals in the language. The F571 feature was actually present in SQL92, well before the Boolean datatype was introduced to the standard in 1999. The F571 feature is implemented by few systems, however; PostgreSQL is one of those implementing it.
The addition of IS UNKNOWN to the other operators of SQL's three-valued logic makes the SQL three-valued logic functionally complete, meaning its logical operators can express (in combination) any conceivable three-valued logical function.
On systems that do not support the F571 feature, it is possible to emulate IS UNKNOWN ''p'' by going over every argument that could make the expression ''p'' Unknown and test those arguments with IS NULL or other NULL-specific functions, although this may be more cumbersome.
Law of the excluded fourth (in WHERE clauses)
In SQL's three-valued logic the law of the excluded middle, ''p'' OR NOT ''p'', no longer evaluates to true for all ''p''. More precisely, in SQL's three-valued logic ''p'' OR NOT ''p'' is unknown precisely when ''p'' is unknown and true otherwise. Because direct comparisons with Null result in the unknown logical value, the following queryIN
translates to an iterated version of equality over the argument set and 1<>NULL is Unknown, just as a 1=NULL is Unknown. (The CAST in this example is needed only in some SQL implementations like PostgreSQL, which would reject it with a type checking error otherwise. In many systems plain SELECT NULL works in the subquery.) The missing case above is of course:
Effect of Null and Unknown in other constructs
Joins
Joins evaluate using the same comparison rules as for WHERE clauses. Therefore, care must be taken when using nullable columns in SQL join criteria. In particular a table containing any nulls is ''not equal'' with a natural self-join of itself, meaning that whereas is true for any relation ''R'' inCOALESCE
function or CASE
expressions can be used to "simulate" Null equality in join criteria, and the IS NULL
and IS NOT NULL
predicates can be used in the join criteria as well. The following predicate tests for equality of the values A and B and treats Nulls as being equal.
CASE expressions
SQL provides two flavours of conditional expressions. One is called "simple CASE" and operates like aCASE
expressions use implicit equality comparisons which operate under the same rules as the DML WHERE
clause rules for Null. Thus, a ''simple CASE
expression'' cannot check for the existence of Null directly. A check for Null in a simple CASE
expression always results in Unknown, as in the following:
i = NULL
evaluates to Unknown no matter what value column ''i'' contains (even if it contains Null), the string 'Is Null'
will never be returned.
On the other hand, a "searched" CASE
expression can use predicates like IS NULL
and IS NOT NULL
in its conditions. The following example shows how to use a searched CASE
expression to properly check for Null:
CASE
expression, the string 'Null Result'
is returned for all rows in which ''i'' is Null.
Oracle's dialect of SQL provides a built-in function DECODE
which can be used instead of the simple CASE expressions and considers two nulls equal.
ELSE NULL
clause.
IF statements in procedural extensions
SQL/PSM (SQL Persistent Stored Modules) defines procedural extensions for SQL, such as the IF
statement. However, the major SQL vendors have historically included their own proprietary procedural extensions. Procedural extensions for looping and comparisons operate under Null comparison rules similar to those for DML statements and queries. The following code fragment, in ISO SQL standard format, demonstrates the use of Null 3VL in an IF
statement.
IF
statement performs actions only for those comparisons that evaluate to True. For statements that evaluate to False or Unknown, the IF
statement passes control to the ELSEIF
clause, and finally to the ELSE
clause. The result of the code above will always be the message 'Result is Unknown'
since the comparisons with Null always evaluate to Unknown.
Analysis of SQL Null missing-value semantics
The groundbreaking work of T. Imieliński and W. Lipski Jr. (1984) provided a framework in which to evaluate the intended semantics of various proposals to implement missing-value semantics, that is referred to as Imieliński-Lipski Algebras. This section roughly follows chapter 19 of the "Alice" textbook. A similar presentation appears in the review of Ron van der Meyden, §10.4.In selections and projections: weak representation
Constructs representing missing information, such as Codd tables, are actually intended to represent a set of relations, one for each possible instantiation of their parameters; in the case of Codd tables, this means replacement of Nulls with some concrete value. For example, A construct (such as a Codd table) is said to be a ''strong representation'' system (of missing information) if any answer to a query made on the construct can be particularized to obtain an answer for ''any'' corresponding query on the relations it represents, which are seen as models of the construct. More precisely, if is a query formula in theIf joins or unions are considered: not even weak representation
Consider the following query over the same Codd table from the previous section:Check constraints and foreign keys
The primary place in which SQL three-valued logic intersects with SQL Data Definition Language (DDL) is in the form of check constraints. A check constraint placed on a column operates under a slightly different set of rules than those for the DMLWHERE
clause. While a DML WHERE
clause must evaluate to True for a row, a check constraint must not evaluate to False. (From a logic perspective, the designated values are True and Unknown.) This means that a check constraint will succeed if the result of the check is either True or Unknown. The following example table with a check constraint will prohibit any integer values from being inserted into column ''i'', but will allow Null to be inserted since the result of the check will always evaluate to Unknown for Nulls.
CHECK (''p'' OR NOT ''p'')
always succeeds. Furthermore, assuming Nulls are to be interpreted as existing but unknown values, some pathological CHECKs like the one above allow insertion of Nulls that could never be replaced by any non-null value.
In order to constrain a column to reject Nulls, the NOT NULL
constraint can be applied, as shown in the example below. The NOT NULL
constraint is semantically equivalent to a check constraint with an IS NOT NULL
predicate.
MATCH PARTIAL
is added after the REFERENCES
declaration then any non-null must match the foreign key, e.g. would still match, but would not. Finally, if MATCH FULL
is added then would not match the constraint either, but would still match it.
Outer joins
SQL outer joins, including left outer joins, right outer joins, and full outer joins, automatically produce Nulls as placeholders for missing values in related tables. For left outer joins, for instance, Nulls are produced in place of rows missing from the table appearing on the right-hand side of theLEFT OUTER JOIN
operator. The following simple example uses two tables to demonstrate Null placeholder production in a left outer join.
The first table (Employee) contains employee ID numbers and names, while the second table (PhoneNumber) contains related employee ID numbers and phone numbers, as shown below.
The following sample SQL query performs a left outer join on these two tables.
Aggregate functions
SQL defines aggregate functions to simplify server-side aggregate calculations on data. Except for theCOUNT(*)
function, all aggregate functions perform a Null-elimination step, so that Nulls are not included in the final result of the calculation.
Note that the elimination of Null is not equivalent to replacing Null with zero. For example, in the following table, AVG(i)
(the average of the values of i
) will give a different result from that of AVG(j)
:
Here AVG(i)
is 200 (the average of 150, 200, and 250), while AVG(j)
is 150 (the average of 150, 200, 250, and 0). A well-known side effect of this is that in SQL, AVG(z)
is not equivalent with SUM(z)/COUNT(*)
but with SUM(z)/COUNT(z)
.
The output of an aggregate function can also be Null. Here is an example:
When two nulls are equal: grouping, sorting, and some set operations
Because SQL:2003 defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct". This definition of ''not distinct'' allows SQL to group and sort Nulls when theGROUP BY
clause (or another SQL language feature that performs grouping) is used.
Other SQL operations, clauses, and keywords using the "not distinct" definition in their treatment of Nulls include:
* The PARTITION BY
clause of the ranking and windowing functions such as ROW_NUMBER
* The UNION
, INTERSECT
, and EXCEPT
operators, which treat NULLs as the same for row comparison/elimination purposes
* The DISTINCT
keyword used in SELECT
queries
The principle that Nulls are not equal to each other (but rather that the result is Unknown) is effectively violated in the SQL specification for the UNION
operator, which does identify nulls with each other. Consequently, some set operations in SQL, such as union and difference, may produce results not representing sure information, unlike operations involving explicit comparisons with NULL (e.g. those in a WHERE
clause discussed above). In Codd's 1979 proposal (which was adopted by SQL92) this semantic inconsistency is rationalized by arguing that removal of duplicates in set operations happens "at a lower level of detail than equality testing in the evaluation of retrieval operations."
The SQL standard does not explicitly define a default sorting order for Nulls. Instead, on conforming systems, Nulls can be sorted before or after all data values by using the NULLS FIRST
or NULLS LAST
clauses of the ORDER BY
list, respectively. Not all DBMS vendors implement this functionality, however. Vendors who do not implement this functionality may specify different treatments for Null sorting in the DBMS.
Effect on index operation
Some SQL products do not index keys containing NULLs. For instance, PostgreSQL versions prior to 8.3 did not, with the documentation for a B-tree index stating that In cases where the index enforces uniqueness, NULLs are excluded from the index and uniqueness is not enforced between NULLs. Again, quoting from the PostgreSQL documentation: This is consistent with the SQL:2003-defined behavior of scalar Null comparisons. Another method of indexing Nulls involves handling them as ''not distinct'' in accordance with the SQL:2003-defined behavior. For example, Microsoft SQL Server documentation states the following: Both of these indexing strategies are consistent with the SQL:2003-defined behavior of Nulls. Because indexing methodologies are not explicitly defined by the SQL:2003 standard, indexing strategies for Nulls are left entirely to the vendors to design and implement.Null-handling functions
SQL defines two functions to explicitly handle Nulls:NULLIF
and COALESCE
. Both functions are abbreviations for searched CASE
expressions.
NULLIF
TheNULLIF
function accepts two parameters. If the first parameter is equal to the second parameter, NULLIF
returns Null. Otherwise, the value of the first parameter is returned.
NULLIF
is an abbreviation for the following CASE
expression:
COALESCE
TheCOALESCE
function accepts a list of parameters, returning the first non-Null value from the list:
COALESCE
is defined as shorthand for the following SQL CASE
expression:
COALESCE
. Some systems (e.g. Transact-SQL) implement an ISNULL
function, or other similar functions that are functionally similar to COALESCE
. (See Is
functions for more on the IS
functions in Transact-SQL.)
NVL
The OracleNVL
function accepts two parameters. It returns the first non-NULL parameter or NULL if all parameters are NULL.
A COALESCE
expression can be converted into an equivalent NVL
expression thus:
NVL(SALARY, 0)
which says, 'if SALARY
is NULL, replace it with the value 0'.
There is, however, one notable exception. In most implementations, COALESCE
evaluates its parameters until it reaches the first non-NULL one, while NVL
evaluates all of its parameters. This is important for several reasons. A parameter ''after'' the first non-NULL parameter could be a function, which could either be computationally expensive, invalid, or could create unexpected side effects.
Data typing of Null and Unknown
TheNULL
literal is untyped in SQL, meaning that it is not designated as an integer, character, or any other specific NULL
literal to a Null of a specific type is possible using the CAST
introduced in SQL-92. For example:
IS UNKNOWN
predicate, which can be used to test whether a three-value logical outcome is Unknown, although this is merely syntactic sugar.
BOOLEAN data type
The ISO SQL:1999 standard introduced the BOOLEAN data type to SQL; however, it is still just an optional, non-core feature, coded T031. When restricted by aNOT NULL
constraint, the SQL BOOLEAN works like the Boolean type from other languages. Unrestricted, however, the BOOLEAN datatype, despite its name, can hold the truth values TRUE, FALSE, and UNKNOWN, all of which are defined as Boolean literals according to the standard. The standard also asserts that NULL and UNKNOWN "may be used interchangeably to mean exactly the same thing".
The Boolean type has been subject of criticism, particularly because of the mandated behavior of the UNKNOWN literal, which is never equal to itself because of the identification with NULL.
As discussed above, in the PostgreSQL implementation of SQL, Null is used to represent all UNKNOWN results, including the UNKNOWN BOOLEAN. PostgreSQL does not implement the UNKNOWN literal (although it does implement the IS UNKNOWN operator, which is an orthogonal feature.) Most other major vendors do not support the Boolean type (as defined in T031) as of 2012. The procedural part of Oracle's PL/SQL, however, supports BOOLEAN variables; these can also be assigned NULL and the value is considered the same as UNKNOWN.
Controversy
Common mistakes
Misunderstanding of how Null works is the cause of a great number of errors in SQL code, both in ISO standard SQL statements and in the specific SQL dialects supported by real-world database management systems. These mistakes are usually the result of confusion between Null and either 0 (zero) or an empty string (a string value with a length of zero, represented in SQL as''
). Null is defined by the SQL standard as different from both an empty string and the numerical value 0
, however. While Null indicates the absence of any value, the empty string and numerical zero both represent actual values.
A classic error is the attempt to use the equals operator =
in combination with the keyword NULL
to find rows with Nulls. According to the SQL standard this is an invalid syntax and shall lead to an error message or an exception. But most implementations accept the syntax and evaluate such expressions to UNKNOWN
. The consequence is that no rows are found—regardless of whether rows with Nulls exist or not. The proposed way to retrieve rows with Nulls is the use of the predicate IS NULL
instead of = NULL
.
WHERE
clause or conditional statement might compare a column's value with a constant. It is often incorrectly assumed that a missing value would be "less than" or "not equal to" a constant if that field contains Null, but, in fact, such expressions return Unknown. An example is below:
NULL
literal or the UNKNOWN
truth-value, SQL will always return UNKNOWN
as the result of the expression. This is a partial equivalence relation and makes SQL an example of a ''Non-Reflexive logic''..
Similarly, Nulls are often confused with empty strings. Consider the LENGTH
function, which returns the number of characters in a string. When a Null is passed into this function, the function returns Null. This can lead to unexpected results, if users are not well versed in 3-value logic. An example is below:
Criticisms
The ISO SQL implementation of Null is the subject of criticism, debate and calls for change. In ''The Relational Model for Database Management: Version 2'', Codd suggested that the SQL implementation of Null was flawed and should be replaced by two distinct Null-type markers. The markers he proposed were to stand for ''"Missing but Applicable"'' and ''"Missing but Inapplicable"'', known as ''A-values'' and ''I-values'', respectively. Codd's recommendation, if accepted, would have required the implementation of a four-valued logic in SQL. Others have suggested adding additional Null-type markers to Codd's recommendation to indicate even more reasons that a data value might be "Missing", increasing the complexity of SQL's logic system. At various times, proposals have also been put forth to implement multiple user-defined Null markers in SQL. Because of the complexity of the Null-handling and logic systems required to support multiple Null markers, none of these proposals have gained widespread acceptance. Chris Date and Hugh Darwen, authors of ''The Third Manifesto'', have suggested that the SQL Null implementation is inherently flawed and should be eliminated altogether, pointing to inconsistencies and flaws in the implementation of SQL Null-handling (particularly in aggregate functions) as proof that the entire concept of Null is flawed and should be removed from the relational model. Others, like author Fabian Pascal, have stated a belief that "how the function calculation should treat missing values is not governed by the relational model."Closed-world assumption
Another point of conflict concerning Nulls is that they violate the closed-world assumption model of relational databases by introducing an open-world assumption into it. The closed world assumption, as it pertains to databases, states that "Everything stated by the database, either explicitly or implicitly, is true; everything else is false." This view assumes that the knowledge of the world stored within a database is complete. Nulls, however, operate under the open world assumption, in which some items stored in the database are considered unknown, making the database's stored knowledge of the world incomplete.See also
* SQL * NULLs in: Wikibook SQL * Three-valued logic * Data manipulation language * Codd's 12 rules * Check constraint * Relational Model/Tasmania *References
Further reading
* E. F. Codd. Understanding relations (installment #7). FDT Bulletin of ACM-SIGMOD, 7(3-4):23–28, 1975. * Especially §2.3. * * * Claude RubinsonExternal links