First normal form (1NF) is the simplest form of database normalization defined by English computer scientist Edgar F. Codd, the inventor of the relational database. A relation (or a ''table'', in SQL) can be said to be in first normal form if each field is ''atomic'', containing a single value rather than a set of values or a nested table. In other words, a relation complies with first normal form if no attribute domain (the set of values allowed in a given column) has relations as elements.
Most relational database management systems, including standard SQL, do not support creating or using table-valued columns, which means most relational databases will be in first normal form by necessity. Otherwise, normalization to 1NF involves eliminating nested relations by breaking them up into separate relations associated with each other using foreign keys.
This process is a necessary step when moving data from a non-relational (or NoSQL) database, such as one using a hierarchical or document-oriented model, to a relational database.
A database must satisfy 1NF to satisfy further "normal forms", such as 2NF and 3NF, which enable the reduction of redundancy and anomalies. Other benefits of adopting 1NF include the introduction of increased data independence and flexibility (including features like many-to-many relationships) and simplification of the relational algebra and query language necessary to describe operations on the database.
Codd considered 1NF mandatory for relational databases, while the other normal forms were merely guidelines for database design.
Background
First normal form was introduced in 1970 by Edgar F. Codd in his paper "A relational model of data for large shared data banks", although initially it was simply referred to as "normalization" or "normal form". It was renamed to "first normal form" when Codd introduced additional normal forms in his paper "Further Normalization of the Data Base Relational Model" in 1971.
The relational model was proposed as an improvement over hierarchical databases, which were prevalent at the time. A key difference lies in how relationships between records are represented. In a hierarchical database, one-to-many relationships are represented through containment: a single record may contain sets of records (known as repeating groups) as attribute values. But Codd argued that hierarchy is not flexible and expressive enough for more complex data models. For example, many-to-many relationships cannot be represented through hierarchy. Thus he suggested eliminating nested records and instead representing relationships through foreign keys. This allows richer relationships to be expressed, since a record can now participate in multiple relationships.
A direct translation of a hierarchical database into relations would represent repeating groups as nested relations. Thus normalization is defined as eliminating nested relations and instead representing the one-to-many relationship through foreign keys.
Codd distinguishes between "atomic" and "compound" data. Atomic (or "nondecomposable") data includes basic types such as numbers and strings – broadly speaking, it "''cannot'' be decomposed into smaller pieces by the DBMS (excluding certain special functions)". Compound data is made up of structures such as relations (or ''tables'', in SQL) which contain several pieces of atomic data and thus "''can'' be decomposed by the DBMS".
In a relation, each attribute (or ''column'') has a set of allowed values known as its domain (e.g., a "Price" attribute's domain may be the set of non-negative numbers with up to 2 fractional digits). Each tuple (or ''row'') in the relation contains one value per attribute, and each must be an element in that attribute's domain. Codd distinguishes attributes which have "simple domains" containing only atomic data from attributes with "nonsimple domains" containing at least some forms of compound data. Nonsimple domains introduce a degree of structural complexity which can be difficult to navigate, to query and to update – for instance, it will be time-consuming to operate across several nested relations (that is, tables containing further tables), which can be found in some non-relational databases.
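The idea of a domain as a set of allowed values can be illustrated with a small membership test. The following is a minimal Python sketch, assuming the "Price" domain mentioned above; the function name `in_price_domain` is hypothetical and chosen only for this illustration.

```python
from decimal import Decimal

def in_price_domain(value: Decimal) -> bool:
    """Return True if value belongs to the illustrative "Price" domain:
    finite, non-negative, and with at most 2 fractional digits."""
    if not value.is_finite() or value < 0:
        return False
    # normalize() strips trailing zeros, so Decimal("1.500") counts as 1.5.
    return value.normalize().as_tuple().exponent >= -2

# A tuple's Price field must hold exactly one element of this domain.
checks = {text: in_price_domain(Decimal(text))
          for text in ("19.99", "5", "19.999", "-1")}
```

A relational DBMS would typically enforce such a domain through a column type or constraint rather than application code; the sketch only makes the set-membership idea concrete.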
First normal form therefore requires all attribute domains to be ''simple'' domains, such that the data in each field is atomic and no relation has relation-valued attributes. Precisely, Codd states that, in the relational model, "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."
Normalization to 1NF is thus a process of eliminating nonsimple domains from all relations.
Examples
Design that violates 1NF
This table of customers' credit card transactions does not conform to first normal form, as each customer corresponds to a repeating group of transactions. Such a design can be represented in a hierarchical database, but not in an SQL database, since SQL does not support nested tables.
The evaluation of any query relating to customers' transactions would broadly involve two stages:
# unpacking one or more customers' groups of transactions, allowing the individual transactions in a group to be examined, and
# deriving a query result from the results of the first stage.
For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all customers, the database management system (DBMS) would have to first unpack the Transactions field of each customer, then sum the Amount of each transaction thus obtained where the Date of the transaction falls in October 2003.
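The two-stage evaluation can be sketched in Python as follows. The customer names, IDs, dates and amounts are made-up illustrative values, not data taken from the article, and nested lists of dictionaries merely stand in for a hierarchical record structure.

```python
from datetime import date

# Hypothetical unnormalized data: each customer record nests a
# repeating group of transactions (illustrative values only).
customers = [
    {"CustomerID": 1, "Name": "Alice", "Transactions": [
        {"TransactionID": 101, "Date": date(2003, 10, 14), "Amount": -87},
        {"TransactionID": 102, "Date": date(2003, 10, 15), "Amount": -50},
    ]},
    {"CustomerID": 2, "Name": "Bob", "Transactions": [
        {"TransactionID": 103, "Date": date(2003, 11, 20), "Amount": -70},
    ]},
]

def october_2003_total(customers):
    # Stage 1: unpack every customer's nested group of transactions.
    unpacked = [t for c in customers for t in c["Transactions"]]
    # Stage 2: derive the result from the unpacked transactions.
    return sum(t["Amount"] for t in unpacked
               if (t["Date"].year, t["Date"].month) == (2003, 10))

total = october_2003_total(customers)
```

The query logic has to know where transactions are nested, which is the coupling that normalization to 1NF removes.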
Design that complies with 1NF
Codd described how a database like this could be made less structurally complex and more flexible by transforming it into a relational database in first normal form. To normalize the table so it complies with first normal form, attributes with nonsimple domains must be extracted to separate, stand-alone relations. Each extracted relation gains a foreign key referencing the primary key of the relation which initially contained it. This process can be applied recursively to nonsimple domains nested in multiple levels (i.e., domains containing tables within tables within tables, and so on).
In this example, CustomerID is the primary key of the containing relation and will therefore be appended as a foreign key to the new relation:
In this modified design, CustomerID is the primary key of the first relation and also appears in the second relation, where it acts as a foreign key.
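The extraction step, including its recursive application to deeper nesting, can be sketched in Python. This is only an illustration under stated assumptions: the function `to_1nf` and the sample data are hypothetical, nested relations are modelled as lists of dictionaries, and each relation is assumed to have a single-attribute primary key that identifies its rows on its own.

```python
def to_1nf(name, rows, key, parent=None, relations=None):
    """Flatten nested records into flat relations linked by foreign keys.

    name: relation name; rows: list of dicts; key: maps a relation name to
    its primary-key attribute; parent: (attribute, value) foreign-key pair
    inherited from the containing row, if any.
    """
    relations = {} if relations is None else relations
    flat_rows = relations.setdefault(name, [])
    for row in rows:
        flat = dict([parent]) if parent else {}
        for attr, value in row.items():
            if isinstance(value, list):
                # Nonsimple attribute: extract it into its own relation and
                # pass this row's primary key down as a foreign key.
                to_1nf(attr, value, key, (key[name], row[key[name]]), relations)
            else:
                flat[attr] = value
        flat_rows.append(flat)
    return relations

# Hypothetical nested input (illustrative values only).
customers = [
    {"CustomerID": 1, "Name": "Alice", "Transactions": [
        {"TransactionID": 101, "Date": "2003-10-14", "Amount": -87},
        {"TransactionID": 102, "Date": "2003-10-15", "Amount": -50},
    ]},
    {"CustomerID": 2, "Name": "Bob", "Transactions": [
        {"TransactionID": 103, "Date": "2003-11-20", "Amount": -70},
    ]},
]
key = {"Customers": "CustomerID", "Transactions": "TransactionID"}
relations = to_1nf("Customers", customers, key)
```

After the call, `relations["Customers"]` holds only atomic fields, and every row of `relations["Transactions"]` carries the CustomerID of the customer it was extracted from.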
Now that a single, "top-level" relation contains all transactions, it will be simpler to run queries on the database. To find the monetary sum of all October transactions, the DBMS would simply find all rows with a Date falling in October and sum the Amount fields. All values are now easily exposed to the DBMS, whereas previously some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself well to general-purpose query processing, whereas the unnormalized design does not.
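A query of this kind against a normalized design might look like the following sketch, which uses Python's standard `sqlite3` module with an in-memory database. The rows are made-up illustrative values, and the choice of TransactionID alone as the key of the Transactions table is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    CREATE TABLE Transactions (
        TransactionID INTEGER PRIMARY KEY,
        CustomerID    INTEGER NOT NULL REFERENCES Customers(CustomerID),
        Date          TEXT NOT NULL,   -- ISO 8601 date string
        Amount        INTEGER NOT NULL
    );
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Transactions VALUES
        (101, 1, '2003-10-14', -87),
        (102, 1, '2003-10-15', -50),
        (103, 2, '2003-11-20', -70);
""")

# One flat scan over Transactions: no unpacking stage is needed.
(october_total,) = conn.execute(
    "SELECT SUM(Amount) FROM Transactions "
    "WHERE Date >= '2003-10-01' AND Date < '2003-11-01'"
).fetchone()
```

Because dates are stored as ISO 8601 strings, plain string comparison selects the October rows; a DBMS with a native date type would use that instead.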
It is worth noting that the revised design also meets the additional requirements for second and third normal form.
Rationale
Normalization to 1NF is the major theoretical component of transferring a database to the relational model. Use of a relational database in 1NF brings certain advantages:
* It enables data to be stored in regular two-dimensional arrays; supporting nested relations would require more complex data structures.
* It allows for the use of a simpler query language, like SQL, since any data item can be identified using only a relation name, attribute name and key; addressing nested data items would require a more complex language with support for hierarchical data paths.
* Representing relationships using foreign keys is more flexible and allows for features such as many-to-many relationships, while a hierarchical model can represent only one-to-one or one-to-many relationships.
* Since locating data items is not coupled to a parent–child hierarchy, a database in 1NF creates greater data independence and is more resilient to structural changes over time.
* From 1NF, further normalization becomes possible (for example to 2NF or 3NF), which can reduce data redundancy and anomalies.
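The point about many-to-many relationships can be illustrated with a junction relation whose rows consist of two foreign keys. The sketch below uses Python's standard `sqlite3` module; the Accounts and AccountHolders tables and all rows are hypothetical and introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Accounts  (AccountID  INTEGER PRIMARY KEY, Label TEXT NOT NULL);
    -- Junction relation: each row links one customer to one account.
    CREATE TABLE AccountHolders (
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        AccountID  INTEGER NOT NULL REFERENCES Accounts(AccountID),
        PRIMARY KEY (CustomerID, AccountID)
    );
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Accounts  VALUES (10, 'Joint'), (11, 'Savings');
    INSERT INTO AccountHolders VALUES (1, 10), (2, 10), (1, 11);
""")

# One account with several holders...
holders_of_joint = [name for (name,) in conn.execute(
    "SELECT c.Name FROM Customers c "
    "JOIN AccountHolders h ON h.CustomerID = c.CustomerID "
    "WHERE h.AccountID = 10 ORDER BY c.Name")]

# ...and one customer with several accounts.
accounts_of_alice = [label for (label,) in conn.execute(
    "SELECT a.Label FROM Accounts a "
    "JOIN AccountHolders h ON h.AccountID = a.AccountID "
    "WHERE h.CustomerID = 1 ORDER BY a.Label")]
```

Neither Customers nor Accounts contains the other, so the same record can take part in any number of relationships, which a strict containment hierarchy cannot express.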
Controversy about compound values
There is some debate about the extent to which compound or complex values other than relations (such as arrays or XML data) are permitted in 1NF. Codd states that relations are the only type of compound data allowed within the relational model (if not in attribute domains), since any additional type of compound data would add complexity without adding power; nevertheless, the model specifically allows "certain special functions" like SUBSTRING to decompose values otherwise considered atomic.
Hugh Darwen and Christopher J. Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood. In particular, the notion of an atomic value as a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:
* A string would seem not to be atomic, as an RDBMS typically provides operators to decompose it into substrings.
* A fixed-point number would seem not to be atomic, as an RDBMS typically provides operators to decompose it into integer and fractional components.
* An ISBN would seem not to be atomic, as it includes various parts, including the ''registration group'', ''registrant'' and ''publication'' elements.
Date suggests that "the notion of atomicity ''has no absolute meaning''": a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns containing any conceivable data type (from strings and numeric types to arrays and tables) are then acceptable in a 1NF table, although perhaps not always desirable – for example, it may be desirable to separate a CustomerName column into two columns, FirstName and Surname.
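The decomposition operators mentioned above can be demonstrated with a short sketch using Python's standard `sqlite3` module. The sample name and number are arbitrary illustrative values, and SQLite's `substr` and `CAST` stand in here for whatever operators a given RDBMS provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A string and a number, both nominally "atomic", are taken apart by
# ordinary built-in operators of the DBMS.
first_name, integer_part, fractional_part = conn.execute(
    "SELECT substr('Jane Smith', 1, 4), "
    "       CAST(12.75 AS INTEGER), "
    "       12.75 - CAST(12.75 AS INTEGER)"
).fetchone()
```

Whether 'Jane Smith' counts as one value or two therefore depends on the purpose at hand, which is the ambiguity Date points to.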
Christopher J. Date's definition of 1NF
According to Christopher J. Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:
# There is no specific top-to-bottom ordering of the rows.
# There is no specific left-to-right ordering of the columns.
# There are no duplicate rows.
# Every field (or intersection of a row and a column) contains exactly one value from the applicable domain and nothing else.
# All columns are regular (i.e., rows have no hidden components such as row IDs, object IDs, or hidden timestamps).
Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form.
This definition of 1NF permits relation-valued attributes (tables within tables), which Date argues are useful in rare cases. Examples of tables (or views) that would not meet this definition of first normal form are:
* A table that lacks a unique key constraint. Such a table would be able to accommodate duplicate rows, in violation of condition 3.
* A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view, in violation of condition 1. The tuples in true relations are not ordered with respect to each other (such views cannot be created using SQL that conforms to the SQL:2003 standard).
* A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every column to contain exactly one value from its column's domain. This aspect of condition 4 is controversial; it marks an important departure from Codd's later vision of the relational model, which made explicit provision for nulls (the third of Codd's 12 rules).
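Conditions 3 and 4 can be illustrated with a small sketch using Python's standard `sqlite3` module. The table names `Loose` and `Strict` and their rows are hypothetical. The sketch does not address condition 5: SQLite tables normally carry an implicit row ID, which is ignored here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No key constraint: the table accepts duplicate rows (violates condition 3).
conn.execute("CREATE TABLE Loose (CustomerID INTEGER, Name TEXT)")
conn.execute("INSERT INTO Loose VALUES (1, 'Alice')")
conn.execute("INSERT INTO Loose VALUES (1, 'Alice')")
duplicates = conn.execute("SELECT COUNT(*) FROM Loose").fetchone()[0]

# A key constraint and NOT NULL rule out duplicate rows and nulls.
conn.execute(
    "CREATE TABLE Strict (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
)
conn.execute("INSERT INTO Strict VALUES (1, 'Alice')")

rejected = []
for params in [(1, "Alice"), (2, None)]:
    try:
        conn.execute("INSERT INTO Strict VALUES (?, ?)", params)
    except sqlite3.IntegrityError:
        # Duplicate key or null value: the DBMS refuses the row.
        rejected.append(params)
```

Declaring every column NOT NULL follows Date's reading of condition 4; under Codd's later view of the relational model, nullable columns would be acceptable.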
See also
* Attribute–value system
* Second normal form (2NF)
* Third normal form (3NF)
* Boyce–Codd normal form (BCNF or 3.5NF)
* Fourth normal form (4NF)
* Fifth normal form (5NF)
* Sixth normal form (6NF)