A natural key (also known as business key or domain key) is a type of unique key in a database formed of attributes that exist and are used in the external world outside the database (i.e. in the business domain or domain of discourse). In the relational model of data, a natural key is a superkey and is therefore a functional determinant for all attributes in a relation.
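The superkey property can be illustrated with a small check. The following is a minimal sketch, not a formal test: it only inspects one sample set of rows, whereas a superkey must hold for every valid state of the relation. The function name `is_superkey` and the airport data are assumptions introduced for illustration.

```python
def is_superkey(rows, key_attrs):
    """Return True if no two rows share the same values for key_attrs.

    When this holds, the key values determine every other attribute,
    because each key value identifies at most one row.
    """
    seen = set()
    for row in rows:
        key = tuple(row[attr] for attr in key_attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical relation of airports; "iata_code" plays the natural key,
# since the code is assigned and used outside the database.
airports = [
    {"iata_code": "LHR", "city": "London", "country": "GB"},
    {"iata_code": "LGW", "city": "London", "country": "GB"},
    {"iata_code": "CDG", "city": "Paris", "country": "FR"},
]

iata_is_superkey = is_superkey(airports, ["iata_code"])
city_is_superkey = is_superkey(airports, ["city"])
```

In this sample, `iata_code` identifies each row uniquely, while `city` does not because two rows share the value `London`.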
A natural key serves two complementary purposes:
* It provides a means of unique identification for data
* It imposes a rule, specifically a uniqueness constraint, to ensure that data remains unique within an information system
The uniqueness constraint assures uniqueness of data within a certain technical context (e.g. a set of values in a table, file or relation variable) by rejecting input of any data that would otherwise violate the constraint. This means that the user can rely on a guaranteed correspondence between facts identified by key values recorded in a system and the external domain of discourse (a single version of the truth according to Kimball).
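As a rough illustration of a uniqueness constraint rejecting input, the sketch below uses Python's built-in `sqlite3` module. The `country` table, its columns, and the helper `try_insert` are hypothetical names chosen for this example; the country code stands in for a natural key.

```python
import sqlite3

# In-memory database with a hypothetical "country" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE country ("
    "  iso_code TEXT PRIMARY KEY,"  # natural key: the code is used outside the database
    "  name TEXT NOT NULL"
    ")"
)
conn.execute("INSERT INTO country VALUES ('FR', 'France')")


def try_insert(iso_code, name):
    """Return True if the row was accepted, False if the key constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO country VALUES (?, ?)", (iso_code, name))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_new = try_insert("DE", "Germany")
accepted_duplicate = try_insert("FR", "French Republic")
row_count = conn.execute("SELECT COUNT(*) FROM country").fetchone()[0]
```

The second attempt to record `FR` is rejected, so the table keeps exactly one row per country code.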
A natural key differs from a surrogate key, which has no meaning outside the database itself and is not based on real-world observation or intended as a statement about the reality being modelled. A natural key therefore provides a certain data quality guarantee whereas a surrogate does not. It is common for elements of data to have several keys, any number of which may be natural or surrogate.
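The difference in guarantees can be sketched with two hypothetical tables, again using `sqlite3`. Both have a generated surrogate key, but only one also declares the natural key (an assumed employee badge number) as unique. All table, column, and function names here are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate key only: the generated id is the sole key.
    CREATE TABLE employee_surrogate_only (
        employee_id  INTEGER PRIMARY KEY,
        badge_number TEXT NOT NULL,
        full_name    TEXT NOT NULL
    );
    -- Surrogate key plus a natural key enforced with UNIQUE.
    CREATE TABLE employee_with_natural_key (
        employee_id  INTEGER PRIMARY KEY,
        badge_number TEXT NOT NULL UNIQUE,
        full_name    TEXT NOT NULL
    );
""")


def insert_twice(table):
    """Insert the same employee twice; return how many rows the table ends up with."""
    for _ in range(2):
        try:
            conn.execute(
                f"INSERT INTO {table} (badge_number, full_name) VALUES (?, ?)",
                ("B-1001", "Ada Example"),
            )
        except sqlite3.IntegrityError:
            pass  # the uniqueness constraint rejected the duplicate
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


rows_surrogate_only = insert_twice("employee_surrogate_only")
rows_with_natural_key = insert_twice("employee_with_natural_key")
```

With only the surrogate key, the same employee is stored twice under two different generated identifiers. With the natural key declared unique, the duplicate is rejected.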
Advantages
One advantage of using a natural key to uniquely identify records in a relation is lower disk space usage. Because the natural key is an attribute related to the business or the real world, in most cases it is already stored in the relation, which saves disk space compared to creating a new column to store a surrogate key.
Another advantage of natural keys is that they simplify enforcement of data quality and are easier to relate to real life when designing the database system. Using a natural key that is unique in the real world as the primary key ensures that there cannot be multiple records with the same primary key. Comparing the database schema to a real-world scenario is a large part of schema design, and using natural keys in the tables makes that comparison easier for the database engineer.
Disadvantages
The use of natural keys as unique identifiers in a table has one main disadvantage: business rules, or the rules governing the attribute in the real world, may change. The definition of the structure of the natural key attribute might change in the future.
For example, in a table storing information about US citizens, the Social Security Number (SSN) would act as the natural key. This might pose a problem in the future if the US government changes the structure of the SSN, for instance by increasing its number of digits. In that case, the database administrator will have to change the schema of the table and perhaps also update its records. In other cases, the effort required for such a change is so extensive that it prevents improvements to the system altogether, e.g., the inability of the knowledge management software Confluence to represent multiple pages with the same title.
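The SSN scenario can be sketched as follows, assuming a hypothetical `person` table whose schema encodes the current nine-digit structure in a `CHECK` constraint. The ten-digit value represents the imagined future format from the example, not any real rule, and all SSN values shown are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "  ssn TEXT PRIMARY KEY CHECK (length(ssn) = 9),"  # structure baked into the schema
    "  full_name TEXT NOT NULL"
    ")"
)
conn.execute("INSERT INTO person VALUES ('123456789', 'Ada Example')")


def accepts(ssn):
    """Return True if the current schema accepts a row with this key value."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", (ssn, "New Person"))
        return True
    except sqlite3.IntegrityError:
        return False


nine_digit_ok = accepts("987654321")   # current format
ten_digit_ok = accepts("1234567890")   # hypothetical future format
```

Under this schema the ten-digit value is rejected until the table definition is changed, and any tables that reference `ssn` as a foreign key would need the same treatment.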