A candidate key, or simply a key, of a
relational database
A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.
A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
is any set of
columns that have a unique combination of values in each row, with the additional constraint that removing any column could produce duplicate combinations of values.
A candidate key is a minimal
superkey,
i.e., a superkey that does not contain a smaller one. Therefore, a relation can have multiple candidate keys, each with a different number of attributes.
Specific candidate keys are sometimes called ''
primary key
In the relational model of databases, a primary key is a designated attribute (column) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combinati ...
s'', ''secondary keys'' or ''alternate keys''.
The columns in a candidate key are called prime attributes, and a column that does not occur in any candidate key is called a non-prime attribute.
Every relation without NULL values will have at least one candidate key: Since there cannot be duplicate rows, the set of all columns is a superkey, and if that is not minimal, some subset of that will be minimal.
There is a
functional dependency from the candidate key to all the attributes in the relation.
The superkeys of a relation are all the possible ways we can identify a row. The candidate keys are the minimal subsets of each superkey and as such, they are an important concept for the design of
database schema
The database schema is the structure of a database described in a formal language supported typically by a relational database management system (RDBMS). The term "wikt:schema, schema" refers to the organization of data as a blueprint of how the ...
.
Example
The definition of candidate keys can be illustrated with the following (abstract) example. Consider a relation variable (
relvar) ''R'' with attributes (''A'', ''B'', ''C'', ''D'') that has only the following two legal values ''r1'' and ''r2'':
Here ''r2'' differs from ''r1'' only in the A and D values of the last tuple.
For ''r1'' the following sets have the uniqueness property, i.e., there are no two distinct tuples in the instance with the same attribute values in the set:
: , , , , , , ,
For ''r2'' the uniqueness property holds for the following sets;
: , , , , , , ,
Since superkeys of a relvar are those sets of attributes that have the uniqueness property for ''all'' legal values of that relvar and because we assume that ''r1'' and ''r2'' are all the legal values that ''R'' can take, we can determine the set of superkeys of ''R'' by taking the intersection of the two lists:
: , , , , ,
Finally we need to select those sets for which there is no
proper subset
In mathematics, a set ''A'' is a subset of a set ''B'' if all elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are unequal, then ''A'' is a proper subset ...
in the list, which are in this case:
: , ,
These are indeed the candidate keys of relvar ''R''.
We have to consider ''all'' the relations that might be assigned to a relvar to determine whether a certain set of attributes is a candidate key. For example, if we had considered only ''r1'' then we would have concluded that is a candidate key, which is incorrect. However, we ''might'' be able to conclude from such a relation that a certain set is ''not'' a candidate key, because that set does not have the uniqueness property (example for ''r1''). Note that the existence of a proper subset of a set that has the uniqueness property ''cannot'' in general be used as evidence that the superset is not a candidate key. In particular, note that in the case of an empty relation, every subset of the heading has the uniqueness property, including the empty set.
Determining candidate keys
The set of all candidate keys can be computed
e.g. from the set of
functional dependencies.
To this end we need to define the attribute closure
for an attribute set
.
The set
contains all attributes that are functionally implied by
.
It is quite simple to find a single candidate key.
We start with a set
of attributes and try to remove successively each attribute.
If after removing an attribute the attribute closure stays the same,
then this attribute is not necessary and we can remove it permanently.
We call the result
.
If
is the set of all attributes,
then
is a candidate key.
Actually we can detect every candidate key with this procedure
by simply trying every possible order of removing attributes.
However there are many more
permutation
In mathematics, a permutation of a set can mean one of two different things:
* an arrangement of its members in a sequence or linear order, or
* the act or process of changing the linear order of an ordered set.
An example of the first mean ...
s of attributes (
)
than
subsets
In mathematics, a set ''A'' is a subset of a set ''B'' if all elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are unequal, then ''A'' is a proper subse ...
(
).
That is, many attribute orders will lead to the same candidate key.
There is a fundamental difficulty for efficient algorithms for candidate key computation:
Certain sets of functional dependencies lead to exponentially many candidate keys.
Consider the
functional dependencies
which yields
candidate keys:
.
That is, the best we can expect is an algorithm that is efficient with respect to the number of candidate keys.
The following algorithm actually runs in polynomial time in the number of candidate keys and functional dependencies:
[
]
function find_candidate_keys(A, F)
/* A is the set of all attributes and F is the set of functional dependencies */
K
:= minimize(A);
n := 1; /* Number of Keys known so far */
i := 0; /* Currently processed key */
while i < n do
for each α → β ∈ F do
/* Build a new potential key from the previous known key and the current FD */
S := α ∪ (K
− β);
/* Search whether the new potential key is part of the already known keys */
found := false;
for j := 0 to n-1 do
if K
⊆ S then found := true;
/* If not, add it */
if not found then
K
:= minimize(S);
n := n + 1;
i := i + 1
return K
The idea behind the algorithm is that given a candidate key
and a functional dependency
,
the reverse application of the functional dependency yields
the set
,
which is a key, too.
It may however be covered by other already known candidate keys.
(The algorithm checks this case using the 'found' variable.)
If not, then minimizing the new key yields a new candidate key.
The key insight is that all candidate keys can be created this way.
See also
*
Alternate key, a key that is not selected as a primary key among candidate keys for a relationship
*
Compound key
In database design, a composite key is a candidate key that consists of two or more attributes, (table columns) that together uniquely identify an entity occurrence (table row).
A compound key is a composite key for which each attribute that m ...
*
Database normalization
Database normalization is the process of structuring a relational database in accordance with a series of so-called '' normal forms'' in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scien ...
*
Primary key
In the relational model of databases, a primary key is a designated attribute (column) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combinati ...
*
Relational database
A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.
A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
*
Superkey
*
Prime implicant is the corresponding notion of a candidate key in
boolean logic
In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variable (mathematics), variables are the truth values ''true'' and ''false'', usually denot ...
References
*
External links
Relational Database Management Systems - Database Design - Terms of Reference - Keys An overview of the different types of keys in the RDBMS (Relational Database Management System).
{{Databases
Data modeling
Relational model
Database management systems