Functional dependency

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. Given a relation ''R'' and sets of attributes X,Y \subseteq R, ''X'' is said to functionally determine ''Y'' (written ''X'' → ''Y'') if and only if each ''X'' value in ''R'' is associated with precisely one ''Y'' value in ''R''; ''R'' is then said to ''satisfy'' the functional dependency ''X'' → ''Y''. Equivalently, the projection \Pi_{X,Y}R is a function, i.e. ''Y'' is a function of ''X''. In simple words, if the values for the ''X'' attributes are known (say they are ''x''), then the values for the ''Y'' attributes corresponding to ''x'' can be determined by looking them up in ''any'' tuple of ''R'' containing ''x''. Customarily ''X'' is called the ''determinant'' set and ''Y'' the ''dependent'' set. A functional dependency FD: ''X'' → ''Y'' is called ''trivial'' if ''Y'' is a subset of ''X''.

In other words, a dependency FD: ''X'' → ''Y'' means that the values of ''Y'' are determined by the values of ''X'': two tuples sharing the same values of ''X'' will necessarily have the same values of ''Y''.

The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. A simple application of functional dependencies is ''Heath's theorem''; it says that a relation ''R'' over an attribute set ''U'' and satisfying a functional dependency ''X'' → ''Y'' can be safely split in two relations having the lossless-join decomposition property, namely into \Pi_{XY}(R) \bowtie \Pi_{XZ}(R) = R where ''Z'' = ''U'' − ''XY'' are the rest of the attributes. (Unions of attribute sets are customarily denoted by mere juxtapositions in database theory.) An important notion in this context is a candidate key, defined as a minimal set of attributes that functionally determine all of the attributes in a relation. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.

A notion of logical implication is defined for functional dependencies in the following way: a set of functional dependencies \Sigma logically implies another set of dependencies \Gamma if any relation ''R'' satisfying all dependencies from \Sigma also satisfies all dependencies from \Gamma; this is usually written \Sigma \models \Gamma. The notion of logical implication for functional dependencies admits a sound and complete finite axiomatization, known as ''Armstrong's axioms''.


Examples


Cars

Suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write ''VIN'' → ''EngineCapacity'' because it would be inappropriate for a vehicle's engine to have more than one capacity (assuming, in this case, that vehicles only have one engine). On the other hand, ''EngineCapacity'' → ''VIN'' is incorrect because there could be many vehicles with the same engine capacity.

This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with candidate key VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the transitive functional dependencies VIN → VehicleModel and VehicleModel → EngineCapacity, then that would not result in a normalized relation.
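The definition can be checked mechanically against a concrete relation instance. The following is a minimal sketch, not a definitive implementation: the function name `satisfies_fd` and the sample rows are hypothetical and chosen only to mirror the Cars example. Note that an instance satisfying an FD does not prove that the FD holds as a design constraint; it only shows that the current data does not violate it.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Row = Dict[str, Hashable]


def satisfies_fd(rows: Iterable[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """Return True if each X value in the rows is paired with exactly one Y value."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault records y_val the first time x_val is seen and
        # returns the previously recorded value on later occurrences.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical sample data: two vehicles share the same engine capacity.
cars = [
    {"VIN": "V1", "EngineCapacity": 1600},
    {"VIN": "V2", "EngineCapacity": 1600},
    {"VIN": "V3", "EngineCapacity": 2000},
]

print(satisfies_fd(cars, ["VIN"], ["EngineCapacity"]))  # True
print(satisfies_fd(cars, ["EngineCapacity"], ["VIN"]))  # False
```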


Lectures

This example illustrates the concept of functional dependency. The situation modelled is that of college students attending one or more lectures, in each of which they are assigned a teaching assistant (TA). Let us further assume that every student is in some semester and is identified by a unique integer ID, so that the relation has the attributes StudentID, Semester, Lecture and TA.

Whenever two rows of this relation feature the same StudentID, they also necessarily have the same Semester value. This basic fact can be expressed by a functional dependency:
* StudentID → Semester

Note that if a row were added in which the same student had a different Semester value, the relation would no longer satisfy this functional dependency. This means that the FD is only inferred from the current data, since it is possible to insert values that would invalidate it.

Other nontrivial functional dependencies can be identified, for example:
* {StudentID, Lecture} → TA
* {StudentID, Lecture} → {TA, Semester}

The latter expresses the fact that the set {StudentID, Lecture} is a superkey of the relation.


Employee department model

A classic example of functional dependency is the employee department model. It shows how multiple functional dependencies can be embedded in a single representation of data. Because an employee can only be a member of one department, the unique ID of that employee determines the department:
* Employee ID → Employee Name
* Employee ID → Department ID

In addition to this relationship, the table also has a functional dependency through a non-key attribute:
* Department ID → Department Name

This example demonstrates that even though there exists an FD Employee ID → Department ID, the employee ID would not be a logical key for determining the department name. The process of normalization of the data would recognize all FDs and allow the designer to construct tables and relationships that are more logical based on the data.


Properties and axiomatization of functional dependencies

Given that ''X'', ''Y'', and ''Z'' are sets of attributes in a relation ''R'', one can derive several properties of functional dependencies. Among the most important are the following, usually called Armstrong's axioms:
* Reflexivity: If ''Y'' is a subset of ''X'', then ''X'' → ''Y''
* Augmentation: If ''X'' → ''Y'', then ''XZ'' → ''YZ''
* Transitivity: If ''X'' → ''Y'' and ''Y'' → ''Z'', then ''X'' → ''Z''

"Reflexivity" can be weakened to just X \rightarrow \varnothing, i.e. it is an actual axiom, whereas the other two are proper inference rules, more precisely giving rise to the following rules of syntactic consequence (M. Y. Vardi, "Fundamentals of dependency theory", in E. Borger, editor, Trends in Theoretical Computer Science, pages 171–224, Computer Science Press, Rockville, MD, 1987):
\vdash X \rightarrow \varnothing
X \rightarrow Y \vdash XZ \rightarrow YZ
X \rightarrow Y, Y \rightarrow Z \vdash X \rightarrow Z

These three rules are a sound and complete axiomatization of functional dependencies. This axiomatization is sometimes described as finite because the number of inference rules is finite, with the caveat that the axiom and rules of inference are all schemata, meaning that ''X'', ''Y'' and ''Z'' range over all ground terms (attribute sets).

By applying augmentation and transitivity, one can derive two additional rules:
* Pseudotransitivity: If ''X'' → ''Y'' and ''YW'' → ''Z'', then ''XW'' → ''Z''
* Composition: If ''X'' → ''Y'' and ''Z'' → ''W'', then ''XZ'' → ''YW''

One can also derive the union and decomposition rules from Armstrong's axioms; together they are sometimes called the splitting/combining rule:
:''X'' → ''Y'' and ''X'' → ''Z'' if and only if ''X'' → ''YZ''
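As an illustration of how such derived rules follow from the three axioms, here is one possible derivation of the union and decomposition rules. It is a sketch using the standard form of reflexivity, and it relies on the set identity ''XX'' = ''X''; other derivations are possible.

```latex
\begin{align*}
&\textbf{Union: assume } X \rightarrow Y \text{ and } X \rightarrow Z. \\
&1.\quad X \rightarrow XY   && \text{augmentation of } X \rightarrow Y \text{ with } X \text{ (since } XX = X\text{)} \\
&2.\quad XY \rightarrow YZ  && \text{augmentation of } X \rightarrow Z \text{ with } Y \\
&3.\quad X \rightarrow YZ   && \text{transitivity, from 1 and 2} \\[6pt]
&\textbf{Decomposition: assume } X \rightarrow YZ. \\
&4.\quad YZ \rightarrow Y   && \text{reflexivity, since } Y \subseteq YZ \\
&5.\quad X \rightarrow Y    && \text{transitivity, from the assumption and 4} \\
&6.\quad YZ \rightarrow Z   && \text{reflexivity, since } Z \subseteq YZ \\
&7.\quad X \rightarrow Z    && \text{transitivity, from the assumption and 6}
\end{align*}
```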


Closure of functional dependency

Informally, a closure is everything that can be derived from what is given for a relation by using its functional dependencies. Given a relation R and a set F of FDs that hold in R, the closure of F in R (denoted F+) is the set of all FDs that are logically implied by F. That a particular FD belongs to F+ can be proved using Armstrong's axioms, i.e. reflexivity, augmentation and transitivity.


Closure of a set of attributes

Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X using F+.


Example

Imagine the following list of FDs. We are going to calculate the closure of A under them.
1. ''A'' → ''B''
2. ''B'' → ''C''
3. ''AB'' → ''D''

The closure is derived as follows:
a) A → A (by Armstrong's reflexivity)
b) A → AB (by 1. and (a))
c) A → ABD (by (b), 3, and Armstrong's transitivity)
d) A → ABCD (by (c), and 2)

The closure of A is therefore A+ = ABCD. By calculating the closure of A, we have validated that A is also a good candidate key, as its closure contains every attribute of the relation.
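The derivation above can also be carried out by a simple fixed-point computation: start from the given attributes and keep adding the right-hand side of any FD whose left-hand side is already contained in the result. The sketch below assumes single-letter attribute names; `fd` and `attribute_closure` are hypothetical helper names, not part of any library.

```python
from typing import FrozenSet, Iterable, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def attribute_closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Return the set of attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    fds = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its left side is covered and it adds something new.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# The three FDs from the example: A -> B, B -> C, AB -> D.
example_fds = [fd("A", "B"), fd("B", "C"), fd("AB", "D")]

print(sorted(attribute_closure("A", example_fds)))  # ['A', 'B', 'C', 'D']
print(sorted(attribute_closure("B", example_fds)))  # ['B', 'C']
```

The loop terminates because each pass either adds at least one attribute or stops, and there are only finitely many attributes.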


Covers and equivalence


Covers

Definition: F covers G if every FD in G can be inferred from F. Equivalently, F covers G if G+ ⊆ F+.
Every set of functional dependencies has a canonical cover.


Equivalence of two sets of FDs

Two sets of FDs F and G over schema R are equivalent, written F ≡ G, if F+ = G+. If F ≡ G, then F is a cover for G and vice versa. In other words, equivalent sets of functional dependencies are called ''covers'' of each other.
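Cover and equivalence tests do not require materializing F+ or G+, which can be exponentially large. A common approach, sketched here under the assumption of single-letter attribute names, is to test each FD ''X'' → ''Y'' of one set by checking that ''Y'' lies in the attribute closure of ''X'' under the other set. The names `fd`, `attribute_closure`, `covers` and `equivalent` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def attribute_closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Return the attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    fds = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def covers(f: List[FD], g: List[FD]) -> bool:
    """F covers G if every FD in G is implied by F."""
    return all(rhs <= attribute_closure(lhs, f) for lhs, rhs in g)


def equivalent(f: List[FD], g: List[FD]) -> bool:
    """F and G are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)


F = [fd("A", "B"), fd("B", "C")]
G = [fd("A", "BC"), fd("B", "C")]
H = [fd("A", "B")]

print(equivalent(F, G))  # True
print(covers(F, H))      # True
print(covers(H, F))      # False
```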


Non-redundant covers

A set F of FDs is nonredundant if there is no proper subset F' of F with F' ≡ F. If such an F' exists, F is redundant. F is a nonredundant cover for G if F is a cover for G and F is nonredundant.
An alternative characterization of nonredundancy is that F is nonredundant if there is no FD ''X'' → ''Y'' in F such that F − {''X'' → ''Y''} \models ''X'' → ''Y''. Call an FD ''X'' → ''Y'' in F redundant in F if F − {''X'' → ''Y''} \models ''X'' → ''Y''.
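The alternative characterization suggests a direct procedure: examine the FDs one at a time and drop any FD that is implied by the remaining ones. The following is a minimal sketch of that idea, with hypothetical helper names and single-letter attributes. The result depends on the order in which FDs are examined, so it is ''a'' nonredundant cover rather than a unique one.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def attribute_closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Return the attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    fds = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def nonredundant_cover(fds: Iterable[FD]) -> List[FD]:
    """Drop every FD that is implied by the FDs that remain."""
    result = list(fds)
    i = 0
    while i < len(result):
        lhs, rhs = result[i]
        rest = result[:i] + result[i + 1:]
        if rhs <= attribute_closure(lhs, rest):
            result = rest  # FD i is redundant; the next FD now sits at index i
        else:
            i += 1
    return result


# A -> C follows from A -> B and B -> C, so it is redundant.
F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(nonredundant_cover(F) == [fd("A", "B"), fd("B", "C")])  # True
```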


Applications to normalization


Heath's theorem

An important property (yielding an immediate application) of functional dependencies is that if ''R'' is a relation with columns named from some set of attributes ''U'' and ''R'' satisfies some functional dependency ''X'' → ''Y'', then R = \Pi_{XY}(R) \bowtie \Pi_{XZ}(R) where ''Z'' = ''U'' − ''XY''. Intuitively, if a functional dependency ''X'' → ''Y'' holds in ''R'', then the relation can be safely split in two relations alongside the column ''X'' (which is a key for \Pi_{XY}(R)), ensuring that when the two parts are joined back no data is lost, i.e. a functional dependency provides a simple way to construct a lossless join decomposition of ''R'' in two smaller relations. This fact is sometimes called ''Heath's theorem''; it is one of the early results in database theory.

Heath's theorem effectively says we can pull out the values of ''Y'' from the big relation ''R'' and store them into one, \Pi_{XY}(R), which has no value repetitions in the row for ''X'' and is effectively a lookup table for ''Y'' keyed by ''X''. Consequently it has only one place to update the ''Y'' corresponding to each ''X'', unlike the "big" relation ''R'', where there are potentially many copies of each ''X'', each one with its copy of ''Y'', which need to be kept synchronized on updates. (This elimination of redundancy is an advantage in OLTP contexts, where many changes are expected, but not so much in OLAP contexts, which involve mostly queries.) Heath's decomposition leaves only ''X'' to act as a foreign key in the remainder of the big table \Pi_{XZ}(R).

Functional dependencies however should not be confused with inclusion dependencies, which are the formalism for foreign keys; even though they are used for normalization, functional dependencies express constraints over one relation (schema), whereas inclusion dependencies express constraints between relation schemas in a database schema. Furthermore, the two notions do not even intersect in the classification of dependencies: functional dependencies are equality-generating dependencies whereas inclusion dependencies are tuple-generating dependencies. Enforcing referential constraints after relation schema decomposition (normalization) requires a new formalism, i.e. inclusion dependencies. In the decomposition resulting from Heath's theorem, there is nothing preventing the insertion of tuples in \Pi_{XZ}(R) having some value of ''X'' not found in \Pi_{XY}(R).
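Heath's theorem can be tried out on a small relation instance. The sketch below represents each tuple as a frozenset of (attribute, value) pairs and implements projection and natural join naively. The helper names and the sample rows are hypothetical, loosely modelled on the Lectures example, with ''X'' = StudentID, ''Y'' = Semester and ''Z'' = {Lecture, TA}. It also shows a split along Lecture and TA, where Lecture → TA does not hold in the sample, and where the join produces spurious tuples.

```python
from typing import FrozenSet, Hashable, Iterable, Sequence, Set, Tuple

Row = FrozenSet[Tuple[str, Hashable]]


def make_relation(header: Sequence[str], tuples: Iterable[Sequence]) -> Set[Row]:
    """Build a relation as a set of rows, each row a frozenset of (attr, value)."""
    return {frozenset(zip(header, t)) for t in tuples}


def project(relation: Set[Row], attrs: Iterable[str]) -> Set[Row]:
    """Projection: keep only the named attributes; duplicates collapse in the set."""
    keep = set(attrs)
    return {frozenset((a, v) for a, v in row if a in keep) for row in relation}


def natural_join(r: Set[Row], s: Set[Row]) -> Set[Row]:
    """Natural join: combine rows that agree on all shared attributes."""
    joined = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                joined.add(frozenset({**dt, **du}.items()))
    return joined


# Hypothetical sample rows; StudentID -> Semester holds in this instance.
R = make_relation(
    ("StudentID", "Semester", "Lecture", "TA"),
    [
        (1, 6, "Numerical Methods", "John"),
        (1, 6, "Visual Computing", "Bob"),
        (2, 4, "Numerical Methods", "Smith"),
    ],
)

xy = project(R, ["StudentID", "Semester"])       # lookup table for Semester
xz = project(R, ["StudentID", "Lecture", "TA"])  # remainder of the big table
print(natural_join(xy, xz) == R)  # True: the decomposition is lossless

# Lecture -> TA does not hold here, and this split gains spurious rows.
lt = project(R, ["Lecture", "TA"])
rest = project(R, ["Lecture", "StudentID", "Semester"])
print(len(natural_join(lt, rest)))  # 5
```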


Normal forms

Normal forms are database normalization levels which determine the "goodness" of a table. Generally, the third normal form is considered to be a "good" standard for a relational database. Normalization aims to free the database from update, insertion and deletion anomalies. It also ensures that when a new value is introduced into the relation, it has minimal effect on the database, and thus minimal effect on the applications using the database.


Irreducible set of functional dependencies

A set S of functional dependencies is irreducible if the set has the following three properties:
# Each right set of a functional dependency of S contains only one attribute.
# Each left set of a functional dependency of S is irreducible. This means that removing any one attribute from the left set will change the content of S (S will lose some information).
# Removing any functional dependency will change the content of S.

Sets of functional dependencies with these properties are also called ''canonical'' or ''minimal''. Finding such a set S of functional dependencies which is equivalent to some input set S' is called finding a ''minimal cover'' of S'; this problem can be solved in polynomial time.
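The three properties correspond to three passes of a commonly taught procedure for computing a minimal cover: split right-hand sides, strip extraneous left-hand attributes, then drop redundant dependencies. The following is a sketch under the assumption of single-letter attribute names; the helper names are hypothetical, and the result can depend on the order in which attributes and FDs are examined.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def attribute_closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Return the attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    fds = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def minimal_cover(fds: Iterable[FD]) -> List[FD]:
    # Pass 1: one attribute per right-hand side (dict.fromkeys drops duplicates).
    work = list(dict.fromkeys(
        (lhs, frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)
    ))
    # Pass 2: remove left-hand attributes that are not needed.
    for i, (lhs, rhs) in enumerate(work):
        for a in sorted(lhs):
            reduced = lhs - {a}
            if rhs <= attribute_closure(reduced, work):
                lhs = reduced
                work[i] = (lhs, rhs)
    work = list(dict.fromkeys(work))
    # Pass 3: remove FDs implied by the remaining ones.
    i = 0
    while i < len(work):
        lhs, rhs = work[i]
        rest = work[:i] + work[i + 1:]
        if rhs <= attribute_closure(lhs, rest):
            work = rest
        else:
            i += 1
    return work


S_prime = [fd("A", "BC"), fd("B", "C"), fd("AB", "D")]
cover = minimal_cover(S_prime)
print(set(cover) == {fd("A", "B"), fd("B", "C"), fd("A", "D")})  # True
```

Each pass performs a polynomial number of attribute-closure computations, which is consistent with the polynomial-time bound stated above.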


See also

* Chase (algorithm)
* Inclusion dependency
* Join dependency
* Multivalued dependency (MVD)
* Database normalization
* First normal form



External links

* Osmar Zaiane (June 9, 1998). "Chapter 6: Integrity constraints". CMPT 354 (Database Systems I) lecture notes. Simon Fraser University Department of Computing Science. http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter6/node10.html