Tuple calculus is a
calculus
Calculus is the mathematics, mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizations of arithmetic operations.
Originally called infinitesimal calculus or "the ...
that was created and introduced by
Edgar F. Codd
Edgar Frank "Ted" Codd (19 August 1923 – 18 April 2003) was a British computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases and relational database ...
as part of the
relational model
The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data are represented in terms of t ...
, in order to provide a
declarative database-query language for data manipulation in this
data model
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be ...
. It formed the inspiration for the database-query languages
QUEL and
SQL
Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel")
is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
, of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is used by nearly every
relational-database-management system. Michel Lacroix and Alain Pirotte proposed
domain calculus, which is closer to
first-order logic
First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over ...
and together with Codd showed that both of these calculi (as well as
relational algebra
In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd.
The main applica ...
) are equivalent in
expressive power. Subsequently, query languages for the relational model were called ''relationally complete'' if they could express at least all of these queries.
Definition
Relational database
Since the calculus is a
query language
A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve informa ...
for
relational database
A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.
A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s we first have to define a relational database. The basic relational building block is the
domain (somewhat similar, but not equal to, a
data type
In computer science and computer programming, a data type (or simply type) is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these ...
). A
tuple
In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
is a finite sequence of
attributes, which are
ordered pair
In mathematics, an ordered pair, denoted (''a'', ''b''), is a pair of objects in which their order is significant. The ordered pair (''a'', ''b'') is different from the ordered pair (''b'', ''a''), unless ''a'' = ''b''. In contrast, the '' unord ...
s of domains and values. A
relation
Relation or relations may refer to:
General uses
* International relations, the study of interconnection of politics, economics, and law on a global level
* Interpersonal relationship, association or acquaintance between two or more people
* ...
is a set of (compatible) tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. A
table
Table may refer to:
* Table (database), how the table data arrangement is used within the databases
* Table (furniture), a piece of furniture with a flat surface and one or more legs
* Table (information), a data arrangement with rows and column ...
is an accepted visual representation of a relation; a tuple is similar to the concept of a
row.
We first assume the existence of a set ''C'' of column names, examples of which are "name", "author", "address", etcetera. We define ''headers'' as finite subsets of ''C''. A ''relational database schema'' is defined as a
tuple
In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
''S'' = (''D'', ''R'', ''h'') where ''D'' is the domain of atomic values (see
relational model
The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data are represented in terms of t ...
for more on the notions of ''domain'' and ''atomic value''), ''R'' is a
finite set
In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example,
is a finite set with five elements. Th ...
of relation names, and
:''h'' : ''R'' → 2
''C''
a
function that associates a header with each relation name in ''R''. (Note that this is a simplification from the full relational model where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domain ''D'' we define a ''tuple'' over ''D'' as a
partial function
In mathematics, a partial function from a set to a set is a function from a subset of (possibly the whole itself) to . The subset , that is, the '' domain'' of viewed as a function, is called the domain of definition or natural domain ...
that maps some column names to an atomic value in ''D''. An example would be (name : "Harry", age : 25).
:''t'' : ''C'' ⇸ ''D''
The set of all tuples over ''D'' is denoted as ''T''
''D''. The subset of ''C'' for which a tuple ''t'' is defined is called the ''domain'' of ''t'' (not to be confused with the domain in the schema) and denoted as ''dom''(''t'').
Finally we define a ''relational database'' given a schema ''S'' = (''D'', ''R'', ''h'') as a function
:''db'' : ''R'' → 2
''T''''D''
that maps the relation names in ''R'' to finite subsets of ''T''
''D'', such that for every relation name ''r'' in ''R'' and tuple ''t'' in ''db''(''r'') it holds that
:''dom''(''t'') = ''h''(''r'').
The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.
Atoms
For the construction of the formulas we will assume an infinite set ''V'' of tuple variables. The formulas are defined given a database schema ''S'' = (''D'', ''R'', ''h'') and a partial function ''type'' : ''V'' ⇸ 2
''C'', called at ''type assignment'', that assigns headers to some tuple variables. We then define the ''set of
atomic formula
In mathematical logic, an atomic formula (also known as an atom or a prime formula) is a formula with no deeper propositional structure, that is, a formula that contains no logical connectives or equivalently a formula that has no strict subformu ...
s'' ''A''
'S'',''type''with the following rules:
# if ''v'' and ''w'' in ''V'', ''a'' in ''type''(''v'') and ''b'' in ''type''(''w'') then the formula ''v''.''a'' = ''w''.''b'' is in ''A''
'S'',''type''
# if ''v'' in ''V'', ''a'' in ''type''(''v'') and ''k'' denotes a value in ''D'' then the formula ''v''.''a'' = ''k'' is in ''A''
'S'',''type'' and
# if ''v'' in ''V'', ''r'' in ''R'' and ''type''(''v'') = ''h''(''r'') then the formula ''r''(''v'') is in ''A''
'S'',''type''
Examples of atoms are:
* (''t''.age = ''s''.age) — ''t'' has an age attribute and ''s'' has an age attribute with the same value
* (''t''.name = "Codd") — tuple ''t'' has a name attribute and its value is "Codd"
* Book(''t'') — tuple ''t'' is present in relation Book.
The formal semantics of such atoms is defined given a database ''db'' over ''S'' and a tuple variable binding ''val'' : ''V'' → ''T''
''D'' that maps tuple variables to tuples over the domain in ''S'':
# ''v''.''a'' = ''w''.''b'' is true if and only if ''val''(''v'')(''a'') = ''val''(''w'')(''b'')
# ''v''.''a'' = ''k'' is true if and only if ''val''(''v'')(''a'') = ''k''
# ''r''(''v'') is true if and only if ''val''(''v'') is in ''db''(''r'')
Formulas
The atoms can be combined into formulas, as is usual in first-order logic, with the logical operators ∧ (and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to bind the variables. We define the ''set of formulas'' ''F''
'S'',''type''inductively with the following rules:
# every atom in ''A''
'S'',''type''is also in ''F''
'S'',''type''# if ''f''
1 and ''f''
2 are in ''F''
'S'',''type''then the formula ''f''
1 ∧ ''f''
2 is also in ''F''
'S'',''type''# if ''f''
1 and ''f''
2 are in ''F''
'S'',''type''then the formula ''f''
1 ∨ ''f''
2 is also in ''F''
'S'',''type''# if ''f'' is in ''F''
'S'',''type''then the formula ¬ ''f'' is also in ''F''
'S'',''type''# if ''v'' in ''V'', ''H'' a header and ''f'' a formula in ''F''
[''v''->''H''">'S'',''type''[''v''->''H''/sub>">'v''->''H''.html" ;"title="'S'',''type''
[''v''->''H''">'S'',''type''[''v''->''H''/sub>then the formula ∃ ''v'' : ''H'' ( ''f'' ) is also in ''F'' 'S'',''type'' where ''type''[''v''->''H''] denotes the function that is equal to ''type'' except that it maps ''v'' to ''H'',
# if ''v'' in ''V'', ''H'' a header and ''f'' a formula in ''F'' [''v''->''H''">'S'',''type''[''v''->''H''/sub>">'v''->''H''.html" ;"title="'S'',''type''[''v''->''H''">'S'',''type''[''v''->''H''/sub>then the formula ∀ ''v'' : ''H'' ( ''f'' ) is also in ''F'' 'S'',''type''
Examples of formulas:
* ''t''.name = "C. J. Date" ∨ ''t''.name = "H. Darwen"
* Book(''t'') ∨ Magazine(''t'')
* ∀ ''t'' : ( ¬ ( Book(''t'') ∧ ''t''.author = "C. J. Date" ∧ ¬ ( ''t''.subject = "relational model")))
Note that the last formula states that all books that are written by C. J. Date have as their subject the relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.
We will assume that the quantifiers quantify over the universe of all tuples over the domain in the schema. This leads to the following formal semantics for formulas given a database ''db'' over ''S'' and a tuple variable binding ''val'' : ''V'' -> ''T''''D'':
# ''f''1 ∧ ''f''2 is true if and only if ''f''1 is true and ''f''2 is true,
# ''f''1 ∨ ''f''2 is true if and only if ''f''1 is true or ''f''2 is true or both are true,
# ¬ ''f'' is true if and only if ''f'' is not true,
# ∃ ''v'' : ''H'' ( ''f'' ) is true if and only if there is a tuple ''t'' over ''D'' such that ''dom''(''t'') = ''H'' and the formula ''f'' is true for ''val''[''v''->''t''], and
# ∀ ''v'' : ''H'' ( ''f'' ) is true if and only if for all tuples ''t'' over ''D'' such that ''dom''(''t'') = ''H'' the formula ''f'' is true for ''val''[''v''->''t''].
Queries
Finally we define what a query expression looks like given a schema ''S'' = (''D'', ''R'', ''h''):
:
where ''v'' is a tuple variable, ''H'' a header and ''f''(''v'') a formula in ''F'' 'S'',''type''where ''type'' = and with ''v'' as its only free variable. The result of such a query for a given database ''db'' over ''S'' is the set of all tuples ''t'' over ''D'' with ''dom''(''t'') = ''H'' such that ''f'' is true for ''db'' and ''val'' = .
Examples of query expressions are:
*
*
Semantic and syntactic restriction
Domain-independent queries
Because the semantics of the quantifiers is such that they quantify over all the tuples over the domain in the schema it can be that a query may return a different result for a certain database if another schema is presumed. For example, consider the two schemas ''S1'' = ( ''D1'', ''R'', ''h'' ) and ''S2'' = ( ''D2'', ''R'', ''h'' ) with domains ''D1'' = , ''D2'' = , relation names ''R'' = and headers ''h'' = . Both schemas have a common instance:
: ''db'' =
If we consider the following query expression
:
then its result on ''db'' is either under ''S1'' or under ''S2''. It will also be clear that if we take the domain to be an infinite set, then the result of the query will also be infinite. To solve these problems we will restrict our attention to those queries that are ''domain independent'', i.e., the queries that return the same result for a database under all of its schemas.
An interesting property of these queries is that if we assume that the tuple variables range over tuples over the so-called ''active domain'' of the database, which is the subset of the domain that occurs in at least one tuple in the database or in the query expression, then the semantics of the query expressions does not change. In fact, in many definitions of the tuple calculus this is how the semantics of the quantifiers is defined, which makes all queries by definition domain independent.
Safe queries
In order to limit the query expressions such that they express only domain-independent queries a syntactical notion of ''safe query'' is usually introduced. To determine whether a query expression is safe we will derive two types of information from a query. The first is whether a variable-column pair ''t''.''a'' is ''bound'' to the column of a relation or a constant, and the second is whether two variable-column pairs are directly or indirectly equated (denoted ''t''.''v'' ''s''.''w'').
For deriving boundedness we introduce the following reasoning rules:
# in " ''v''.''a'' = ''w''.''b'' " no variable-column pair is bound,
# in " ''v''.''a'' = ''k'' " the variable-column pair ''v''.''a'' is bound,
# in " ''r''(''v'') " all pairs ''v''.''a'' are bound for ''a'' in ''type''(''v''),
# in " ''f''1 ∧ ''f''2 " all pairs are bound that are bound either in ''f''1 or in ''f''2,
# in " ''f''1 ∨ ''f''2 " all pairs are bound that are bound both in ''f''1 and in ''f''2,
# in " ¬ ''f'' " no pairs are bound,
# in " ∃ ''v'' : ''H'' ( ''f'' ) " a pair ''w''.''a'' is bound if it is bound in ''f'' and ''w'' <> ''v'', and
# in " ∀ ''v'' : ''H'' ( ''f'' ) " a pair ''w''.''a'' is bound if it is bound in ''f'' and ''w'' <> ''v''.
For deriving equatedness we introduce the following reasoning rules (next to the usual reasoning rules for equivalence relations: reflexivity, symmetry and transitivity):
# in " ''v''.''a'' = ''w''.''b'' " it holds that ''v''.''a'' ''w''.''b'',
# in " ''v''.''a'' = ''k'' " no pairs are equated,
# in " ''r''(''v'') " no pairs are equated,
# in " ''f''1 ∧ ''f''2 " it holds that ''v''.''a'' ''w''.''b'' if it holds either in ''f''1 or in ''f''2,
# in " ''f''1 ∨ ''f''2 " it holds that ''v''.''a'' ''w''.''b'' if it holds both in ''f''1 and in ''f''2,
# in " ¬ ''f'' " no pairs are equated,
# in " ∃ ''v'' : ''H'' ( ''f'' ) " it holds that ''w''.''a'' ''x''.''b'' if it holds in ''f'' and ''w''<>''v'' and ''x''<>''v'', and
# in " ∀ ''v'' : ''H'' ( ''f'' ) " it holds that ''w''.''a'' ''x''.''b'' if it holds in ''f'' and ''w''<>''v'' and ''x''<>''v''.
We then say that a query expression is ''safe'' if
* for every column name ''a'' in ''H'' we can derive that ''v''.''a'' is equated with a bound pair in ''f'',
* for every subexpression of ''f'' of the form " ∀ ''w'' : ''G'' ( ''g'' ) " we can derive that for every column name ''a'' in ''G'' we can derive that ''w''.''a'' is equated with a bound pair in ''g'', and
* for every subexpression of ''f'' of the form " ∃ ''w'' : ''G'' ( ''g'' ) " we can derive that for every column name ''a'' in ''G'' we can derive that ''w''.''a'' is equated with a bound pair in ''g''.
The restriction to safe query expressions does not limit the expressiveness since all domain-independent queries that could be expressed can also be expressed by a safe query expression. This can be proven by showing that for a schema ''S'' = (''D'', ''R'', ''h''), a given set ''K'' of constants in the query expression, a tuple variable ''v'' and a header ''H'' we can construct a safe formula for every pair ''v''.''a'' with ''a'' in ''H'' that states that its value is in the active domain. For example, assume that ''K''=, ''R''= and ''h'' = then the corresponding safe formula for ''v''.b is:
: ''v''.b = 1 ∨ ''v''.b = 2 ∨ ∃ ''w'' ( r(w) ∧ ( ''v''.b = ''w''.a ∨ ''v''.b = ''w''.b ) )
This formula, then, can be used to rewrite any unsafe query expression to an equivalent safe query expression by adding such a formula for every variable ''v'' and column name ''a'' in its type where it is used in the expression. Effectively this means that we let all variables range over the active domain, which, as was already explained, does not change the semantics if the expressed query is domain independent.
Systems
DES – An educational tool for working with Tuple Relational Calculus and other formal languages
WinRDBI – An educational tool for working with Tuple Relational Calculus and other formal languages
See also
* Relational algebra
In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd.
The main applica ...
* Relational calculus
The relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that is part of the relational model for databases and provide a declarative way to specify database queries. The raison d'être ...
** Domain relational calculus (DRC)
References
*{{cite journal , last1=Codd , first1=E. F. , author-link=Edgar F. Codd , title=A relational model of data for large shared data banks , journal=Communications of the ACM
''Communications of the ACM'' (''CACM'') is the monthly journal of the Association for Computing Machinery (ACM).
History
It was established in 1958, with Saul Rosen as its first managing editor. It is sent to all ACM members.
Articles are i ...
, date=June 1970 , volume=13 , issue=6 , pages=377–387 , doi=10.1145/362384.362685
Relational model
Logical calculi
de:Kalkül (Datenbank)