database theory Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems. Theoretical aspects of data management include, among other areas, the foundations of q ...

, relational algebra is a theory that uses

algebraic structure In mathematics, an algebraic structure or algebraic system consists of a nonempty set ''A'' (called the underlying set, carrier set or domain), a collection of operations on ''A'' (typically binary operations such as addition and multiplicatio ...

s for modeling data and defining queries on it with well founded

semantics Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...

. The theory was introduced by

Edgar F. Codd Edgar Frank "Ted" Codd (19 August 1923 – 18 April 2003) was a British computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases and relational database ...

. The main application of relational algebra is to provide a theoretical foundation for

relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...

s, particularly

query language A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve informa ...

s for such databases, chief among which is

SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...

. Relational databases store tabular data represented as relations. Queries over relational databases often likewise return tabular data represented as relations. The main purpose of relational algebra is to define operators that transform one or more input relations to an output relation. Given that these operators accept relations as input and produce relations as output, they can be combined and used to express complex queries that transform multiple input relations (whose data are stored in the database) into a single output relation (the query results).

Unary operator In mathematics, a unary operation is an operation with only one operand, i.e. a single input. This is in contrast to ''binary operations'', which use two operands. An example is any function , where is a set; the function is a unary operatio ...

s accept a single relation as input. Examples include operators to filter certain attributes (columns) or

tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...

s (rows) from an input relation.

Binary operator In mathematics, a binary operation or dyadic operation is a rule for combining two elements (called operands) to produce another element. More formally, a binary operation is an operation of arity two. More specifically, a binary operation o ...

s accept two relations as input and combine them into a single output relation. For example, taking all tuples found in either relation ( union), removing tuples from the first relation found in the second relation ( difference), extending the tuples of the first relation with tuples in the second relation matching certain conditions, and so forth.

Introduction

Relational algebra received little attention outside of pure mathematics until the publication of E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. Relational algebra operates on homogeneous sets of tuples

S = \

where we commonly interpret ''m'' to be the number of rows of tuples in a table and ''n'' to be the number of columns. All entries in each column have the same

type Type may refer to: Science and technology Computing * Typing, producing text via a keyboard, typewriter, etc. * Data type, collection of values used for computations. * File type * TYPE (DOS command), a command to display contents of a file. * ...

. A relation also has a unique tuple called the header which gives each column a unique name or attribute inside the relation. Attributes are used in projections and selections.

Set operators

The relational algebra uses

set union In set theory, the union (denoted by ∪) of a collection of sets is the set of all elements in the collection. It is one of the fundamental operations through which sets can be combined and related to each other. A refers to a union of ze ...

set difference In set theory, the complement of a set , often denoted by A^c (or ), is the set of elements not in . When all elements in the universe, i.e. all elements under consideration, are considered to be members of a given set , the absolute complement ...

, and

Cartesian product In mathematics, specifically set theory, the Cartesian product of two sets and , denoted , is the set of all ordered pairs where is an element of and is an element of . In terms of set-builder notation, that is A\times B = \. A table c ...

from set theory, and adds additional constraints to these operators to create new ones. For set union and set difference, the two

relation Relation or relations may refer to: General uses * International relations, the study of interconnection of politics, economics, and law on a global level * Interpersonal relationship, association or acquaintance between two or more people * ...

s involved must be ''union-compatible''—that is, the two relations must have the same set of attributes. Because

set intersection In set theory, the intersection of two sets A and B, denoted by A \cap B, is the set containing all elements of A that also belong to B or equivalently, all elements of B that also belong to A. Notation and terminology Intersection is writt ...

is defined in terms of set union and set difference, the two relations involved in set intersection must also be union-compatible. For the Cartesian product to be defined, the two relations involved must have disjoint headers—that is, they must not have a common attribute name. In addition, the Cartesian product is defined differently from the one in

set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...

theory in the sense that tuples are considered to be "shallow" for the purposes of the operation. That is, the Cartesian product of a set of ''n''-tuples with a set of ''m''-tuples yields a set of "flattened" -tuples (whereas basic set theory would have prescribed a set of 2-tuples, each containing an ''n''-tuple and an ''m''-tuple). More formally, ''R'' × ''S'' is defined as follows:

R\times S:=\

The cardinality of the Cartesian product is the product of the cardinalities of its factors, that is, , ''R'' × ''S'', = , ''R'', × , ''S'', .

Projection

A projection () is a

unary operation In mathematics, a unary operation is an operation with only one operand, i.e. a single input. This is in contrast to ''binary operations'', which use two operands. An example is any function , where is a set; the function is a unary operation ...

written as

\Pi_( R )

where

a_1,\ldots,a_n

is a set of attribute names. The result of such projection is defined as the

that is obtained when all

s in ''R'' are restricted to the set

\

. Note: when implemented in

standard the "default projection" returns a

multiset In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the ''multiplicity'' of ...

instead of a set, and the projection to eliminate duplicate data is obtained by the addition of the DISTINCT keyword.

Selection

A generalized selection (σ) is a

written as

\sigma_\varphi(R)

where is a propositional formula that consists of

atom Atoms are the basic particles of the chemical elements. An atom consists of a atomic nucleus, nucleus of protons and generally neutrons, surrounded by an electromagnetically bound swarm of electrons. The chemical elements are distinguished fr ...

s as allowed in the normal selection and the logical operators ( and), ( or) and (

negation In logic, negation, also called the logical not or logical complement, is an operation (mathematics), operation that takes a Proposition (mathematics), proposition P to another proposition "not P", written \neg P, \mathord P, P^\prime or \over ...

). This selection selects all those

s in ''R'' for which holds. To obtain a listing of all friends or business associates in an address book, the selection might be written as

\sigma_( \text )

. The result would be a relation containing every attribute of every unique record where is true or where is true.

Rename

A rename (''ρ'') is a

written as

\rho_(R)

where the result is identical to ''R'' except that the ''b'' attribute in all tuples is renamed to an ''a'' attribute. This is commonly used to rename the attribute of a

for the purpose of a join. To rename the "isFriend" attribute to "isBusinessContact" in a relation,

\rho_ ( \text )

might be used. There is also the

\rho_(R)

notation, where ''R'' is renamed to ''x'' and the attributes

\

are renamed to

\

Joins and join-like operators

Common extensions

In practice the classical relational algebra described above is extended with various operations such as outer joins, aggregate functions and even transitive closure.

Outer joins

Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by "fill" values for each of the attributes of the other operand. Outer joins are not considered part of the classical relational algebra discussed so far. The operators defined in this section assume the existence of a ''null'' value, ''ω'', which we do not define, to be used for the fill values; in practice this corresponds to the

NULL Null may refer to: Science, technology, and mathematics Astronomy *Nuller, an optical tool using interferometry to block certain sources of light Computing *Null (SQL) (or NULL), a special marker and keyword in SQL indicating that a data value do ...

in SQL. In order to make subsequent selection operations on the resulting table meaningful, a semantic meaning needs to be assigned to nulls; in Codd's approach the propositional logic used by the selection is extended to a three-valued logic, although we elide those details in this article. Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word "outer" is sometimes omitted.)

Left outer join

The left outer join (⟕) is written as ''R'' ⟕ ''S'' where ''R'' and ''S'' are

s. The result of the left outer join is the set of all combinations of tuples in ''R'' and ''S'' that are equal on their common attribute names, in addition (loosely speaking) to tuples in ''R'' that have no matching tuples in ''S''. For an example consider the tables ''Employee'' and ''Dept'' and their left outer join: In the resulting relation, tuples in ''S'' which have no common values in common attribute names with tuples in ''R'' take a ''null'' value, ''ω''. Since there are no tuples in ''Dept'' with a ''DeptName'' of ''Finance'' or ''Executive'', ''ω''s occur in the resulting relation where tuples in ''Employee'' have a ''DeptName'' of ''Finance'' or ''Executive''. Let ''r''₁, ''r''₂, ..., ''r''_''n'' be the attributes of the relation ''R'' and let be the singleton relation on the attributes that are ''unique'' to the relation ''S'' (those that are not attributes of ''R''). Then the left outer join can be described in terms of the natural join (and hence using basic operators) as follows: :

(R \bowtie S) \cup ((R - \pi_(R \bowtie S)) \times \)

Right outer join

The right outer join (⟖) behaves almost identically to the left outer join, but the roles of the tables are switched. The right outer join of

s ''R'' and ''S'' is written as ''R'' ⟖ ''S''. The result of the right outer join is the set of all combinations of tuples in ''R'' and ''S'' that are equal on their common attribute names, in addition to tuples in ''S'' that have no matching tuples in ''R''. For example, consider the tables ''Employee'' and ''Dept'' and their right outer join: In the resulting relation, tuples in ''R'' which have no common values in common attribute names with tuples in ''S'' take a ''null'' value, ''ω''. Since there are no tuples in ''Employee'' with a ''DeptName'' of ''Production'', ''ω''s occur in the Name and EmpId attributes of the resulting relation where tuples in ''Dept'' had ''DeptName'' of ''Production''. Let ''s''₁, ''s''₂, ..., ''s''_''n'' be the attributes of the relation ''S'' and let be the singleton relation on the attributes that are ''unique'' to the relation ''R'' (those that are not attributes of ''S''). Then, as with the left outer join, the right outer join can be simulated using the natural join as follows: :

(R \bowtie S) \cup (\ \times (S - \pi_(R \bowtie S)))

Full outer join

The outer join (⟗) or full outer join in effect combines the results of the left and right outer joins. The full outer join is written as ''R'' ⟗ ''S'' where ''R'' and ''S'' are

s. The result of the full outer join is the set of all combinations of tuples in ''R'' and ''S'' that are equal on their common attribute names, in addition to tuples in ''S'' that have no matching tuples in ''R'' and tuples in ''R'' that have no matching tuples in ''S'' in their common attribute names. For an example consider the tables ''Employee'' and ''Dept'' and their full outer join: In the resulting relation, tuples in ''R'' which have no common values in common attribute names with tuples in ''S'' take a ''null'' value, ''ω''. Tuples in ''S'' which have no common values in common attribute names with tuples in ''R'' also take a ''null'' value, ''ω''. The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows: :''R'' ⟗ ''S'' = (''R'' ⟕ ''S'') ∪ (''R'' ⟖ ''S'')

Operations for domain computations

There is nothing in relational algebra introduced so far that would allow computations on the data domains (other than evaluation of propositional expressions involving equality). For example, it is not possible using only the algebra introduced so far to write an expression that would multiply the numbers from two columns, e.g. a unit price with a quantity to obtain a total price. Practical query languages have such facilities, e.g. the SQL SELECT allows arithmetic operations to define new columns in the result SELECT unit_price * quantity AS total_price FROM t, and a similar facility is provided more explicitly by Tutorial D's EXTEND keyword. In database theory, this is called extended projection.

Aggregation

Furthermore, computing various functions on a column, like the summing up of its elements, is also not possible using the relational algebra introduced so far. There are five

aggregate function In database management, an aggregate function or aggregation function is a function where multiple values are processed together to form a single summary statistic. Common aggregate functions include: * Average (i.e., arithmetic mean) * Count ...

s that are included with most relational database systems. These operations are Sum, Count, Average, Maximum and Minimum. In relational algebra the aggregation operation over a schema (''A''₁, ''A''₂, ... ''A''_''n'') is written as follows: :

G_1, G_2, \ldots, G_m\ g_\ (r)

where each ''A''_''j''', 1 ≤ ''j'' ≤ ''k'', is one of the original attributes ''A''_''i'', 1 ≤ ''i'' ≤ ''n''. The attributes preceding the ''g'' are grouping attributes, which function like a "group by" clause in SQL. Then there are an arbitrary number of aggregation functions applied to individual attributes. The operation is applied to an arbitrary relation ''r''. The grouping attributes are optional, and if they are not supplied, the aggregation functions are applied across the entire relation to which the operation is applied. Let's assume that we have a table named with three columns, namely and . We wish to find the maximum balance of each branch. This is accomplished by ''G''_Max()(). To find the highest balance of all accounts regardless of branch, we could simply write ''G''_Max()(). Grouping is often written as ''ɣ''_Max()() instead.

Transitive closure

Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on

s that cannot be expressed by relational algebra. One of them is the

transitive closure In mathematics, the transitive closure of a homogeneous binary relation on a set (mathematics), set is the smallest Relation (mathematics), relation on that contains and is Transitive relation, transitive. For finite sets, "smallest" can be ...

of a binary relation. Given a domain ''D'', let binary relation ''R'' be a subset of ''D''×''D''. The transitive closure ''R⁺'' of ''R'' is the smallest subset of ''D''×''D'' that contains ''R'' and satisfies the following condition: :

\forall x \forall y \forall z \left( (x,y) \in R^+ \wedge (y,z) \in R^+ \Rightarrow (x,z) \in R^+ \right)

It can be proved using the fact that there is no relational algebra expression ''E''(''R'') taking ''R'' as a variable argument that produces ''R''⁺. SQL however officially supports such fixpoint queries since 1999, and it had vendor-specific extensions in this direction well before that.

Use of algebraic properties for query optimization

Relational database management systems A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured form ...

often include a query optimizer which attempts to determine the most efficient way to execute a given query. Query optimizers enumerate possible

query plan A query plan (or query execution plan) is a sequence of steps used to access data in a SQL relational database management system. This is a specific case of the relational model concept of access plans. Since SQL is declarative, there are typical ...

s, estimate their cost, and pick the plan with the lowest estimated cost. If queries are represented by operators from relational algebra, the query optimizer can enumerate possible query plans by rewriting the initial query using the algebraic properties of these operators. Queries can be represented as a

tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...

, where * the internal nodes are operators, * leaves are

s, * subtrees are subexpressions. The primary goal of the query optimizer is to transform expression trees into equivalent expression trees, where the average size of the relations yielded by subexpressions in the tree is smaller than it was before the

optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...

. The secondary goal is to try to form common subexpressions within a single query, or if there is more than one query being evaluated at the same time, in all of those queries. The rationale behind the second goal is that it is enough to compute common subexpressions once, and the results can be used in all queries that contain that subexpression. Here are a set of rules that can be used in such transformations.

Selection

Rules about selection operators play the most important role in query optimization. Selection is an operator that very effectively decreases the number of rows in its operand, so if the selections in an expression tree are moved towards the leaves, the internal

s (yielded by subexpressions) will likely shrink.

Basic selection properties

Selection is

idempotent Idempotence (, ) is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in a number of pl ...

(multiple applications of the same selection have no additional effect beyond the first one), and

commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Perhaps most familiar as a pr ...

(the order selections are applied in has no effect on the eventual result). #

\sigma_(R)=\sigma_\sigma_(R)\,\!

\sigma_\sigma_(R)=\sigma_\sigma_(R)\,\!

Breaking up selections with complex conditions

A selection whose condition is a conjunction of simpler conditions is equivalent to a sequence of selections with those same individual conditions, and selection whose condition is a

disjunction In logic, disjunction (also known as logical disjunction, logical or, logical addition, or inclusive disjunction) is a logical connective typically notated as \lor and read aloud as "or". For instance, the English language sentence "it is ...

is equivalent to a union of selections. These identities can be used to merge selections so that fewer selections need to be evaluated, or to split them so that the component selections may be moved or optimized separately. #

\sigma_(R)=\sigma_(\sigma_(R))=\sigma_(\sigma_(R))

\sigma_(R)=\sigma_(R)\cup\sigma_(R)

Selection and cross product

Cross product is the costliest operator to evaluate. If the input

s have ''N'' and ''M'' rows, the result will contain

NM

rows. Therefore, it is important to decrease the size of both operands before applying the cross product operator. This can be effectively done if the cross product is followed by a selection operator, e.g.

\sigma_(R \times P)

. Considering the definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to push down a selection from higher levels of the expression tree using the other selection rules. In the above case the condition ''A'' is broken up in to conditions ''B'', ''C'' and ''D'' using the split rules about complex selection conditions, so that

A = B \wedge C \wedge D

and ''B'' contains attributes only from ''R'', ''C'' contains attributes only from ''P'', and ''D'' contains the part of ''A'' that contains attributes from both ''R'' and ''P''. Note, that ''B'', ''C'' or ''D'' are possibly empty. Then the following holds: :

\sigma_(R \times P) = \sigma_(R \times P) = \sigma_(\sigma_(R) \times \sigma_(P))

Selection and set operators

Selection is distributive over the set difference, intersection, and union operators. The following three rules are used to push selection below set operations in the expression tree. For the set difference and the intersection operators, it is possible to apply the selection operator to just one of the operands following the transformation. This can be beneficial where one of the operands is small, and the overhead of evaluating the selection operator outweighs the benefits of using a smaller

as an operand. #

\sigma_(R\setminus P)=\sigma_(R)\setminus \sigma_(P) =\sigma_(R)\setminus P

\sigma_(R\cup P)=\sigma_(R)\cup\sigma_(P)

\sigma_(R\cap P)=\sigma_(R)\cap\sigma_(P)=\sigma_(R)\cap P=R\cap \sigma_(P)

Selection and projection

Selection commutes with projection if and only if the fields referenced in the selection condition are a subset of the fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join. In other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection may reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the elimination of duplicates resulting from omitted fields). :

\pi_(\sigma_A( R )) = \sigma_A(\pi_( R ))\textA \subseteq \

Projection

Basic projection properties

Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection. :

\pi_(\pi_(R)) = \pi_(R)\text\ \subseteq \

Projection and set operators

Projection is distributive over set union. :

\pi_(R \cup P) = \pi_(R) \cup \pi_(P). \,

Projection does not distribute over intersection and set difference. Counterexamples are given by: :

\pi_A(\ \cap \) = \emptyset

\pi_A(\) \cap \pi_A(\) = \

and :

\pi_A(\ \setminus \) = \

\pi_A(\) \setminus \pi_A(\) = \emptyset\,,

where ''b'' is assumed to be distinct from .

Rename

Basic rename properties

Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables in common can be arbitrarily reordered with respect to one another, which can be exploited to make successive renames adjacent so that they can be collapsed. #

\rho_(\rho_(R)) = \rho_(R)\,\!

\rho_(\rho_(R)) = \rho_(\rho_(R))\,\!

Rename and set operators

Rename is distributive over set difference, union, and intersection. #

\rho_(R \setminus P) = \rho_(R) \setminus \rho_(P)

\rho_(R \cup P) = \rho_(R) \cup \rho_(P)

\rho_(R \cap P) = \rho_(R) \cap \rho_(P)

Product and union

Cartesian product is distributive over union. #

(A \times B) \cup (A \times C) = A \times (B \cup C)

Implementations

The first query language to be based on Codd's algebra was Alpha, developed by Dr. Codd himself. Subsequently, ISBL was created, and this pioneering work has been acclaimed by many authorities as having shown the way to make Codd's idea into a useful language. Business System 12 was a short-lived industry-strength relational DBMS that followed the ISBL example. In 1998 Chris Date and

Hugh Darwen Hugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967. to 2004, and has been involved in the development of the relational model. Work From 1978 to 1982 he was a chief architect on Business System 12, a dat ...

proposed a language called Tutorial D intended for use in teaching relational database theory, and its query language also draws on ISBL's ideas. Rel is an implementation of Tutorial D. Bmg is an implementation of relational algebra in Ruby which closely follows the principles of Tutorial D and ''The Third Manifesto''. Even the query language of

is loosely based on a relational algebra, though the operands in SQL (

table Table may refer to: * Table (database), how the table data arrangement is used within the databases * Table (furniture), a piece of furniture with a flat surface and one or more legs * Table (information), a data arrangement with rows and column ...

s) are not exactly

s and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the detriment of optimisers and/or users). The SQL table model is a bag (

), rather than a set. For example, the expression

(R \cup S) \setminus T = (R \setminus T) \cup (S \setminus T)

is a theorem for relational algebra on sets, but not for relational algebra on bags.

Notes

References

External links

RAT Relational Algebra Translator
Free software to convert relational algebra to SQL

- An introduction to how database systems process relational algebra

– A quick tutorial to adapt SQL queries into relational algebra
Relational – A graphic implementation of the relational algebra

Query Optimization
This paper is an introduction into the use of the relational algebra in optimizing queries, and includes numerous citations for more in-depth study.

* ttps://centaurialpha.github.io/pireal/index.html Pireal – An experimental educational tool for working with Relational Algebra
DES – An educational tool for working with Relational Algebra and other formal languages

RelaX - Relational Algebra Calculator
(open-source software available as an online service without registration)
RA: A Relational Algebra Interpreter

Translating SQL to Relational Algebra
{{Databases Relational model Database management systems

Introduction

Set operators

Projection

Selection

Rename

Joins and join-like operators

Common extensions

Outer joins

Left outer join

Right outer join

Full outer join

Operations for domain computations

Aggregation

Transitive closure

Use of algebraic properties for query optimization

Selection

Basic selection properties

Breaking up selections with complex conditions

Selection and cross product

Selection and set operators

Selection and projection

Projection

Basic projection properties

Projection and set operators

Rename

Basic rename properties

Rename and set operators

Product and union

Implementations

See also

Notes

References

Further reading

External links