Union Of Conjunctive Query
   HOME

TheInfoList



OR:

In
database theory Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems. Theoretical aspects of data management include, among other areas, the foundations of q ...
, a conjunctive query is a restricted form of first-order queries using the
logical conjunction In logic, mathematics and linguistics, ''and'' (\wedge) is the Truth function, truth-functional operator of conjunction or logical conjunction. The logical connective of this operator is typically represented as \wedge or \& or K (prefix) or ...
operator. Many first-order queries can be written as conjunctive queries. In particular, a large part of queries issued on
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s can be expressed in this way. Conjunctive queries also have a number of desirable theoretical properties that larger classes of queries (e.g., the
relational algebra In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd. The main applica ...
queries) do not share.


Definition

The conjunctive queries are the fragment of (domain independent)
first-order logic First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over ...
given by the set of formulae that can be constructed from
atomic formula In mathematical logic, an atomic formula (also known as an atom or a prime formula) is a formula with no deeper propositional structure, that is, a formula that contains no logical connectives or equivalently a formula that has no strict subformu ...
e using conjunction ∧ and
existential quantification Existentialism is a family of philosophy, philosophical views and inquiry that explore the human individual's struggle to lead an Authenticity (philosophy), authentic life despite the apparent Absurdity#The Absurd, absurdity or incomprehensibili ...
∃, but not using
disjunction In logic, disjunction (also known as logical disjunction, logical or, logical addition, or inclusive disjunction) is a logical connective typically notated as \lor and read aloud as "or". For instance, the English language sentence "it is ...
∨,
negation In logic, negation, also called the logical not or logical complement, is an operation (mathematics), operation that takes a Proposition (mathematics), proposition P to another proposition "not P", written \neg P, \mathord P, P^\prime or \over ...
¬, or
universal quantification In mathematical logic, a universal quantification is a type of quantifier, a logical constant which is interpreted as "given any", "for all", "for every", or "given an arbitrary element". It expresses that a predicate can be satisfied by e ...
∀. Each such formula can be rewritten (efficiently) into an equivalent formula in
prenex normal form A formula of the predicate calculus is in prenex normal form (PNF) if it is written as a string of quantifiers and bound variables, called the prefix, followed by a quantifier-free part, called the matrix. Together with the normal forms in propos ...
, thus this form is usually simply assumed. Thus conjunctive queries are of the following general form: :(x_1, \ldots, x_k).\exists x_, \ldots x_m. A_1 \wedge \ldots \wedge A_r, with the
free variables In mathematics, and in other disciplines involving formal languages, including mathematical logic and computer science, a variable may be said to be either free or bound. Some older books use the terms real variable and apparent variable for f ...
x_1, \ldots, x_k being called distinguished variables, and the
bound variables In mathematics, and in other disciplines involving formal languages, including mathematical logic and computer science, a variable may be said to be either free or bound. Some older books use the terms real variable and apparent variable for f ...
x_, \ldots, x_m being called undistinguished variables. A_1, \ldots, A_r are
atomic formula In mathematical logic, an atomic formula (also known as an atom or a prime formula) is a formula with no deeper propositional structure, that is, a formula that contains no logical connectives or equivalently a formula that has no strict subformu ...
e. As an example of why the restriction to domain independent first-order logic is important, consider x_1.\exists x_2. R(x_2), which is not domain independent; see
Codd's theorem Codd's theorem states that relational algebra and the domain-independent relational calculus queries, two well-known foundational query languages for the relational model, are precisely equivalent in expressive power. That is, a database query can ...
. This formula cannot be implemented in the select-project-join fragment of relational algebra, and hence should not be considered a conjunctive query. Conjunctive queries can express a large proportion of queries that are frequently issued on
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s. To give an example, imagine a relational database for storing information about students, their address, the courses they take and their gender. Finding all male students and their addresses who attend a course that is also attended by a female student is expressed by the following conjunctive query: (student, address) . ∃ (student2, course) . attends(student, course) ∧ gender(student, 'male') ∧ attends(student2, course) ∧ gender(student2, 'female') ∧ lives(student, address) Note that since the only entity of interest is the male student and his address, these are the only distinguished variables, while the variables course, student2 are only existentially quantified, i.e. undistinguished.


Fragments

Conjunctive queries without distinguished variables are called boolean conjunctive queries. Conjunctive queries where all variables are distinguished (and no variables are bound) are called equi-join queries, because they are the equivalent, in the
relational calculus The relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that is part of the relational model for databases and provide a declarative way to specify database queries. The raison d'être ...
, of the
equi-join A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row ...
queries in the
relational algebra In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd. The main applica ...
(when selecting all columns of the result).


Relationship to other query languages

Conjunctive queries also correspond to select-project-join queries in
relational algebra In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd. The main applica ...
(i.e., relational algebra queries that do not use the operations union or difference) and to select-from-where queries in
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
in which the where-condition uses exclusively conjunctions of atomic equality conditions, i.e. conditions constructed from column names and constants using no comparison operators other than "=", combined using "and". Notably, this excludes the use of aggregation and subqueries. For example, the above query can be written as an SQL query of the conjunctive query fragment as select l.student, l.address from attends a1, gender g1, attends a2, gender g2, lives l where a1.student = g1.student and a2.student = g2.student and l.student = g1.student and a1.course = a2.course and g1.gender = 'male' and g2.gender = 'female';


Datalog

Besides their logical notation, conjunctive queries can also be written as
Datalog Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties ...
rules. Many authors in fact prefer the following Datalog notation for the query above: result(student, address) :- attends(student, course), gender(student, male), attends(student2, course), gender(student2, female), lives(student, address). Although there are no quantifiers in this notation, variables appearing in the head of the rule are still implicitly
universally quantified In mathematical logic, a universal quantification is a type of quantifier, a logical constant which is interpreted as "given any", "for all", "for every", or "given an arbitrary element". It expresses that a predicate can be satisfied by ev ...
, while variables only appearing in the body of the rule are still implicitly existentially quantified. While any conjunctive query can be written as a Datalog rule, not every Datalog program can be written as a conjunctive query. In fact, only single rules over extensional predicate symbols can be easily rewritten as an equivalent conjunctive query. The problem of deciding whether for a given Datalog program there is an equivalent nonrecursive program (corresponding to a positive relational algebra query, or, equivalently, a formula of positive existential
first-order logic First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over ...
, or, as a special case, a conjunctive query) is known as the Datalog boundedness problem and is undecidable.


Extensions

Extensions of conjunctive queries capturing more expressive power include: * unions of conjunctive queries, which are equivalent to positive (i.e.,
negation In logic, negation, also called the logical not or logical complement, is an operation (mathematics), operation that takes a Proposition (mathematics), proposition P to another proposition "not P", written \neg P, \mathord P, P^\prime or \over ...
-free)
relational algebra In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd. The main applica ...
* conjunctive queries extended by union and
negation In logic, negation, also called the logical not or logical complement, is an operation (mathematics), operation that takes a Proposition (mathematics), proposition P to another proposition "not P", written \neg P, \mathord P, P^\prime or \over ...
, which by
Codd's theorem Codd's theorem states that relational algebra and the domain-independent relational calculus queries, two well-known foundational query languages for the relational model, are precisely equivalent in expressive power. That is, a database query can ...
correspond to
relational algebra In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd. The main applica ...
and
first-order logic First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over ...
* conjunctive queries with built-in predicates, e.g., arithmetic predicates * conjunctive queries with
aggregate function In database management, an aggregate function or aggregation function is a function where multiple values are processed together to form a single summary statistic. Common aggregate functions include: * Average (i.e., arithmetic mean) * Count ...
s. The formal study of all of these extensions is justified by their application in
relational databases A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured form ...
and is in the realm of
database theory Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems. Theoretical aspects of data management include, among other areas, the foundations of q ...
.


Complexity

For the study of the
computational complexity In computer science, the computational complexity or simply complexity of an algorithm is the amount of resources required to run it. Particular focus is given to computation time (generally measured by the number of needed elementary operations ...
of evaluating conjunctive queries, two problems have to be distinguished. The first is the problem of evaluating a conjunctive query on a
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
where both the query and the database are considered part of the input. The complexity of this problem is usually referred to as combined complexity, while the complexity of the problem of evaluating a query on a relational database, where the query is assumed fixed, is called data complexity. Conjunctive queries are
NP-complete In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''. Somewhat more precisely, a problem is NP-complete when: # It is a decision problem, meaning that for any ...
with respect to
combined complexity In database theory, the query evaluation problem is the problem of determining the answers to a query on a database. Research in database theory aims at determining the computational complexity of answering different kinds of queries over datab ...
, while the data complexity of conjunctive queries is very low, in the parallel complexity class AC0, which is contained in
LOGSPACE In computational complexity theory, L (also known as LSPACE, LOGSPACE or DLOGSPACE) is the complexity class containing decision problems that can be solved by a deterministic Turing machine using a logarithmic amount of writable memory space. Fo ...
and thus in
polynomial time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations p ...
. The
NP-hard In computational complexity theory, a computational problem ''H'' is called NP-hard if, for every problem ''L'' which can be solved in non-deterministic polynomial-time, there is a polynomial-time reduction from ''L'' to ''H''. That is, assumi ...
ness of conjunctive queries may appear surprising, since
relational algebra In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd. The main applica ...
and
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
strictly subsume the conjunctive queries and are thus at least as hard (in fact, relational algebra is
PSPACE In computational complexity theory, PSPACE is the set of all decision problems that can be solved by a Turing machine using a polynomial amount of space. Formal definition If we denote by SPACE(''f''(''n'')), the set of all problems that can ...
-complete with respect to combined complexity and is therefore even harder under widely held complexity-theoretic assumptions). However, in the usual application scenario, databases are large, while queries are very small, and the data complexity model may be appropriate for studying and describing their difficulty. The problem of listing all answers to a non-Boolean conjunctive query has been studied in the context of
enumeration algorithm In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problem ...
s, with a characterization (under some
computational hardness assumption In computational complexity theory, a computational hardness assumption is the hypothesis that a particular problem cannot be solved efficiently (where ''efficiently'' typically means "in polynomial time"). It is not known how to prove (unconditi ...
s) of the queries for which enumeration can be performed with
linear time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...
preprocessing and constant delay between each solution. Specifically, these are the acyclic conjunctive queries which also satisfy a ''free-connex'' condition.


Formal properties

Conjunctive queries are one of the great success stories of
database theory Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems. Theoretical aspects of data management include, among other areas, the foundations of q ...
in that many interesting problems that are computationally hard or undecidable for larger classes of queries are feasible for conjunctive queries.
Serge Abiteboul Serge Joseph Abiteboul (born 25 August 1953 in Paris, France) is a French computer scientist working in the areas of data management, database theory, and finite model theory. Education The son of two hardware store owners, Abiteboul attended hig ...
, Richard B. Hull,
Victor Vianu Victor Vianu is a computer scientist, a professor of computer science and engineering at the University of California, San Diego.
: Foundations of Databases. Addison-Wesley, 1995.
For example, consider the query containment problem. We write R \subseteq S for two
database relation In database theory, a relation, as originally defined by Edgar F. Codd, E. F. Codd, is a Set (mathematics), set of tuples (d1,d2,...,dn), where each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contr ...
s R, S of the same
schema Schema may refer to: Science and technology * SCHEMA (bioinformatics), an algorithm used in protein engineering * Schema (genetic algorithms), a set of programs or bit strings that have some genotypic similarity * Schema.org, a web markup vocab ...
if and only if each tuple occurring in R also occurs in S. Given a query Q and a
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
instance I, we write the result relation of evaluating the query on the instance simply as Q(I). Given two queries Q_1 and Q_2 and a
database schema The database schema is the structure of a database described in a formal language supported typically by a relational database management system (RDBMS). The term "wikt:schema, schema" refers to the organization of data as a blueprint of how the ...
, the query containment problem is the problem of deciding whether for all possible database instances I over the input database schema, Q_1(I) \subseteq Q_2(I). The main application of query containment is in query optimization: Deciding whether two queries are equivalent is possible by simply checking mutual containment. The query containment problem is undecidable for
relational algebra In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd. The main applica ...
and
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
but is decidable and
NP-complete In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''. Somewhat more precisely, a problem is NP-complete when: # It is a decision problem, meaning that for any ...
for conjunctive queries. In fact, it turns out that the query containment problem for conjunctive queries is exactly the same problem as the query evaluation problem. Since queries tend to be small,
NP-completeness In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''. Somewhat more precisely, a problem is NP-complete when: # It is a decision problem, meaning that for any ...
here is usually considered acceptable. The query containment problem for conjunctive queries is also equivalent to the
constraint satisfaction problem Constraint satisfaction problems (CSPs) are mathematical questions defined as a set of objects whose state must satisfy a number of constraints or limitations. CSPs represent the entities in a problem as a homogeneous collection of finite const ...
. An important class of conjunctive queries that have polynomial-time combined complexity are the acyclic conjunctive queries. The query evaluation, and thus query containment, is
LOGCFL In computational complexity theory, LOGCFL is the complexity class that contains all decision problems that can be reduced in logarithmic space to a context-free language. This class is closed under complementation. It is situated between NL an ...
-complete and thus in
polynomial time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations p ...
. Acyclicity of conjunctive queries is a structural property of queries that is defined with respect to the query's
hypergraph In mathematics, a hypergraph is a generalization of a Graph (discrete mathematics), graph in which an graph theory, edge can join any number of vertex (graph theory), vertices. In contrast, in an ordinary graph, an edge connects exactly two vert ...
: a conjunctive query is acyclic if and only if it has hypertree-width 1. For the special case of conjunctive queries in which all relations used are binary, this notion corresponds to the treewidth of the
dependency graph In mathematics, computer science and digital electronics, a dependency graph is a directed graph representing dependencies of several objects towards each other. It is possible to derive an evaluation order or the absence of an evaluation order th ...
of the variables in the query (i.e., the graph having the variables of the query as nodes and an undirected edge \ between two variables if and only if there is an atomic formula R(x,y) or R(y,x) in the query) and the conjunctive query is acyclic if and only if its dependency graph is acyclic. An important generalization of acyclicity is the notion of bounded hypertree-width, which is a measure of how close to acyclic a hypergraph is, analogous to bounded
treewidth In graph theory, the treewidth of an undirected graph is an integer number which specifies, informally, how far the graph is from being a tree. The smallest treewidth is 1; the graphs with treewidth 1 are exactly the trees and the forests ...
in
graphs Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties * Graph (topology), a topological space resembling a graph in the sense of discre ...
. Conjunctive queries of bounded tree-width have
LOGCFL In computational complexity theory, LOGCFL is the complexity class that contains all decision problems that can be reduced in logarithmic space to a context-free language. This class is closed under complementation. It is situated between NL an ...
combined complexity. Unrestricted conjunctive queries over tree data (i.e., a relational database consisting of a binary child relation of a tree as well as unary relations for labeling the tree nodes) have polynomial time combined complexity.
Georg Gottlob Georg Gottlob FRS is an Austrian-Italian computer scientist who works in the areas of database theory, logic, and artificial intelligence and is Professor of Informatics at the University of Calabria. He was Professor at the University of Oxford ...
, Christoph Koch, Klaus U. Schulz: Conjunctive queries over trees. J. ACM 53(2): 238-272 (2006)


References


External links


Ullman, J. D. ''Information integration using logical views Theoretical Computer Science'', 2000, 239, 189-210
{cbignore, bot=medic *
Georg Gottlob Georg Gottlob FRS is an Austrian-Italian computer scientist who works in the areas of database theory, logic, and artificial intelligence and is Professor of Informatics at the University of Calabria. He was Professor at the University of Oxford ...

Presentation on structural decomposition methods for the efficient evaluation of conjunctive queries (PDF)
Database theory