The problem of database repair is a question about relational databases which has been studied in database theory, and which is a particular kind of data cleansing. The problem asks how we can "repair" an input relational database in order to make it satisfy
integrity constraints. The goal is to be able to work with data that is "dirty", i.e., data that does not satisfy the intended integrity constraints, by reasoning about all possible ''repairs'' of the data, i.e., all possible ways to change the data so that it satisfies the integrity constraints, without committing to a specific choice.
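To make the notion of "dirty" data concrete, here is a minimal sketch that detects violations of one integrity constraint. The relation `Employee(name, dept, office)`, its contents, and the functional dependency `name -> dept` are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
from itertools import combinations

# Hypothetical toy relation Employee(name, dept, office).
# Assumed integrity constraint: the functional dependency name -> dept,
# i.e., every employee name is associated with a single department.
EMPLOYEES = [
    ("alice", "sales", "o1"),
    ("alice", "sales", "o2"),
    ("alice", "hr", "o3"),
    ("bob", "it", "o4"),
]


def in_conflict(t1, t2):
    """Two tuples conflict if they agree on name but disagree on dept."""
    return t1[0] == t2[0] and t1[1] != t2[1]


def conflicts(db):
    """Return every pair of tuples that jointly violates the constraint."""
    return [(a, b) for a, b in combinations(db, 2) if in_conflict(a, b)]


def is_consistent(db):
    """A database is consistent (not dirty) if it contains no conflict."""
    return not conflicts(db)
```

In this sketch the database is dirty because the two `sales` tuples for `alice` each conflict with the `hr` tuple. A repair has to resolve both conflicts, for instance by removing tuples.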
Several variations of the problem exist, depending on:
* what we intend to determine about the dirty data: whether some database tuple is ''certain'' (i.e., is in every repaired database), or whether some query answer is ''certain'' (i.e., the answer is returned when evaluating the query on every repaired database)
* which kinds of changes are allowed when repairing the database: inserting new facts, removing facts (so-called ''subset repairs''), and so on
* which repaired databases we study: those where we only change a minimal subset of the database tuples (e.g., ''minimal subset repairs''), or those where we only change a minimal number of database tuples (e.g., ''minimal cardinality repairs'')
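The variants listed above can be illustrated with a brute-force sketch. It assumes a hypothetical relation `Employee(name, dept, office)` with the functional dependency `name -> dept`, allows only tuple removals, and enumerates all subsets, so it is only suitable for tiny examples. All function names are illustrative.

```python
from itertools import combinations

# Hypothetical dirty database for Employee(name, dept, office),
# with the assumed constraint name -> dept.
DB = frozenset({
    ("alice", "sales", "o1"),
    ("alice", "sales", "o2"),
    ("alice", "hr", "o3"),
    ("bob", "it", "o4"),
})


def consistent(tuples):
    """True if no two tuples share a name but differ on dept."""
    return all(not (a[0] == b[0] and a[1] != b[1])
               for a, b in combinations(tuples, 2))


def subset_repairs(db):
    """Consistent subsets of db that are maximal under set inclusion."""
    items = list(db)
    candidates = [frozenset(c)
                  for size in range(len(items) + 1)
                  for c in combinations(items, size)
                  if consistent(c)]
    return [s for s in candidates
            if not any(s < other for other in candidates)]


def cardinality_repairs(db):
    """Subset repairs that remove the smallest number of tuples."""
    repairs = subset_repairs(db)
    largest = max(len(r) for r in repairs)
    return [r for r in repairs if len(r) == largest]


def certain_tuples(repairs):
    """Tuples that appear in every one of the given repairs."""
    return frozenset.intersection(*repairs)


def certain_answers(query, repairs):
    """Answers returned by the query on every one of the given repairs."""
    return set.intersection(*(set(query(r)) for r in repairs))


def names(db):
    """Example query: the set of employee names."""
    return {t[0] for t in db}
```

On this toy input there are two subset repairs: one keeps both `sales` tuples for `alice`, the other keeps only the `hr` tuple. Only the first is a minimal cardinality repair, since it removes a single tuple. The name `alice` is a certain answer to `names` over the subset repairs even though no individual `alice` tuple is certain, which shows why certain tuples and certain query answers are distinct questions.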
The problem of database repair has been studied to understand the complexity of these different problem variants, i.e., whether we can efficiently determine information about the state of the repairs without explicitly materializing all of them.
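As a small illustration of answering such a question without materializing repairs, consider the restricted case of subset repairs under a single functional dependency. In that setting a tuple belongs to every repair exactly when it conflicts with no other tuple, which can be checked in one grouping pass, whereas the number of repairs can grow exponentially (for example, n names with two conflicting tuples each yield 2^n repairs). The sketch below assumes a hypothetical relation `Employee(name, dept, office)` with the dependency `name -> dept`; it does not cover other constraint classes or other repair notions.

```python
from collections import defaultdict

# Hypothetical dirty database for Employee(name, dept, office),
# with the assumed constraint name -> dept.
DB = {
    ("alice", "sales", "o1"),
    ("alice", "sales", "o2"),
    ("alice", "hr", "o3"),
    ("bob", "it", "o4"),
}


def certain_tuples_fd(db):
    """Certain tuples under subset repairs for the dependency name -> dept.

    A tuple is in every subset repair exactly when no other tuple shares
    its name with a different dept, i.e., when its name maps to a single
    department in the input. No repair is ever constructed.
    """
    depts_by_name = defaultdict(set)
    for name, dept, _office in db:
        depts_by_name[name].add(dept)
    return {t for t in db if len(depts_by_name[t[0]]) == 1}
```

Here only the `bob` tuple is certain, because every `alice` tuple takes part in at least one conflict and is therefore excluded from some repair.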
See also
* Probabilistic database
* Data cleansing
* Data integrity