Most real databases contain data whose correctness is uncertain. In order to work with such data, there is a need to quantify the integrity of the data. This is achieved by using probabilistic databases. A probabilistic database is an uncertain database in which the possible worlds have associated probabilities. Probabilistic database management systems are currently an active area of research. "While there are currently no commercial probabilistic database systems, several research prototypes exist..." Probabilistic databases distinguish between the logical data model and the physical representation of the data, much like relational databases do in the ANSI-SPARC Architecture. In probabilistic databases this is even more crucial since such databases have to represent very large numbers of possible worlds, often exponential in the size of one world (a classical database), succinctly.
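
To see why a succinct representation matters, consider the simplest case, assumed here purely for illustration, of n tuples whose presence is uncertain and mutually independent, where tuple t_i is present with probability p_i. Every subset W of the tuples is then a possible world, and the symbols below (the set of worlds \mathcal{W}, the probabilities p_i) are notation introduced for this sketch rather than taken from the article:

```latex
\[
  |\mathcal{W}| = 2^{n},
  \qquad
  P(W) = \prod_{t_i \in W} p_i \, \prod_{t_i \notin W} (1 - p_i),
  \qquad
  \sum_{W \in \mathcal{W}} P(W) = 1 .
\]
```

Under these assumptions, storing the n probabilities alongside the tuples is enough to describe all 2^n worlds; with n = 30 that is 30 numbers in place of more than a billion explicitly listed worlds.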

Terminology

In a probabilistic database, each tuple is associated with a probability between 0 and 1, with 0 representing that the data is certainly incorrect, and 1 representing that it is certainly correct.

Possible worlds

A probabilistic database could exist in multiple states. For example, if there is uncertainty about the existence of a tuple in the database, then the database could be in two different states with respect to that tuple: the first state contains the tuple, while the second one does not. Similarly, if an attribute can take one of the values ''x'', ''y'' or ''z'', then the database can be in three different states with respect to that attribute. Each of these ''states'' is called a possible world.

Consider a database of three tuples in which the attribute B of the third tuple can take any of the values ''b3'', ''b3′'' or ''b3′′''. Assume that there is uncertainty about the first tuple, certainty about the second tuple, and uncertainty about the value of attribute B in the third tuple. Then the actual state of the database may or may not contain the first tuple (depending on whether it is correct or not), and the value of the attribute B in the third tuple may be ''b3'', ''b3′'' or ''b3′′''. Consequently, the database has 2 × 3 = 6 possible worlds, one for each combination of these choices.
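
The enumeration described above can be sketched in a few lines of Python. The tuple contents (`a1`, `b1`, and so on) and the attribute name A are placeholder values assumed for illustration; only the pattern of uncertainty (an optional first tuple, a certain second tuple, and three candidate values for B in the third tuple) comes from the text.

```python
from itertools import product

# Hypothetical relation with attributes (A, B); all values are placeholders.
FIRST = ("a1", "b1")                      # uncertain: may or may not exist
SECOND = ("a2", "b2")                     # certain: present in every world
THIRD_A = "a3"
THIRD_B_CHOICES = ("b3", "b3′", "b3′′")   # attribute B of the third tuple is uncertain


def possible_worlds():
    """Return every possible world as a list of (A, B) tuples."""
    worlds = []
    # One world per combination of "first tuple present?" and a value for B.
    for first_present, b_value in product((True, False), THIRD_B_CHOICES):
        world = []
        if first_present:
            world.append(FIRST)
        world.append(SECOND)
        world.append((THIRD_A, b_value))
        worlds.append(world)
    return worlds


if __name__ == "__main__":
    for i, world in enumerate(possible_worlds(), start=1):
        print(f"world {i}: {world}")
```

Each uncertain item multiplies the number of worlds by its number of alternatives, which is why listing worlds explicitly stops being practical very quickly.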

Types of Uncertainties

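There are essentially two kinds of uncertainty that could exist in a probabilistic database: uncertainty about whether a tuple exists in the database (tuple-level uncertainty), and uncertainty about the value that an attribute takes (attribute-level uncertainty). By assigning values to random variables associated with the data items, different possible worlds can be represented.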
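
One way to picture this is to attach a random variable to each uncertain data item and to treat a possible world as one joint assignment of values. The sketch below assumes that the variables are independent and uses made-up probabilities (0.7 for the existence of the first tuple; 0.5, 0.3 and 0.2 for the candidate values of B); neither the independence assumption nor the numbers come from the text, and the variable names are hypothetical.

```python
from itertools import product
from math import isclose, prod

# Hypothetical random variables: each maps an outcome to its probability.
# Independence between the variables is an assumption of this sketch.
variables = {
    "t1_exists": {True: 0.7, False: 0.3},          # does the first tuple exist?
    "t3.B": {"b3": 0.5, "b3′": 0.3, "b3′′": 0.2},  # value of B in the third tuple
}


def world_distribution(variables):
    """Map each joint assignment (one possible world) to its probability."""
    names = list(variables)
    dist = {}
    # Iterating a dict yields its keys, i.e. the outcomes of each variable.
    for outcomes in product(*(variables[n] for n in names)):
        dist[outcomes] = prod(variables[n][o] for n, o in zip(names, outcomes))
    return dist


dist = world_distribution(variables)
assert isclose(sum(dist.values()), 1.0)  # the worlds form a probability distribution

if __name__ == "__main__":
    for assignment, p in dist.items():
        print(assignment, round(p, 3))
```

Without the independence assumption the product no longer applies, and a joint distribution over the variables would have to be stored or factorised in some other way.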

History

The first published use of the term "probabilistic database" was probably in the 1987 VLDB conference paper "The theory of probabilistic databases", by Cavallo and Pittarelli (Roger Cavallo, Michael Pittarelli: The Theory of Probabilistic Databases. In VLDB'87, Proceedings of 13th International Conference on Very Large Data Bases, September 1–4, 1987, Brighton: 71–81 (1987)). The title (of the 11-page paper) was intended as a bit of a joke, since David Maier's 600-page monograph, The Theory of Relational Databases, would have been familiar at that time to many of the conference participants and readers of the conference proceedings.

External links

* The MayBMS project at Cornell University (sourceforge.net project site)
* The project at the University of Washington
* The Orion project at Purdue University
* The Trio project at Stanford University
* The BayesStore project at the University of California, Berkeley
* The PrDB project at the University of Maryland, College Park
* The Mimir project at the University at Buffalo
* The ProvSQL project at École normale supérieure (Paris) (module for PostgreSQL)