A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location (e.g. a data centre), or it may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A distributed database can reside on organised network servers or decentralised independent computers on the Internet, on corporate intranets or extranets, or on other organisation networks. Because distributed databases store data across multiple computers, they may improve performance at end-user
worksites by allowing transactions to be processed on many machines, instead of being limited to one (O'Brien, J. & Marakas, G.M. (2008). Management Information Systems, pp. 185-189. New York, NY: McGraw-Hill Irwin).

Two processes ensure that the distributed databases remain up to date: replication and duplication.

1. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, replication can be complex and can require considerable time and computing resources.
2. Duplication, on the other hand, is less complex. It identifies one database as a master and then duplicates that database to each distributed location, normally at a set time after hours, so that every location has the same data. In the duplication process, users may change only the master database, which ensures that local data will not be overwritten.

Both replication and duplication can keep the data current in all distributed locations.

Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. The implementation of these technologies can and does depend on the needs of the business and the sensitivity/
confidentiality of the data stored in the database, and on the price the business is willing to pay to ensure data security, consistency and integrity.

When discussing access to distributed databases, Microsoft favors the term distributed query, which it defines in a protocol-specific manner as "[a]ny SELECT, INSERT, UPDATE, or DELETE statement that references tables and rowsets from one or more external OLE DB data sources". Oracle provides a more language-centric view in which distributed queries and distributed transactions form part of distributed SQL.
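
The difference between the duplication and replication processes described earlier in this section can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Site` class, the per-row version counter, and the "highest version wins" conflict rule are assumptions made for illustration and do not describe how any particular product detects or reconciles changes.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One location holding a full copy of the data (hypothetical model)."""
    name: str
    rows: dict = field(default_factory=dict)  # key -> (value, version)

    def write(self, key, value):
        # Bump the version so a later comparison can tell which copy is newer.
        _, version = self.rows.get(key, (None, 0))
        self.rows[key] = (value, version + 1)


def duplicate(master, copies):
    """Duplication: overwrite every copy with the master's contents."""
    for site in copies:
        site.rows = dict(master.rows)


def replicate(sites):
    """Replication: look for changes at every site, then make all sites agree.

    Conflicts are resolved by keeping the highest version number; on a tie
    the first site examined wins. Both rules are assumptions of this sketch.
    """
    merged = {}
    for site in sites:
        for key, (value, version) in site.rows.items():
            if key not in merged or version > merged[key][1]:
                merged[key] = (value, version)
    for site in sites:
        site.rows = dict(merged)
```

With duplication, only writes made at the master survive the next `duplicate` call, which mirrors the rule that users may change only the master database. With replication, a write made at any site is propagated to all the others, at the cost of having to compare every site's data.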


Architecture

There are three main architecture types for distributed databases:
* Shared-memory (very rarely used)
* Shared-disk
* Shared-nothing

In the shared-memory and shared-disk architectures, the data is not partitioned, whereas in a shared-nothing architecture it has to be. Shared-disk architecture is more common for cloud databases than for on-premises ones. Historically, shared-nothing was the first architecture to be implemented in the cloud, before the advent of shared cloud storage made shared-disk possible.

In practice, different layers of the database can have different architectures. It is now common to have a compute layer with a shared-nothing architecture and a storage layer with a shared-disk architecture. This is the case, for instance, with Snowflake and AWS Aurora.
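
The requirement that data be partitioned in a shared-nothing architecture can be sketched as follows. This is a simplified illustration, not the scheme used by any of the systems named here: the `Node` and `ShardedTable` classes are hypothetical, and hash-modulo placement is just one assumed way of assigning each row to exactly one node.

```python
import zlib


class Node:
    """A shared-nothing node: it owns its partition and shares no storage."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ShardedTable:
    """A table whose rows are spread over nodes by hashing the key."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def _owner(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        # Each row is stored on exactly one node.
        self._owner(key).rows[key] = value

    def get(self, key):
        # A lookup by key only needs to contact the owning node.
        return self._owner(key).rows.get(key)

    def total(self):
        # A query over all rows: each node sums its own partition,
        # and the partial results are combined afterwards.
        return sum(sum(node.rows.values()) for node in self.nodes)
```

Because no node can read another node's storage, a lookup by key is routed to a single owner, while a query over the whole table has to gather a partial result from every node. In a shared-disk design, by contrast, any compute node could read any row from the common storage layer, so this routing step would not be needed.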


List of shared-nothing databases

* IBM Db2
* Greenplum
* Netezza
* Teradata
* TiDB
* Vertica


List of shared-disk databases

* AWS Aurora
* Neon
* Snowflake


See also

* Centralized database
* Data grid
* Distributed cache
* Distributed data store
* Distributed hash table
* Routing protocol
* Distributed SQL


Further reading

* M. T. Özsu and P. Valduriez, Principles of Distributed Databases (3rd edition) (2011), Springer
* Elmasri and Navathe, Fundamentals of database systems (3rd edition), Addison-Wesley Longman
* Oracle Database Administrator's Guide 10g (Release 1), http://docs.oracle.com/cd/B14117_01/server.101/b10739/ds_concepts.htm