A parallel database system seeks to improve performance through parallelization of various operations, such as loading data, building indexes and evaluating queries. Although data may be stored in a distributed fashion, the distribution is governed solely by performance considerations. Parallel databases improve processing and input/output speeds by using multiple CPUs and disks in parallel. Centralized and client–server database systems are not powerful enough to handle applications that demand this level of throughput. In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed sequentially. Parallel database architectures can be roughly divided into two groups. The first group is the multiprocessor architecture, which has the following alternatives:
;Shared-memory architecture: Where multiple processors share the main memory (RAM) space and, in the usual definition, the storage as well. Because every processor competes for the same memory, performance degrades as more processes run simultaneously, much as a single computer slows down when it runs many tasks at once.
;Shared-disk architecture: Where each node has its own main memory, but all nodes share mass storage, usually a storage area network. In practice, each node usually also has multiple processors.
;Shared-nothing architecture: Where each node has its own mass storage as well as main memory.
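The shared-nothing case can be illustrated with a minimal Python sketch. The `Node` and `Cluster` classes, the modulo placement rule and the use of threads to stand in for separate machines are all assumptions made for illustration; the article does not prescribe how rows are assigned to nodes.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """One shared-nothing node: its rows are visible only to itself."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._rows = []  # stands in for the node's private memory and disk

    def insert(self, row):
        self._rows.append(row)

    def scan(self, predicate):
        # Each node evaluates the query against its own data only.
        return [row for row in self._rows if predicate(row)]


class Cluster:
    """Hypothetical coordinator that routes rows and gathers results."""

    def __init__(self, node_count):
        self.nodes = [Node(i) for i in range(node_count)]

    def insert(self, key, row):
        # Assumed placement rule: integer key modulo the number of nodes.
        self.nodes[key % len(self.nodes)].insert(row)

    def query(self, predicate):
        # Scatter the predicate to every node, then gather the partial results.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            parts = pool.map(lambda node: node.scan(predicate), self.nodes)
            return [row for part in parts for row in part]


cluster = Cluster(node_count=3)
for i in range(9):
    cluster.insert(i, {"id": i, "qty": i * 10})
large_orders = cluster.query(lambda row: row["qty"] >= 50)
```

No node ever reads another node's rows; the coordinator only combines the partial results, which is the property that distinguishes this design from the shared-memory and shared-disk alternatives.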
The other architecture group is called hybrid architecture, which includes:
*Non-Uniform Memory Architecture (NUMA), in which memory access time depends on the location of the memory relative to the processor (non-uniform memory access).
*Cluster (shared nothing + shared disk: SAN/NAS), which is formed by a group of connected computers. The computers are connected through switches or hubs, which is the cheapest and simplest approach because only simple topologies are needed. Switches are generally the better choice.
Types of parallelism
;Intraquery parallelism: Execution of a single query in parallel using multiple processors or disks.
;Independent parallelism: Execution of separate operations on different processors, which is possible only if the operations do not depend on each other. For example, if four tables must be joined, two can be joined on one processor and the other two on another processor. The final join is performed afterwards.
;Pipelined parallelism: Execution of different operations in pipelined fashion. For example, if three tables must be joined, one processor may join two tables and send the result records to a second processor as they are produced. The second processor joins the incoming records with the third table and produces the final result.
;Intraoperation parallelism: Execution of a single complex or large operation in parallel on multiple processors. For example, the ORDER BY clause of a query over millions of records can be parallelized across multiple processors.
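The three examples above can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how any particular database engine works: the helper names (`build_index`, `probe`, `hash_join`, `independent_join`, `pipelined_join`, `parallel_sort`) are hypothetical, tables are modelled as lists of dictionaries, a hash join is assumed as the join method, and threads stand in for processors.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def build_index(rows, key):
    """Group rows by their value in the join column."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def probe(rows, index, key):
    """Yield each row merged with every matching row from the index."""
    for row in rows:
        for match in index.get(row[key], []):
            yield {**row, **match}


def hash_join(left, right, key):
    return list(probe(left, build_index(right, key), key))


def independent_join(a, b, c, d, key):
    """Independent parallelism: two joins run concurrently, final join afterwards."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ab = pool.submit(hash_join, a, b, key)
        cd = pool.submit(hash_join, c, d, key)
        return hash_join(ab.result(), cd.result(), key)


def pipelined_join(a, b, c, key):
    """Pipelined parallelism: rows of (a JOIN b) stream into the join with c."""
    channel = queue.Queue()
    done = object()  # sentinel marking the end of the stream
    results = []

    def stage_one():
        for row in probe(a, build_index(b, key), key):
            channel.put(row)  # hand each row over as soon as it is produced
        channel.put(done)

    def stage_two():
        incoming = iter(channel.get, done)  # read until the sentinel arrives
        results.extend(probe(incoming, build_index(c, key), key))

    workers = [threading.Thread(target=stage_one), threading.Thread(target=stage_two)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


def parallel_sort(values, workers=4):
    """Intraoperation parallelism: sort chunks concurrently, then merge them."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    return list(merge(*sorted_chunks))
```

In CPython, threads mainly show the structure of each approach rather than a real speed-up for CPU-bound work; a process pool or separate machines would be needed for that.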