
A partition is a division of a logical database or its constituent elements into distinct independent parts. Database partitioning is normally done for manageability, performance or availability reasons, or for load balancing. It is popular in distributed database management systems, where each partition may be spread over multiple nodes, with users at the node performing local transactions on the partition. This increases performance for sites that have regular transactions involving certain views of data, whilst maintaining availability and security.
Partitioning criteria
Current high-end relational database management systems provide for different criteria to split the database. They take a "partitioning key" and assign a partition based on certain criteria. Some common criteria include:
* Range partitioning: selects a partition by determining if the partitioning key is within a certain range. An example could be a partition for all rows where the "zipcode" column has a value between 70000 and 79999. It distributes tuples based on the value intervals (ranges) of some attribute. In addition to supporting exact-match queries (as in hashing), it is well-suited for range queries. For instance, a query with a predicate "A between A1 and A2" may be processed by only the node(s) that contain tuples in that range.
* List partitioning: a partition is assigned a list of values. If the partitioning key has one of these values, the partition is chosen. For example, all rows where the column Country is either Iceland, Norway, Sweden, Finland or Denmark could build a partition for the Nordic countries.
* Composite partitioning: allows for certain combinations of the above partitioning schemes, for example by first applying a range partitioning and then a hash partitioning. Consistent hashing could be considered a composite of hash and list partitioning where the hash reduces the key space to a size that can be listed.
* Round-robin partitioning: the simplest strategy, it ensures uniform data distribution. With n partitions, the i-th tuple in insertion order is assigned to partition (i mod n). This strategy enables the sequential access to a relation to be done in parallel. However, direct access to individual tuples, based on a predicate, requires accessing the entire relation.
* Hash partitioning: applies a hash function to some attribute that yields the partition number. This strategy allows exact-match queries on the selection attribute to be processed by exactly one node and all other queries to be processed by all the nodes in parallel.
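The criteria above can be sketched as small partition-assignment functions. This is a minimal illustration rather than how any particular database implements them: the function names, the list of range bounds, and the use of CRC-32 as the hash function are all assumptions made for the example.

```python
import bisect
import zlib


def range_partition(key, upper_bounds):
    """Range partitioning: return the index of the range that contains key.

    upper_bounds is a sorted list of exclusive upper bounds (hypothetical
    layout). Partition k holds keys below upper_bounds[k] and at or above
    upper_bounds[k - 1]; the last partition holds everything else.
    """
    return bisect.bisect_right(upper_bounds, key)


def list_partition(key, value_lists):
    """List partitioning: return the name of the partition whose list has key."""
    for name, values in value_lists.items():
        if key in values:
            return name
    return None  # no partition lists this value


def round_robin_partition(i, n):
    """Round-robin: the i-th tuple in insertion order goes to partition i mod n."""
    return i % n


def hash_partition(key, n):
    """Hash partitioning: a hash of the attribute yields the partition number.

    CRC-32 is used only because it is deterministic across runs; it is an
    assumption of this sketch, not a recommendation.
    """
    return zlib.crc32(str(key).encode("utf-8")) % n


def composite_partition(zipcode, customer_id, upper_bounds, n):
    """Composite: first a range partition, then a hash sub-partition."""
    return (range_partition(zipcode, upper_bounds), hash_partition(customer_id, n))


NORDIC = {"nordic": {"Iceland", "Norway", "Sweden", "Finland", "Denmark"}}
ZIP_BOUNDS = [70000, 80000]  # partition 1 covers zipcodes 70000 to 79999
```

With these definitions, `range_partition(75000, ZIP_BOUNDS)` selects the partition for zipcodes 70000 to 79999, and `list_partition("Norway", NORDIC)` selects the Nordic partition. An exact-match lookup under `hash_partition` needs to visit only the one partition the hash names, whereas a predicate lookup under round-robin has no such shortcut and must scan every partition.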
Partitioning methods
The partitioning can be done by either building separate smaller databases (each with its own tables, indices, and transaction logs), or by splitting selected elements, for example just one table.
* Horizontal partitioning involves putting different rows into different tables. For example, customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.
* Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns (see "Vertical Partitioning Algorithms for Database Design" by Shamkant Navathe, Stefano Ceri, Gio Wiederhold, and Jinglie Dou, Stanford University, 1984). Generally, this practice is known as normalization. However, vertical partitioning extends further, and partitions columns even when already normalized. This type of partitioning is also called "row splitting", since rows get split by their columns, and might be performed explicitly or implicitly. Distinct physical machines might be used to realize vertical partitioning: for example, storing infrequently used or very wide columns, which take up a significant amount of memory, on a different machine is a method of vertical partitioning. A common form of vertical partitioning is to split static data from dynamic data, since the former is faster to access than the latter, particularly for a table where the dynamic data is not used as often as the static. Creating a view across the two newly created tables restores the original table with a performance penalty, but accessing the static data alone will show higher performance. A columnar database can be regarded as a database that has been vertically partitioned until each column is stored in its own table.
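Both methods can be tried out with Python's built-in sqlite3 module. The sketch below is illustrative only: CustomersEast and CustomersWest follow the example in the text, while the Customers view name, the CustomerStatic and CustomerDynamic tables, their columns, and the helper function are hypothetical names chosen for the example.

```python
import sqlite3

SCHEMA = """
-- Horizontal partitioning: same columns, different rows per table.
CREATE TABLE CustomersEast (id INTEGER PRIMARY KEY, name TEXT, zipcode INTEGER);
CREATE TABLE CustomersWest (id INTEGER PRIMARY KEY, name TEXT, zipcode INTEGER);
CREATE VIEW Customers AS
    SELECT id, name, zipcode FROM CustomersEast
    UNION ALL
    SELECT id, name, zipcode FROM CustomersWest;

-- Vertical partitioning: same rows, different columns per table.
CREATE TABLE CustomerStatic (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE CustomerDynamic (id INTEGER PRIMARY KEY, last_login TEXT);
CREATE VIEW CustomerFull AS
    SELECT s.id, s.name, d.last_login
    FROM CustomerStatic AS s
    JOIN CustomerDynamic AS d ON d.id = s.id;
"""


def insert_customer(conn, cust_id, name, zipcode):
    """Route a row to the East or West table using the ZIP code rule."""
    table = "CustomersEast" if zipcode < 50000 else "CustomersWest"
    conn.execute(
        f"INSERT INTO {table} (id, name, zipcode) VALUES (?, ?, ?)",
        (cust_id, name, zipcode),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

insert_customer(conn, 1, "Ada", 10001)    # below 50000, stored in CustomersEast
insert_customer(conn, 2, "Grace", 94105)  # 50000 or above, stored in CustomersWest

conn.execute("INSERT INTO CustomerStatic (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO CustomerDynamic (id, last_login) VALUES (1, '2024-01-01')")

# The union view presents all customers as if they were in one table.
all_customers = conn.execute("SELECT * FROM Customers ORDER BY id").fetchall()
# The join view restores the full row from its two vertical parts.
full_rows = conn.execute("SELECT * FROM CustomerFull").fetchall()
```

Reading only CustomerStatic avoids touching the dynamic columns at all, whereas CustomerFull pays for a join on every access, which mirrors the trade-off described above.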
See also
* Block Range Index
* CAP theorem
* Data striping in RAIDs
* Shard (database architecture)