HOME

TheInfoList



OR:

Database scalability is the ability of a
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spa ...
to handle changing demands by adding/removing resources.
Databases In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spa ...
use a host of techniques to cope.


History

The initial history of database scalability was to provide service on ever smaller computers. The first database management systems such as
IMS Ims is a Norwegian surname. Notable people with the surname include: * Gry Tofte Ims (born 1986), Norwegian footballer * Rolf Anker Ims (born 1958), Norwegian ecologist See also * IMS (disambiguation) Ims is a Norwegian surname. Notable people wit ...
ran on mainframe computers. The second generation, including
Ingres Jean-Auguste-Dominique Ingres ( , ; 29 August 1780 – 14 January 1867) was a French Neoclassical painter. Ingres was profoundly influenced by past artistic traditions and aspired to become the guardian of academic orthodoxy against the a ...
,
Informix IBM Informix is a product family within IBM's Information Management division that is centered on several relational database management system (RDBMS) offerings. The Informix products were originally developed by Informix Corporation, whose ...
,
Sybase Sybase, Inc. was an enterprise software and services company. The company produced software to manage and analyze information in relational databases, with facilities located in California and Massachusetts. Sybase was acquired by SAP in 2010; ...
, RDB and
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The wor ...
emerged on
minicomputers A minicomputer, or colloquially mini, is a class of smaller general purpose computers that developed in the mid-1960s and sold at a much lower price than mainframe and mid-size computers from IBM and its direct competitors. In a 1970 survey, ...
. The third generation, including
dBase dBase (also stylized dBASE) was one of the first database management systems for microcomputers and the most successful in its day. The dBase system includes the core database engine, a query system, a forms engine, and a programming langua ...
and Oracle (again), ran on personal computers. During the same period, attention turned to handling more data and more demanding workloads. One key software innovation in the late 1980s was to reduce update locking granularity from tables and disk blocks to individual rows. This eliminated a critical scalability bottleneck, as coarser locks could delay access to rows even though they were not directly involved in a transaction. Earlier systems were completely insensitive to increasing resources. Once software limitations had been addressed, attention turned to hardware. Innovation occurred in many areas. The first was to support multiprocessor computers. This involved allowing multiple processors to handle database requests simultaneously, without blocking each other. This evolved into support for multi-core processors. A much more significant change involved allowing distributed transactions to affect data stored on separate computers, using the two-phase commit protocol, establishing the '' shared-nothing architecture''. Still later, Oracle introduced the ''shared-everything architecture'', which provided full functionality on multi-server clusters. Another innovation was storing copies of tables on multiple computers ('' database replication''), which both improved availability (processing could continue on a copy even if the main system was unavailable) and scalability particularly for query/analysis, in that requests could be routed to the copy if the primary reached its capacity. In the early twenty-first century, ''
NoSQL A NoSQL (originally referring to "non- SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed ...
'' systems gained favor over relational databases for some workloads. Motivations included still greater scalability and support for documents and other "non-relational" data types. Often sacrificed was the strict ACID consistency protocols that guaranteed perfect consistency at all times in favor of ''
eventual consistency Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last up ...
'' that ensured that all nodes would eventually return the latest data. Some even allowed for transactions to occasionally be lost, as long as the system could handle sufficiently many requests. The most prominent early system was Google's BigTable/ MapReduce, developed in 2004. It achieved near-linear scalability across multiple server farms, at the cost of features such as multi-row transactions and joins. In 2007, the first '' NewSQL'' system, '' H-Store,'' was developed. NewSQL systems attempt to combine NoSQL scalability with ACID transactions and SQL interfaces.


Dimensions

Database
scalability Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
has three basic dimensions: amount of data, volume of requests and size of requests. Requests come in many sizes: transactions generally affect small amounts of data, but may approach thousands per second; analytic queries are generally fewer, but may access more data. A related concept is ''elasticity'', the ability of a system to transparently add and subtract capacity to meet changing workloads.


Vertical

Vertical database scaling implies that the database system can fully exploit maximally configured systems, including typically multiprocessors with large memories and vast storage capacity. Such systems are relatively simple to administer, but may offer reduced availability. However, any single computer has a maximum configuration. If workloads expand beyond that limit, the choices are either to migrate to a different, still larger system, or to rearchitect the system to achieve horizontal scalability.


Horizontal

Horizontal database scaling involves adding more servers to work on a single workload. Most horizontally scalable systems come with functionality compromises. If an application requires more functionality, migration to a vertically scaled system may be preferable.


Techniques


Hardware

Databases run on individual hardware ranging in capacity from smartwatches to supercomputers to multiple transparently reconfigurable server farms. Databases also scaled vertically to run on 64-bit microprocessors,
multi-core A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such ...
CPUs, and large SMP multiprocessors, using multi-threaded implementations.


Contention

Fully exploiting a hardware configuration requires a variety of locking techniques, ranging from locking an entire database to entire tables to disk blocks to individual table rows. The appropriate lock granularity depends on the workload. The smaller the object that is locked, the less the chance of database requests blocking each other, while the hardware idles. Typically row locks are necessary to support high volume transaction processing applications at the cost of processing overhead to manage the larger number of locks. Further, some systems ensure that a query sees a time-consistent view of the database by locking data that a query is examining to prevent an update from modifying it, stalling work. Alternatively, some databases use multi-version read consistency to avoid (blocking) read locks while still providing consistent query results. Another potential bottleneck can occur in some systems when many requests attempt to access the same data at the same time. For example, in OLTP systems, many transactions may attempt to insert data into the same table at the same time. In a shared nothing system, at any given moment, all such inserts are processed by the single server that manages that partition (''shard'') of the table, possibly overwhelming it, while the rest of the system has little to do. Many such tables use a sequence number as their primary key that increases for each new inserted row. The index for that key can also experience contention (overheat) as it processes those inserts. One solution for this is to reverse the digits of the primary key. This spreads the inserts into both the table and the key across multiple parts of the database.


Partitioning

A basic technique is to
split Split(s) or The Split may refer to: Places * Split, Croatia, the largest coastal city in Croatia * Split Island, Canada, an island in the Hudson Bay * Split Island, Falkland Islands * Split Island, Fiji, better known as Hạfliua Arts, entertain ...
large tables into multiple partitions based on ranges of values in a key field. For example, the data for each year could be held on a separate disk drive or on a separate computer. Partitioning removes limits on the sizes of a single table.


Replication

Replicated databases maintain copies of tables or databases on multiple computers. This scaling technique is particularly convenient for seldom or never-updated data, such as transaction history or tax tables.


Clustered computers

A variety of approaches are used to scale beyond the limits of a single computer. HP Enterprise's NonStop SQL uses the ''shared nothing'' architecture in which neither data nor memory are shared across server boundaries. A coordinator routes database requests to the correct server. This architecture provides near-linear scalability. The widely supported X/Open XA standard employs a global transaction monitor to coordinate distributed transactions among semi-autonomous XA-compliant transaction resources. Oracle RAC uses a different model to achieve scalability, based on a "shared-everything" architecture. This approach incorporates the shared disk approach that allows multiple computers to access any disk in the cluster. Network-attached storage (NAS) and Storage area networks (SANs) coupled with local area networks and
Fibre Channel Fibre Channel (FC) is a high-speed data transfer protocol providing in-order, lossless delivery of raw block data. Fibre Channel is primarily used to connect computer data storage to servers in storage area networks (SAN) in commercial data c ...
technology enable such configurations. The approach includes a "shared" logical cache in which data that has been cached in memory on server is made available to other servers without requiring them to again read the data from disk. Each page is moved from server to server to satisfy requests. Updates generally happen very quickly so that a "popular" page can be updated by multiple transactions with little delay. This approach is claimed to support clusters containing up to 100 servers. Some researchers question the inherent limitations of
relational database management systems A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
. GigaSpaces, for example, contends that space-based architecture is required to achieve performance and scalability.
Base One Base One International Corp. was an American company that specialized in developing software for constructing database applications and distributed computing systems. Headquartered in New York City, the company was founded in 1993 and expanded i ...
makes the case for extreme scalability within mainstream relational database technology.{{cite web, url=http://www.boic.com/scalability.htm, title=Database Scalability - Dispelling myths about the limits of database-centric architecture, author=Base One, year=2007, accessdate=May 23, 2007


See also

*
Relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
*
Scalability Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...


References


External links


Database management systems