Amazon DynamoDB

Amazon DynamoDB is a fully managed proprietary NoSQL database service that supports key–value and document data structures and is offered by Amazon.com as part of the Amazon Web Services portfolio. DynamoDB exposes a similar data model to and derives its name from Dynamo, but has a different underlying implementation. Dynamo had a multi-leader design requiring the client to resolve version conflicts, whereas DynamoDB uses synchronous replication across multiple data centers for high durability and availability. DynamoDB was announced by Amazon CTO Werner Vogels on January 18, 2012, and is presented as an evolution of Amazon SimpleDB.


Background

Werner Vogels, CTO at Amazon.com, provided a motivation for the project in his 2012 announcement. Amazon began as a decentralized network of services. Originally, services had direct access to each other's databases. When this became a bottleneck on engineering operations, services moved away from this direct access pattern in favor of public-facing APIs. Still, third-party relational database management systems struggled to handle Amazon's client base. This culminated during the 2004 holiday season, when several technologies failed under high traffic.

Engineers were normalizing these relational systems to reduce data redundancy, a design that optimizes for storage. The sacrifice: they stored a given "item" of data (e.g., the information pertaining to a product in a product database) over several relations, and it takes time to assemble disjoint parts for a query. Many of Amazon's services demanded mostly primary-key reads on their data, and with speed a top priority, putting these pieces together was extremely taxing.

Willing to compromise on storage efficiency, Amazon responded with Dynamo: a highly available key–value store built for internal use. Dynamo, it seemed, was everything their engineers needed, but adoption lagged. Amazon's developers opted for "just works" design patterns with S3 and SimpleDB. While these systems had noticeable design flaws, they did not demand the overhead of provisioning hardware and scaling and re-partitioning data. Amazon's next iteration of NoSQL technology, DynamoDB, automated these database management operations.


Overview

DynamoDB differs from other Amazon services by allowing developers to purchase a service based on throughput, rather than storage. If Auto Scaling is enabled, then the database will scale automatically. Additionally, administrators can request throughput changes and DynamoDB will spread the data and traffic over a number of servers using solid-state drives, allowing predictable performance. It offers integration with Hadoop via Elastic MapReduce. In September 2013, Amazon made a local development version of DynamoDB available so developers could test DynamoDB-backed applications locally.


Development considerations


Data modeling

A DynamoDB table features items that have attributes, some of which form a primary key. Whereas in relational systems an item features each table attribute (or juggles "null" and "unknown" values in their absence), DynamoDB items are schema-less. The only exception: when creating a table, a developer specifies a primary key, and the table requires a key for every item. Primary keys must be scalar (strings, numbers, or binary) and can take one of two forms. A single-attribute primary key is known as the table's "partition key", which determines the partition that an item hashes to (more on partitioning below), so an ideal partition key has a uniform distribution over its range. A primary key can also feature a second attribute, which DynamoDB calls the table's "sort key". In this case, partition keys do not have to be unique; they are paired with sort keys to make a unique identifier for each item. The partition key is still used to determine which partition the item is stored in, but within each partition, items are sorted by the sort key.
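The relationship between the two key attributes can be sketched in Go. This is only an illustration: the FNV-1a hash, the fixed partition count, the `Item` type, and the sample keys are all assumptions made for the example, since DynamoDB's actual hash function and partition layout are internal to the service.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Item is a hypothetical item with a composite primary key.
type Item struct {
	PartitionKey string
	SortKey      string
}

// partitionFor maps a partition key to one of n partitions.
// FNV-1a is an assumption for illustration only.
func partitionFor(partitionKey string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(partitionKey))
	return int(h.Sum32() % uint32(n))
}

// place assigns every item to a partition by hashing its partition key,
// then orders each partition by partition key and sort key.
func place(items []Item, n int) [][]Item {
	partitions := make([][]Item, n)
	for _, it := range items {
		p := partitionFor(it.PartitionKey, n)
		partitions[p] = append(partitions[p], it)
	}
	for _, part := range partitions {
		sort.Slice(part, func(i, j int) bool {
			if part[i].PartitionKey != part[j].PartitionKey {
				return part[i].PartitionKey < part[j].PartitionKey
			}
			return part[i].SortKey < part[j].SortKey
		})
	}
	return partitions
}

func main() {
	items := []Item{
		{"customer#2", "order#0007"},
		{"customer#1", "order#0042"},
		{"customer#1", "order#0003"},
	}
	for p, part := range place(items, 4) {
		fmt.Println("partition", p, part)
	}
}
```

Items that share a partition key always land in the same partition, which is why a skewed partition key concentrates load on a few partitions while a uniformly distributed one spreads it out.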


Indices

In the relational model, indices typically serve as "helper" data structures to supplement a table. They allow the DBMS to optimize queries under the hood, but they do not add query functionality. In DynamoDB, there is no query optimizer, and an index is simply another table with a different key (or two) that sits beside the original. When a developer creates an index, they create a new copy of their data, but only the fields that they specified get copied over (at a minimum, the fields that they index on and the original table's primary key). DynamoDB users issue queries directly to their indices.

There are two types of indices available. A global secondary index features a partition key (and optional sort key) that is different from the original table's partition key. A local secondary index features the same partition key as the original table, but a different sort key. Both indices introduce entirely new query functionality to a DynamoDB database by allowing queries on new keys. Similar to relational database management systems, DynamoDB updates indices automatically on addition/update/deletion, so developers must be judicious when creating them or risk slowing down a write-heavy database with a slew of index updates.
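The idea of an index as a re-keyed, partial copy of the table can be sketched as follows. The `Item` type, the `buildIndex` helper, and the attribute names are hypothetical; the sketch also assumes that an item lacking the index key attribute is simply left out of the index, and it ignores sort keys for brevity.

```go
package main

import "fmt"

// Item is a hypothetical schema-less item: attribute name to value.
type Item map[string]string

// buildIndex copies each item into a new structure keyed by indexKey.
// Only the index key, the table's primary key, and the projected
// attributes are copied, mirroring the description above.
func buildIndex(table []Item, tableKey, indexKey string, projected []string) map[string][]Item {
	index := make(map[string][]Item)
	for _, it := range table {
		v, ok := it[indexKey]
		if !ok {
			continue // assumption: items without the index key are not indexed
		}
		entry := Item{tableKey: it[tableKey], indexKey: v}
		for _, attr := range projected {
			if val, ok := it[attr]; ok {
				entry[attr] = val
			}
		}
		index[v] = append(index[v], entry)
	}
	return index
}

func main() {
	table := []Item{
		{"OrderId": "o1", "Status": "shipped", "Total": "19.99", "Notes": "gift"},
		{"OrderId": "o2", "Status": "pending", "Total": "5.00"},
		{"OrderId": "o3", "Total": "7.50"},
	}
	byStatus := buildIndex(table, "OrderId", "Status", []string{"Total"})
	fmt.Println(byStatus["shipped"])
	fmt.Println(len(byStatus))
}
```

A query by `Status` is answered from the copy alone, which is what makes the new access pattern possible, and every write to the table implies a matching write to the copy.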


Syntax

DynamoDB uses JSON for its syntax because of its ubiquity. The create table action demands just three arguments: TableName; KeySchema, a list containing a partition key and an optional sort key; and AttributeDefinitions, a list of attributes to be defined, which must at least contain definitions for the attributes used as partition and sort keys. Whereas relational databases offer robust query languages, DynamoDB offers just Put, Get, Update, and Delete operations. Put requests contain the TableName attribute and an Item attribute, which consists of all the attributes and values the item has. An Update request names the TableName and the Key of the item to change, together with a description of the update. Similarly, to get or delete an item, simply specify a TableName and Key.
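The request shapes described above can be sketched by building the JSON bodies by hand. The struct types here are simplified stand-ins written for this example, not SDK types, and the table name `Orders` and its attributes are hypothetical. The sketch includes only the fields named in the text; a real CreateTable call also needs throughput or billing settings, and nothing here sends a request.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KeySchemaElement names a key attribute and its role.
type KeySchemaElement struct {
	AttributeName string
	KeyType       string // "HASH" = partition key, "RANGE" = sort key
}

// AttributeDefinition declares the scalar type of a key attribute.
type AttributeDefinition struct {
	AttributeName string
	AttributeType string // "S" string, "N" number, "B" binary
}

type CreateTableRequest struct {
	TableName            string
	KeySchema            []KeySchemaElement
	AttributeDefinitions []AttributeDefinition
}

// AttributeValue is a type-tagged value such as {"S": "abc"} or {"N": "42"}.
type AttributeValue map[string]string

type PutItemRequest struct {
	TableName string
	Item      map[string]AttributeValue
}

type GetItemRequest struct {
	TableName string
	Key       map[string]AttributeValue
}

// toJSON renders a request body; it panics on values that cannot be encoded.
func toJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	create := CreateTableRequest{
		TableName: "Orders",
		KeySchema: []KeySchemaElement{
			{AttributeName: "CustomerId", KeyType: "HASH"},
			{AttributeName: "OrderId", KeyType: "RANGE"},
		},
		AttributeDefinitions: []AttributeDefinition{
			{AttributeName: "CustomerId", AttributeType: "S"},
			{AttributeName: "OrderId", AttributeType: "S"},
		},
	}
	put := PutItemRequest{TableName: "Orders", Item: map[string]AttributeValue{
		"CustomerId": {"S": "customer#1"},
		"OrderId":    {"S": "order#0042"},
		"Total":      {"N": "19.99"},
	}}
	get := GetItemRequest{TableName: "Orders", Key: map[string]AttributeValue{
		"CustomerId": {"S": "customer#1"},
		"OrderId":    {"S": "order#0042"},
	}}
	fmt.Println(toJSON(create))
	fmt.Println(toJSON(put))
	fmt.Println(toJSON(get))
}
```

Note that only the key attributes appear in AttributeDefinitions; the non-key attribute `Total` is introduced by the Put request itself, which is the schema-less behavior described under Data modeling.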


System architecture


Data structures

DynamoDB uses hashing and B-trees to manage data. Upon entry, data is first distributed into different partitions by hashing on the partition key. Each partition can store up to 10 GB of data and handle by default 1,000 write capacity units (WCU) and 3,000 read capacity units (RCU). One RCU represents one strongly consistent read per second or two eventually consistent reads per second for items up to 4 KB in size. One WCU represents one write per second for an item up to 1 KB in size.

To prevent data loss, DynamoDB features a two-tier backup system of replication and long-term storage. Each partition features three nodes, each of which contains a copy of that partition's data. Each node also contains two data structures: a B-tree used to locate items, and a replication log that notes all changes made to the node. DynamoDB periodically takes snapshots of these two data structures and stores them for a month in S3 so that engineers can perform point-in-time restores of their databases.

Within each partition, one of the three nodes is designated the "leader node". All write operations travel first through the leader node before propagating, which makes writes consistent in DynamoDB. To maintain its status, the leader sends a "heartbeat" to each other node every 1.5 seconds. Should another node stop receiving heartbeats, it can initiate a new leader election. DynamoDB uses the Paxos algorithm to elect leaders.

Amazon engineers originally avoided Dynamo due to engineering overheads like provisioning and managing partitions and nodes. In response, the DynamoDB team built a service it calls AutoAdmin to manage a database. AutoAdmin replaces a node when it stops responding by copying data from another node. When a partition exceeds any of its three thresholds (RCU, WCU, or 10 GB), AutoAdmin will automatically add additional partitions to further segment the data.

Just like indexing systems in the relational model, DynamoDB demands that any updates to a table be reflected in each of the table's indices. DynamoDB handles this using a service it calls the "log propagator", which subscribes to the replication logs in each node and sends additional Put, Update, and Delete requests to indices as necessary. Because indices result in substantial performance hits for write requests, DynamoDB allows a user at most five of them on any given table.
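The capacity-unit definitions and per-partition thresholds lend themselves to a short worked calculation. The sketch below assumes that items larger than one unit are rounded up to the next 4 KB (reads) or 1 KB (writes) boundary, and it treats the three thresholds as a simple lower bound on partition count; the function names are made up for this example, and the way AutoAdmin actually decides when to split is not described here.

```go
package main

import "fmt"

const (
	readUnitBytes  = 4 * 1024 // one RCU covers a read of up to 4 KB
	writeUnitBytes = 1 * 1024 // one WCU covers a write of up to 1 KB
	partitionGB    = 10       // storage threshold per partition
	partitionRCU   = 3000     // default read threshold per partition
	partitionWCU   = 1000     // default write threshold per partition
)

// ceilDiv returns a/b rounded up, for positive b.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// readCapacityUnits estimates the RCUs needed per second.
// An eventually consistent read costs half a strongly consistent one.
func readCapacityUnits(itemBytes, readsPerSecond int, stronglyConsistent bool) int {
	units := ceilDiv(itemBytes, readUnitBytes) * readsPerSecond
	if stronglyConsistent {
		return units
	}
	return ceilDiv(units, 2)
}

// writeCapacityUnits estimates the WCUs needed per second.
func writeCapacityUnits(itemBytes, writesPerSecond int) int {
	return ceilDiv(itemBytes, writeUnitBytes) * writesPerSecond
}

// minPartitions is the smallest partition count that keeps every
// partition under all three thresholds, assuming an even spread.
func minPartitions(tableGB, rcu, wcu int) int {
	n := ceilDiv(tableGB, partitionGB)
	if r := ceilDiv(rcu, partitionRCU); r > n {
		n = r
	}
	if w := ceilDiv(wcu, partitionWCU); w > n {
		n = w
	}
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	rcu := readCapacityUnits(6*1024, 100, true) // 6 KB items, 100 reads/s
	wcu := writeCapacityUnits(2560, 50)         // 2.5 KB items, 50 writes/s
	fmt.Println("RCU:", rcu, "WCU:", wcu)
	fmt.Println("partitions for a 25 GB table:", minPartitions(25, rcu, wcu))
}
```

In this example the table size, not the throughput, is what forces more than one partition; a small but very hot table would be driven by the RCU or WCU threshold instead.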


Query execution

Suppose that a DynamoDB user issues a write operation (a Put, Update, or Delete). While a typical relational system would convert the SQL query to relational algebra and run optimization algorithms, DynamoDB skips both processes and gets right to work. The request arrives at the DynamoDB request router, which authenticates ("Is the request coming from where/whom it claims to be?") and checks for authorization ("Does the user submitting the request have the requisite permissions?"). Assuming these checks pass, the system hashes the request's partition key to arrive in the appropriate partition. There are three nodes within, each with a copy of the partition's data. The system first writes to the leader node, then writes to a second node, then sends a "success" message, and finally continues propagating to the third node. Writes are consistent because they always travel first through the leader node.

Finally, the log propagator propagates the change to all indices. For each index, it grabs that index's primary key value from the item, then performs the same write on that index without log propagation. If the operation is an Update to a preexisting item, the updated attribute may serve as a primary key for an index, and thus the B-tree for that index must update as well. B-trees only handle insert, delete, and read operations, so in practice, when the log propagator receives an Update operation, it issues both a Delete operation and a Put operation to all indices.

Now suppose that a DynamoDB user issues a Get operation. The request router proceeds as before with authentication and authorization. Next, as above, we hash our partition key to arrive in the appropriate partition. Now, we encounter a problem: with three nodes in eventual consistency with one another, how can we decide which to investigate? DynamoDB offers the user two options when issuing a read: consistent and eventually consistent. A consistent read visits the leader node. But the consistency-availability trade-off rears its head again here: in read-heavy systems, always reading from the leader can overwhelm a single node and reduce availability.

The second option, an eventually consistent read, selects a random node. In practice, this is where DynamoDB trades consistency for availability. If we take this route, what are the odds of an inconsistency? We'd need a write operation to return "success" and begin propagating to the third node, but not finish. We'd also need our Get to target this third node. This means a 1-in-3 chance of inconsistency within the write operation's propagation window. How long is this window? Any number of catastrophes could cause a node to fall behind, but in the vast majority of cases, the third node is up-to-date within milliseconds of the leader.
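The write path and the two read modes can be modelled with a toy three-node partition. This is a deliberately simplified sketch: the `node` and `partition` types are invented for the example, an item is reduced to a version number, and propagation to the third node is an explicit step so that the window between "success" and full replication can be observed.

```go
package main

import "fmt"

// node holds one replica's copy of a single item, reduced to a version number.
type node struct{ version int }

// partition models three replicas; index 0 is the leader.
type partition struct{ nodes [3]node }

// write applies a new version to the leader and a second node, at which
// point the caller would receive "success". The returned function
// completes propagation to the third node.
func (p *partition) write(version int) (finish func()) {
	p.nodes[0].version = version
	p.nodes[1].version = version
	return func() { p.nodes[2].version = version }
}

// consistentRead always visits the leader.
func (p *partition) consistentRead() int { return p.nodes[0].version }

// eventualRead visits the node at index i; the real system picks one at random.
func (p *partition) eventualRead(i int) int { return p.nodes[i].version }

// staleNodes counts replicas that are behind the leader.
func (p *partition) staleNodes() int {
	stale := 0
	for _, n := range p.nodes {
		if n.version < p.nodes[0].version {
			stale++
		}
	}
	return stale
}

func main() {
	p := &partition{}
	p.write(1)() // first write, fully propagated

	finish := p.write(2) // second write acknowledged, third node still behind
	fmt.Println("consistent read:", p.consistentRead())
	fmt.Println("read from third node:", p.eventualRead(2))
	fmt.Printf("stale nodes: %d of %d\n", p.staleNodes(), len(p.nodes))

	finish() // propagation window closes
	fmt.Println("stale nodes after propagation:", p.staleNodes())
}
```

While the window is open, exactly one of the three nodes is behind, which is where the 1-in-3 figure for a randomly routed read comes from; once `finish` runs, every node returns the same version.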


Performance

DynamoDB exposes performance metrics that help users provision it correctly and keep applications using DynamoDB running smoothly:
* Requests and throttling
* Errors: ProvisionedThroughputExceededException, ConditionalCheckFailedException, Internal Server Error (HTTP 500)
* Metrics related to ... creation

These metrics can be tracked using the AWS Management Console, using the AWS Command Line Interface, or a monitoring tool integrating with Amazon CloudWatch.


Language bindings

Languages and frameworks with a DynamoDB binding include Java, JavaScript, Node.js, Go, C# .NET, Perl, PHP, Python, Ruby, Rust, Haskell, Erlang, Django, and Grails.


Code examples


HTTP API

Against the HTTP API, query items:

    POST / HTTP/1.1
    Host: dynamodb..;
    Accept-Encoding: identity
    Content-Length:
    User-Agent:
    Content-Type: application/x-amz-json-1.0
    Authorization: AWS4-HMAC-SHA256 Credential=, SignedHeaders=, Signature=
    X-Amz-Date:
    X-Amz-Target: DynamoDB_20120810.Query

Sample response:

    HTTP/1.1 200 OK
    x-amzn-RequestId:
    x-amz-crc32:
    Content-Type: application/x-amz-json-1.0
    Content-Length:
    Date:


Go

GetItem in Go:

    getItemInput := &dynamodb.GetItemInput{
        // TableName and Key identify the item to read.
    }
    getItemOutput, err := dynamodbClient.GetItem(getItemInput)

DeleteItem in Go:

    deleteItemInput := &dynamodb.DeleteItemInput{
        // TableName and Key identify the item to delete.
    }
    _, err := dynamodbClient.DeleteItem(deleteItemInput)
    if err != nil {
        // handle the error
    }

UpdateItem in Go using Expression Builder:

    // Build a SET update expression for one attribute.
    update := expression.Set(
        expression.Name(name),
        expression.Value(value),
    )
    expr, err := expression.NewBuilder().WithUpdate(update).Build()
    if err != nil {
        // handle the error
    }
    updateItemInput := &dynamodb.UpdateItemInput{
        // TableName, Key, and the fields produced by expr go here.
    }
    fmt.Printf("updateItemInput: %#v\n", updateItemInput)
    _, err = dynamodbClient.UpdateItem(updateItemInput)
    if err != nil {
        // handle the error
    }


See also

* Amazon Aurora
* Amazon DocumentDB (with MongoDB compatibility)
* Amazon Redshift
* Amazon Relational Database Service


External links

* Video: AWS re:Invent 2019: [REPEAT 1] Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1)