Apache Accumulo is a highly scalable, sorted, distributed key-value store based on Google's Bigtable. It is built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and server-side programming mechanisms. According to the DB-Engines ranking, as of 2018 Accumulo was the third most popular NoSQL wide column store, behind Apache Cassandra and HBase, and the 67th most popular database engine of any type.


History

Accumulo was created in 2008 by the US National Security Agency and contributed to the Apache Software Foundation as an incubator project in September 2011 [NSA Submits Open Source, Secure Database To Apache. Government, Informationweek.com (2011-09-06). Retrieved on 2013-09-18]. On March 21, 2012, Accumulo graduated from incubation at Apache, making it a top-level project.


Controversy

In June 2012, the US Senate Armed Services Committee (SASC) released the Draft 2012 Department of Defense (DoD) Authorization Bill, which included references to Apache Accumulo. In the draft bill, SASC required DoD to evaluate whether Apache Accumulo could achieve commercial viability before implementing it throughout DoD. Specific criteria were not included in the draft language, but the establishment of commercial entities supporting Apache Accumulo could be considered a success factor [SASC Accumulo language pro-open source, say proponents. FierceGovernmentIT (2012-06-14). Retrieved on 2013-09-18].


Main features


Cell-level security

Apache Accumulo extends the Bigtable data model by adding a new element to the key, called Column Visibility. This element stores a logical combination of security labels that must be satisfied at query time for the key and value to be returned as part of a user request. This allows data with varying security requirements to be stored in the same table, and allows users to see only those keys and values for which they are authorized.
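The label check described above can be illustrated with a minimal sketch in plain Python. It does not use the Accumulo API. The function names `tokenize` and `is_visible`, the `&` and `|` operator syntax, the rule that mixing the two operators requires parentheses, and the treatment of an empty label as visible to everyone are all assumptions made for this illustration.

```python
import re


def tokenize(expression):
    """Split a label expression into labels, operators, and parentheses."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    # Reject anything the pattern above did not consume (e.g. stray symbols).
    if "".join(tokens) != expression.replace(" ", ""):
        raise ValueError(f"invalid visibility expression: {expression!r}")
    return tokens


def is_visible(expression, authorizations):
    """Return True if the user's authorizations satisfy the label expression."""
    if expression == "":
        return True  # assumption: an unlabeled entry is visible to everyone
    tokens = tokenize(expression)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        # A bare label is satisfied only if the user holds it.
        return tok in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                # Assumption of this sketch: no implicit operator precedence.
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    visible = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return visible


# Two entries in the same table carry different labels; each user sees
# only the entries whose label expression their authorizations satisfy.
table = [
    ("row1", "admin&(audit|ops)", "restricted value"),
    ("row2", "", "public value"),
]
ops_admin = {"admin", "ops"}
print([row for row, label, _ in table if is_visible(label, ops_admin)])
```

A user holding only `admin` would see just `row2` here, because the first label also requires `audit` or `ops`.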


Server-side programming

In addition to cell-level security, Apache Accumulo provides a server-side programming mechanism called Iterators, which allows users to perform additional processing at the Tablet Server. The range of operations that can be applied is equivalent to those that can be implemented within a MapReduce Combiner function, which produces an aggregate value for several key-value pairs.
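As a rough illustration of combiner-style aggregation, the sketch below collapses several key-value pairs that share a key into one aggregate value. It is plain Python, not the Accumulo Iterator API; the name `summing_combiner`, the tuple-shaped keys, and the choice of summation as the aggregate are assumptions made for this example.

```python
from itertools import groupby


def summing_combiner(sorted_entries):
    """Yield one (key, total) pair per distinct key.

    Expects (key, value) pairs already sorted by key, which is how a
    sorted store would hand entries to server-side processing.
    """
    for key, group in groupby(sorted_entries, key=lambda entry: entry[0]):
        yield key, sum(value for _, value in group)


# Hypothetical entries: (row, column) keys with integer counts as values.
entries = [
    (("row1", "count"), 1),
    (("row1", "count"), 4),
    (("row2", "count"), 2),
]
combined = list(summing_combiner(entries))
print(combined)
```

Running the aggregation next to the data means only the combined pairs, rather than every individual pair, need to travel back to the client.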


User key ordering

Apache Accumulo stores entries sorted by user key and exposes an iterator over a key range. This provides locality of reference that is not available from some other distributed stores, including Cassandra and Voldemort, which order entries by a hash of the user key.
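The benefit of key ordering can be sketched with a small in-memory store: because keys are kept sorted, all entries in a key range sit next to each other and can be read with one contiguous scan. This is plain Python, not Accumulo code; the `SortedStore` class, its `put` and `scan` methods, and the `user#item` key layout are hypothetical names chosen for the illustration.

```python
import bisect


class SortedStore:
    """Toy key-value store that keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while preserving order
        self._values[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedStore()
for key in ["user2#b", "user1#a", "user3#c", "user1#b"]:
    store.put(key, key.upper())

# All of user1's entries are adjacent, so one range scan returns them.
user1_entries = list(store.scan("user1", "user2"))
print(user1_entries)
```

In a store that places entries by a hash of the key, the same two `user1` entries could land far apart, and finding them would require a lookup per key or a full scan.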


Papers

* 201 YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores, by Carnegie Mellon University and the National Security Agency.
* 201 Driving Big Data With Big Compute, by MIT Lincoln Laboratory.
* 201 D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, by MIT Lincoln Laboratory.
* 201 Spatio-temporal Indexing in Non-relational Distributed Databases, by CCRi.


See also

* Bigtable
* Apache Cassandra
* Column-oriented DBMS
* Hypertable
* HBase
* Hadoop
* sqrrl

