HOME

TheInfoList



OR:

HBase is an
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
non-relational
distributed database A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location (e.g. a data centre); or maybe dispersed over a computer network, netwo ...
modeled after Google's
Bigtable Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads as part of the Google Cloud portfolio. History Bigtable development began in 2004.. It is now used by a number of Goo ...
and written in
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
. It is developed as part of
Apache Software Foundation The Apache Software Foundation ( ; ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open-source software projects. The ASF was formed from a group of developers of the ...
's
Apache Hadoop Apache Hadoop () is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop wa ...
project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection). HBase features compression, in-memory operation, and
Bloom filter In computing, a Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives ar ...
s on a per-column basis as outlined in the original Bigtable paper. Tables in HBase can serve as the input and output for
MapReduce MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel and distributed algorithm on a cluster. A MapReduce program is composed of a ''map'' procedure, which performs filte ...
jobs run in Hadoop, and may be accessed through the Java API but also through
REST REST (Representational State Transfer) is a software architectural style that was created to describe the design and guide the development of the architecture for the World Wide Web. REST defines a set of constraints for how the architecture of ...
,
Avro Avro (an initialism of the founder's name) was a British aircraft manufacturer. Its designs include the Avro 504, used as a trainer in the First World War, the Avro Lancaster, one of the pre-eminent bombers of the Second World War, and the d ...
or Thrift gateway APIs. HBase is a
wide-column store A wide-column store (or extensible record store) is a type of NoSQL database.Wide Column Stores
and has been widely adopted because of its lineage with Hadoop and HDFS. HBase runs on top of HDFS and is well-suited for fast read and write operations on large datasets with high throughput and low input/output latency. HBase is not a direct replacement for a classic
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
, however
Apache Phoenix Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store enabling users to cr ...
project provides a SQL layer for HBase as well as
JDBC Java Database Connectivity (JDBC) is an application programming interface (API) for the Java (programming language), Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java ...
driver that can be integrated with various
analytics Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data, which also falls under and directly relates to the umbrella term, data sc ...
and
business intelligence Business intelligence (BI) consists of strategies, methodologies, and technologies used by enterprises for data analysis and management of business information. Common functions of BI technologies include Financial reporting, reporting, online an ...
applications. The Apache Trafodion project provides a SQL query engine with
ODBC In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An ...
and
JDBC Java Database Connectivity (JDBC) is an application programming interface (API) for the Java (programming language), Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java ...
drivers and distributed ACID transaction protection across multiple statements, tables and rows that use HBase as a storage engine. HBase is now serving several data-driven websites but
Facebook Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
's Messaging Platform migrated from HBase to
MyRocks MyRocks is open-source software developed at Facebook in order to use MySQL features with RocksDB implementations. It is based on Oracle MySQL 5.6. Starting from version 10.2.5, MariaDB MariaDB is a community-developed, commercially supporte ...
in 2018.Facebook: Why our 'next-gen' comms ditched MySQL
Retrieved: 17 December 2010
Unlike relational and traditional databases, HBase does not support SQL scripting; instead the equivalent is written in Java, employing similarity with a MapReduce application. In the parlance of Eric Brewer's
CAP Theorem In database theory, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer (scientist), Eric Brewer, states that any distributed data store can provide at most Inconsistent triad, two of the following three guarantees: ; ...
, HBase is a CP type system.


History

Apache HBase began as a project by the company
Powerset In mathematics, the power set (or powerset) of a set is the set of all subsets of , including the empty set and itself. In axiomatic set theory (as developed, for example, in the ZFC axioms), the existence of the power set of any set is po ...
out of a need to process massive amounts of data for the purposes of natural-language search. Since 2010 it is a top-level Apache project.
Facebook Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018. The 2.4.x series is the current stable release line, it supersedes earlier release lines.


Use cases & production deployments


Enterprises that use HBase

The following is a list of notable enterprises that have used or are using HBase: *
23andMe 23andMe Holding Co. is an American personal genomics and biotechnology company based in South San Francisco, California. It is best known for providing a direct-to-consumer genetic testing service in which customers provide a saliva testing, sali ...
*
Adobe Adobe (from arabic: الطوب Attub ; ) is a building material made from earth and organic materials. is Spanish for mudbrick. In some English-speaking regions of Spanish heritage, such as the Southwestern United States, the term is use ...
*
Airbnb Airbnb, Inc. ( , an abbreviation of its original name, "Air Bed and Breakfast") is an American company operating an online marketplace for short-and-long-term homestays, experiences and services in various countries and regions. It acts as a ...
uses HBase as part of its AirStream realtime stream computation framework *
Alibaba Group Alibaba Group Holding Limited, branded as Alibaba (), is a Chinese Multinational corporation, multinational technology company specializing in E-commerce in China, e-commerce, retail, Internet, and technology. Founded on 28 June 1999 in Hangzho ...
* Amadeus IT Group, as its main long-term storage DB. *
Bloomberg Bloomberg may refer to: People * Daniel J. Bloomberg (1905–1984), audio engineer * Georgina Bloomberg (born 1983), professional equestrian * Michael Bloomberg (born 1942), American businessman and founder of Bloomberg L.P.; politician a ...
, for time series data storage *
Facebook Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
used HBase for its messaging platform between 2010 and 2018 *
Flipkart Flipkart Inc. is an Indian e-commerce company, headquartered in Bangalore, and incorporated in Singapore as a private limited company. The company initially focused on online book sales before expanding into other product categories such as con ...
uses HBase for its search index and user insights. * Flurry *
HubSpot HubSpot, Inc. is a US-based developer and marketer of software products for inbound marketing, sales, and customer service. Its products and services are meant to provide tools for customer relationship management, social media marketing, conten ...
*
Imgur Imgur ( , stylized as imgur) is an American online image sharing and image hosting service with a focus on social gossip that was founded by Alan Schaaf in 2009. The service has hosted viral images and memes, particularly those posted on ...
uses HBase to power its notifications system *
Kakao Kakao Corporation () is a South Korean internet conglomerate headquartered in Jeju City. It was formed through the merger of Daum Communications and the original Kakao Inc. in 2010. The company was renamed Daum Kakao in 2014. In 2015, it was ...
*
Netflix Netflix is an American subscription video on-demand over-the-top streaming service. The service primarily distributes original and acquired films and television shows from various genres, and it is available internationally in multiple lang ...
*
Pinterest Pinterest is an American social media service for publishing and discovery of information in the form of digital Bulletin board, pinboards. This includes recipes, home, style, motivation, and inspiration on the Internet using image sharing. Pint ...
* Quicken Loans *
Rocket Fuel Rocket propellant is used as reaction mass ejected from a rocket engine to produce thrust. The energy required can either come from the propellants themselves, as with a chemical rocket, or from an external source, as with ion engines. Overvi ...
* Salesforce.com *
Sears Sears, Roebuck and Co., commonly known as Sears ( ), is an American chain of department stores and online retailer founded in 1892 by Richard Warren Sears and Alvah Curtis Roebuck and reincorporated in 1906 by Richard Sears and Julius Rosen ...
*
Sophos Sophos Limited is a British security software and hardware company. It develops and markets managed security services and cybersecurity software and hardware, such as managed detection and response, incident response and endpoint security s ...
, for some of their back-end systems. *
Spotify Spotify (; ) is a List of companies of Sweden, Swedish Music streaming service, audio streaming and media service provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. , it is one of the largest providers of music streaming services ...
uses HBase as base for Hadoop and machine learning jobs. *
Twitter Twitter, officially known as X since 2023, is an American microblogging and social networking service. It is one of the world's largest social media platforms and one of the most-visited websites. Users can share short text messages, image ...
* Tuenti uses HBase for its messaging platform. *
Xiaomi Xiaomi (; ) is a Chinese multinational corporation and technology company headquartered in Beijing, China. It is best known for consumer electronics software electric vehicles. It is the second-largest manufacturer of smartphones in the worl ...
*
Yahoo! Yahoo (, styled yahoo''!'' in its logo) is an American web portal that provides the search engine Yahoo Search and related services including My Yahoo, Yahoo Mail, Yahoo News, Yahoo Finance, Yahoo Sports, y!entertainment, yahoo!life, and its a ...


See also

*
NoSQL NoSQL (originally meaning "Not only SQL" or "non-relational") refers to a type of database design that stores and retrieves data differently from the traditional table-based structure of relational databases. Unlike relational databases, which ...
* Wide column store *
Bigtable Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads as part of the Google Cloud portfolio. History Bigtable development began in 2004.. It is now used by a number of Goo ...
*
Apache Cassandra Apache Cassandra is a free and open-source software, free and open-source database management system designed to handle large volumes of data across multiple Commodity computing, commodity servers. The system prioritizes availability and scalab ...
* Oracle NOSQL * Hypertable *
Apache Accumulo Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and ...
*
MongoDB MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database product, MongoDB uses JSON-like documents with optional database schema, schemas. Released in February 2009 by 10gen (now MongoDB ...
* Project Voldemort *
Riak Riak (pronounced "ree-ack" ) is a distributed NoSQL key-value data store that offers high availability, fault tolerance, operational simplicity, and scalability. Riak moved to an entirely open-source project in August 2017, with many of the ...
*
Sqoop Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic. Description Sqoop supports incremental loads of a single ...
*
Elasticsearch Elasticsearch is a Search engine (computing), search engine based on Apache Lucene, a free and open-source search engine. It provides a distributed, Multitenancy, multitenant-capable full-text search engine with an HTTP web interface and schema ...
*
Apache Phoenix Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store enabling users to cr ...


References


Bibliography

* * *


External links

* /hbase.apache.org/ Official Apache HBase homepage {{Authority control Distributed data stores HBase Bigtable implementations Hadoop Free database management systems NoSQL Structured storage