HOME

TheInfoList



OR:

HBase is an
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
non-relational
distributed database A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location (e.g. a data centre); or maybe dispersed over a network of interconne ...
modeled after Google's Bigtable and written in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or
Alluxio Alluxio is an open-source virtual distributed file system (VDFS). Initially as research project "Tachyon", Alluxio was created at the University of California, Berkeley's AMPLab as Haoyuan Li's Ph.D. Thesis, advised by Professor Scott Shenker ...
, providing Bigtable-like capabilities for Hadoop. That is, it provides a
fault-tolerant Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more faults within some of its components. If its operating quality decreases at all, the decrease is proportional to the ...
way of storing large quantities of
sparse Sparse is a computer software tool designed to find possible coding faults in the Linux kernel. Unlike other such tools, this static analysis tool was initially designed to only flag constructs that were likely to be of interest to kernel de ...
data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection). HBase features compression, in-memory operation, and
Bloom filter A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in ...
s on a per-column basis as outlined in the original Bigtable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through
REST Rest or REST may refer to: Relief from activity * Sleep ** Bed rest * Kneeling * Lying (position) * Sitting * Squatting position Structural support * Structural support ** Rest (cue sports) ** Armrest ** Headrest ** Footrest Arts and ente ...
, Avro or
Thrift Thrift may refer to: * Frugality * A savings and loan association in the United States * Apache Thrift, a remote procedure call (RPC) framework * Thrift (plant), a plant in the genus ''Armeria'' * Syd Thrift (1929–2006), American baseball exec ...
gateway APIs. HBase is a wide-column store and has been widely adopted because of its lineage with Hadoop and HDFS. HBase runs on top of HDFS and is well-suited for fast read and write operations on large datasets with high throughput and low input/output latency. HBase is not a direct replacement for a classic SQL
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spa ...
, however Apache Phoenix project provides a SQL layer for HBase as well as JDBC driver that can be integrated with various
analytics Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns toward effective decision-making. It ...
and
business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical pr ...
applications. The Apache Trafodion project provides a SQL query engine with
ODBC In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. A ...
and JDBC drivers and distributed ACID transaction protection across multiple statements, tables and rows that use HBase as a storage engine. HBase is now serving several data-driven websites but
Facebook Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin ...
's Messaging Platform migrated from HBase to MyRocks in 2018.Facebook: Why our 'next-gen' comms ditched MySQL
Retrieved: 17 December 2010
Unlike relational and traditional databases, HBase does not support SQL scripting; instead the equivalent is written in Java, employing similarity with a MapReduce application. In the parlance of Eric Brewer's CAP Theorem, HBase is a CP type system.


History

Apache HBase began as a project by the company
Powerset In mathematics, the power set (or powerset) of a set is the set of all subsets of , including the empty set and itself. In axiomatic set theory (as developed, for example, in the ZFC axioms), the existence of the power set of any set is p ...
out of a need to process massive amounts of data for the purposes of natural-language search. Since 2010 it is a top-level Apache project.
Facebook Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin ...
elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018. The 2.2.z series is the current stable release line, it supersedes earlier release lines.


Use cases & production deployments


Enterprises that use HBase

The following is a list of notable enterprises that have used or are using HBase: *
23andMe 23andMe Holding Co. is a publicly held personal genomics and biotechnology company based in South San Francisco, California. It is best known for providing a direct-to-consumer genetic testing service in which customers provide a saliva sample ...
*
Adobe Adobe ( ; ) is a building material made from earth and organic materials. is Spanish for '' mudbrick''. In some English-speaking regions of Spanish heritage, such as the Southwestern United States, the term is used to refer to any kind of ...
*
Airbnb Airbnb, Inc. ( ), based in San Francisco, California, operates an online marketplace focused on short-term homestays and experiences. The company acts as a broker and charges a commission from each booking. The company was founded in 2008 by ...
uses HBase as part of its AirStream realtime stream computation framework * Alibaba Group * Amadeus IT Group, as its main long-term storage DB. *
Bloomberg Bloomberg may refer to: People * Daniel J. Bloomberg (1905–1984), audio engineer * Georgina Bloomberg (born 1983), professional equestrian * Michael Bloomberg (born 1942), American businessman and founder of Bloomberg L.P.; politician and ...
, for time series data storage *
Facebook Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin ...
used HBase for its messaging platform between 2010 and 2018 *
Flipkart Flipkart Private Limited is an Indian e-commerce company, headquartered in Bengaluru, and incorporated in Singapore as a private limited company. The company initially focused on online book sales before expanding into other product categories ...
uses HBase for its search index and user insights. * Flurry * HubSpot *
Imgur Imgur ( , stylized as imgur) is an American online image sharing and image hosting service with a focus on social gossip that was founded by Alan Schaaf in 2009. The service has hosted viral images and meme, particularly those posted on Reddi ...
uses HBase to power its notifications system *
Kakao Kakao ( ko, 카카오) is a South Korean internet company that was established in 2010. It formed as a result of a merger between Daum Communications and the original Kakao Inc. In 2014, the company was renamed Daum Kakao. The company rebranded ...
*
Netflix Netflix, Inc. is an American subscription video on-demand over-the-top streaming service and production company based in Los Gatos, California. Founded in 1997 by Reed Hastings and Marc Randolph in Scotts Valley, California, it offers a ...
*
Pinterest Pinterest is an American image sharing and social media service designed to enable saving and discovery of information (specifically "ideas") on the internet using images, and on a smaller scale, animated GIFs and videos, in the form of pinboa ...
*
Quicken Loans Rocket Mortgage, LLC (formerly known as Quicken Loans LLC) is a mortgage loan provider. It is headquartered in the One Campus Martius building in the financial district of Downtown Detroit, Michigan. In January 2018, the company became the lar ...
* Richrelevance * Rocket Fuel * Salesforce.com *
Sears Sears, Roebuck and Co. ( ), commonly known as Sears, is an American chain of department stores founded in 1892 by Richard Warren Sears and Alvah Curtis Roebuck and reincorporated in 1906 by Richard Sears and Julius Rosenwald, with what began ...
*
Sophos Sophos Group plc is a British based security software and hardware company. Sophos develops products for communication endpoint, encryption, network security, email security, mobile security and unified threat management. Sophos is primari ...
, for some of their back-end systems. *
Spotify Spotify (; ) is a proprietary Swedish audio streaming and media services provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. It is one of the largest music streaming service providers, with over 456 million monthly active us ...
uses HBase as base for Hadoop and machine learning jobs. * Tuenti uses HBase for its messaging platform. *
Xiaomi Corporation (; ), commonly known as Xiaomi and registered as Xiaomi Inc., is a Chinese designer and manufacturer of consumer electronics and related software, home appliances, and household items. Behind Samsung, it is the second largest m ...
*
Yahoo! Yahoo! (, styled yahoo''!'' in its logo) is an American web services provider. It is headquartered in Sunnyvale, California and operated by the namesake company Yahoo! Inc. (2017–present), Yahoo Inc., which is 90% owned by investment funds ma ...


See also

*
NoSQL A NoSQL (originally referring to "non- SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed ...
*
Wide column store A wide-column store (or extensible record store) is a type of NoSQL database.Wide Column Stores
* Bigtable *
Apache Cassandra Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cass ...
* Oracle NOSQL * Hypertable *
Apache Accumulo Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels a ...
*
MongoDB MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Ser ...
* Project Voldemort * Riak *
Sqoop Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic. Description Sqoop supports incremental loads of a single ...
*
Elasticsearch Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is dual ...
* Apache Phoenix


References


Bibliography

* * *


External links

* /hbase.apache.org/ Official Apache HBase homepage {{DEFAULTSORT:Hbase
HBase HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Fil ...
Bigtable implementations Hadoop Free database management systems NoSQL Structured storage