HOME

TheInfoList



OR:

Apache ZooKeeper is an open-source server for highly reliable distributed coordination of cloud applications. It is a project of the
Apache Software Foundation The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the ...
. ZooKeeper is essentially a service for
distributed systems A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. Distributed computing is a field of computer sci ...
offering a
hierarchical A hierarchy (from Greek: , from , 'president of sacred rites') is an arrangement of items (objects, names, values, categories, etc.) that are represented as being "above", "below", or "at the same level as" one another. Hierarchy is an important ...
key-value store, which is used to provide a distributed configuration service, synchronization service, and naming registry for large distributed systems (see ''
Use cases In software and systems engineering, the phrase use case is a polyseme with two senses: # A usage scenario for a piece of software; often used in the plural to suggest situations where a piece of software may be useful. # A potential scenario ...
''). ZooKeeper was a sub-project of
Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage an ...
but is now a top-level Apache project in its own right.


Overview

ZooKeeper's
architecture Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing buildings ...
supports
high availability High availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period. Modernization has resulted in an increased reliance on these systems. F ...
through redundant services. The clients can thus ask another ZooKeeper leader if the first fails to answer. ZooKeeper nodes store their data in a hierarchical name space, much like a file system or a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
data structure. Clients can read from and write to the nodes and in this way have a shared configuration service. ZooKeeper can be viewed as an atomic broadcast system, through which updates are
totally ordered In mathematics, a total or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X: # a \leq a ( reflexive ...
. The ZooKeeper Atomic Broadcast (ZAB) protocol is the core of the system. ZooKeeper is used by companies including
Yelp Yelp Inc. is an American company that develops the Yelp.com website and the Yelp mobile app, which publish crowd-sourced reviews about businesses. It also operates Yelp Guest Manager, a table reservation service. It is headquartered in San F ...
,
Rackspace Rackspace Technology, Inc. is an American cloud computing company based in Windcrest, Texas, an inner suburb of San Antonio, Texas. The company also has offices in Blacksburg, Virginia, and Austin, Texas, as well as in Australia, Canada, United ...
,
Yahoo! Yahoo! (, styled yahoo''!'' in its logo) is an American web services provider. It is headquartered in Sunnyvale, California and operated by the namesake company Yahoo Inc., which is 90% owned by investment funds managed by Apollo Global Mana ...
,
Odnoklassniki Odnoklassniki ( rus, Одноклассники, t=Classmates) is a social network service used mainly in Russia and former Soviet Republics. The site was developed by Albert Popkov and launched on March 4, 2006. The website currently has more ...
,
Reddit Reddit (; stylized in all lowercase as reddit) is an American social news aggregation, content rating, and discussion website. Registered users (commonly referred to as "Redditors") submit content to the site such as links, text posts, imag ...
,
NetApp NetApp, Inc. is an American hybrid cloud data services and data management company headquartered in San Jose, California. It has ranked in the Fortune 500 from 2012–2021. Founded in 1992 with an IPO in 1995, NetApp offers cloud data service ...
SolidFire,
Meta Meta (from the Greek μετά, '' meta'', meaning "after" or "beyond") is a prefix meaning "more comprehensive" or "transcending". In modern nomenclature, ''meta''- can also serve as a prefix meaning self-referential, as a field of study or end ...
,
Twitter Twitter is an online social media and social networking service owned and operated by American company Twitter, Inc., on which users post and interact with 280-character-long messages known as "tweets". Registered users can post, like, and ...
and
eBay eBay Inc. ( ) is an American multinational e-commerce company based in San Jose, California, that facilitates consumer-to-consumer and business-to-consumer sales through its website. eBay was founded by Pierre Omidyar in 1995 and became ...
as well as
open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized so ...
enterprise search Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience. "Enterprise search" is used to describe the software of search information within an ente ...
systems like
Solr Solr (pronounced "solar") is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features a ...
. ZooKeeper is modeled after Google's Chubby lock service and was originally developed at Yahoo! for streamlining the processes running on big-data clusters by storing the status in local log files on the ZooKeeper servers. These servers communicate with the client machines to provide them the information. ZooKeeper was developed in order to fix the bugs that occurred while deploying distributed big-data applications. Some of the prime features of Apache ZooKeeper are: * Reliable System: This system is very reliable as it keeps working even if a node fails. * Simple Architecture: The architecture of ZooKeeper is quite simple as there is a shared hierarchical namespace which helps coordinating the processes. * Fast Processing: ZooKeeper is especially fast in "read-dominant" workloads (i.e. workloads in which reads are much more common than writes). * Scalable: The performance of ZooKeeper can be improved by adding nodes.


Architecture

Some common terminologies regarding the ZooKeeper architecture: * Node: The systems installed on the cluster * ZNode: The nodes where the status is updated by other nodes in cluster * Client applications: The tools that interact with the distributed applications * Server applications: Allows the client applications to interact using a common interface The services in the cluster are replicated and stored on a set of servers (called an "ensemble"), each of which maintains an in-memory database containing the entire data tree of state as well as a transaction log and snapshots stored persistently. Multiple client applications can connect to a server, and each client maintains a TCP connection through which it sends requests and heartbeats and receives responses and watch events for monitoring.


Use cases

Typical use cases for ZooKeeper are: * Naming service *
Configuration management Configuration management (CM) is a process for establishing and maintaining consistency of a product's performance, functional, and physical attributes with its requirements, design, and operational information throughout its life. The CM proc ...
*
Data Synchronization Data synchronization is the process of establishing consistency between source and target data stores, and the continuous harmonization of the data over time. It is fundamental to a wide variety of applications, including file synchronization and m ...
*
Leader election In distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Before the task has begun, all network nodes are either unaware which node will se ...
*
Message queue In computer science, message queues and mailboxes are software-engineering components typically used for inter-process communication (IPC), or for inter- thread communication within the same process. They use a queue for messaging – the ...
*
Notification system In information technology, a notification system is a combination of software and hardware that provides a means of delivering a message to a set of recipients. It commonly shows activity related to an account. Such systems constitute an important ...


Client libraries

In addition to the client libraries included with the ZooKeeper distribution, a number of third-party libraries such as Apache Curator and Kazoo are available that make using ZooKeeper easier, add additional functionality, additional programming languages, etc.


Apache projects using ZooKeeper

*
Apache Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage a ...
*
Apache Accumulo Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and ...
*
Apache HBase HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File Sys ...
*
Apache Hive Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Tradi ...
*
Apache Kafka Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency plat ...
*
Apache Drill Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dre ...
*
Apache Solr Solr (pronounced "solar") is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features a ...
*
Apache Spark Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of Califor ...
* Apache NiFi *
Apache Druid Druid is a column-oriented, open-source, distributed data store written in Java. Druid is designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data.Hemsoth, Nicole. , ''Datanami'', 8 November ...
* Apache Helix *
Apache Pinot Apache Pinot is a Column-oriented DBMS, column-oriented, open-source software, open-source, Distributed database, distributed data store written in Java (programming language), Java. Pinot is designed to execute Online analytical processing, OLAP ...


See also

*
Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage an ...


References


External links

* {{DEFAULTSORT:Zookeeper
ZooKeeper A zookeeper, sometimes referred as animal keeper, is a person who manages zoo animals that are kept in captivity for conservation or to be displayed to the public.Hurwitz, Jane. Choosing a Career in Animal Care (World of Work). New York: Rosen Gr ...
Configuration management Free software programmed in Java (programming language) Hadoop