
Apache Pinot is a column-oriented, open-source, distributed data store written in Java. Pinot is designed to execute OLAP queries with low latency. It is suited to contexts where fast analytics, such as aggregations, are needed on immutable data, possibly with real-time data ingestion. [Pawar, Neha. "Pinot Joins Apache Incubator", LinkedIn Engineering, 01 April 2019]

The name Pinot comes from the Pinot grape vines, which are pressed into a liquid used to produce a variety of different wines. The founders of the database chose the name as a metaphor for analyzing vast quantities of data from a variety of file formats or streaming data sources. Pinot was first created at LinkedIn after the engineering staff determined that no off-the-shelf solution met the social networking site's requirements, such as predictable low latency, data freshness in seconds, fault tolerance and scalability. Pinot is used in production by technology companies such as Uber, Microsoft and Factual.


History

Pinot was started as an internal project at LinkedIn in 2013 to power a variety of user-facing and business-facing products. The first analytics product at LinkedIn to use Pinot was a redesign of the social networking site's feature that lets members see who has viewed their profile in real time. The project was open-sourced in June 2015 under an Apache 2.0 license and was donated to the Apache Software Foundation by LinkedIn in June 2019.


Architecture

Pinot uses Apache Helix for cluster management. Helix is embedded as an agent within the different components and uses Apache ZooKeeper for coordination and for maintaining the overall cluster state and health. All Pinot servers and brokers are managed by Helix, a generic cluster management framework for managing partitions and replicas in a distributed system.
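To make "partitions and replicas" concrete: in Pinot the units being placed are table segments, and each segment is hosted by several servers so that losing one server does not lose data. The sketch below is a toy round-robin replica placement, assuming a fixed list of servers and a replication factor; `assign_replicas` and `segments_per_server` are hypothetical helpers for illustration, not Helix or Pinot APIs, and real placement strategies are more elaborate.

```python
# Toy replica placement for segments across servers.
# Hypothetical helpers; not part of Helix or Pinot.
from collections import defaultdict


def assign_replicas(segments, servers, replication):
    """Give each segment `replication` distinct servers, rotating the start server."""
    if replication > len(servers):
        raise ValueError("replication factor exceeds number of servers")
    assignment = {}
    start = 0
    for segment in segments:
        # Take `replication` consecutive servers, wrapping around the list,
        # so the replicas of one segment always land on different servers.
        assignment[segment] = [servers[(start + i) % len(servers)]
                               for i in range(replication)]
        start = (start + 1) % len(servers)
    return assignment


def segments_per_server(assignment):
    """Invert segment -> [servers] into server -> [segments]."""
    load = defaultdict(list)
    for segment, replicas in assignment.items():
        for server in replicas:
            load[server].append(segment)
    return dict(load)


assignment = assign_replicas(["seg_0", "seg_1", "seg_2", "seg_3"],
                             ["server_a", "server_b", "server_c"], 2)
```

Here `seg_0` lands on `server_a` and `server_b`, `seg_2` on `server_c` and `server_a`, and inverting the map gives the per-server view that a segment-to-server routing table is built from.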


Query management

Queries are received by brokers, which check each request against the segment-to-server routing table, scatter it across the real-time and offline servers that host the relevant segments, and merge the partial results that the servers return.
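The scatter-gather flow can be sketched as follows, assuming a routing table that maps each segment to one server and a query of the form `SUM(clicks) GROUP BY country`. All function names, segment names and data are hypothetical, and the sketch ignores details such as how a real broker avoids double-counting data that is present in both offline and real-time segments.

```python
# Toy scatter-gather broker. All names and data are hypothetical.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def server_execute(segments, segment_rows):
    """Run SUM(clicks) GROUP BY country over the segments one server hosts."""
    partial = Counter()
    for segment in segments:
        for country, clicks in segment_rows[segment]:
            partial[country] += clicks
    return partial


def broker_query(routing_table, segment_rows):
    """Scatter the query to every server in the routing table, then merge."""
    # Invert segment -> server into server -> [segments]: one request per server.
    by_server = {}
    for segment, server in routing_table.items():
        by_server.setdefault(server, []).append(segment)
    # Scatter: each server computes a partial aggregate in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(server_execute, segments, segment_rows)
                   for segments in by_server.values()]
        partials = [future.result() for future in futures]
    # Gather: summing the partial sums gives the global sum per group.
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(merged)


routing_table = {
    "offline_seg_0": "offline_server_1",
    "offline_seg_1": "offline_server_2",
    "realtime_seg_0": "realtime_server_1",
}
segment_rows = {
    "offline_seg_0": [("US", 3), ("DE", 1)],
    "offline_seg_1": [("US", 2)],
    "realtime_seg_0": [("DE", 2), ("FR", 5)],
}
result = broker_query(routing_table, segment_rows)
```

With this data the merged result is `{"US": 5, "DE": 3, "FR": 5}`. SUM merges by simple addition; an aggregate such as AVG would need each server to return a sum and a count so the broker can combine them correctly.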


Cluster management

Pinot leverages Apache Helix for cluster management. Helix is a cluster management framework for managing replicated, partitioned resources in a distributed system. Helix uses ZooKeeper to store cluster state and metadata.
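Helix's core idea is to compare a desired assignment of resources with the assignment actually observed in the cluster and to issue transitions that close the gap (its documentation calls these the ideal state and the current state). The sketch below is a toy version of that comparison, assuming both states are plain dictionaries mapping a segment to the set of servers that should host or currently host it; `compute_transitions` and the `"load"`/`"drop"` action names are hypothetical and are not Helix's API or state model.

```python
# Toy desired-vs-observed reconciliation, loosely modeled on Helix's idea.
# Hypothetical names; not the Helix API.


def compute_transitions(desired, observed):
    """Return (segment, server, action) tuples that move `observed` toward `desired`.

    Both arguments map segment -> set of servers.
    """
    actions = []
    for segment in sorted(set(desired) | set(observed)):
        want = desired.get(segment, set())
        have = observed.get(segment, set())
        # Servers that should host the segment but do not yet.
        for server in sorted(want - have):
            actions.append((segment, server, "load"))
        # Servers that host the segment but no longer should.
        for server in sorted(have - want):
            actions.append((segment, server, "drop"))
    return actions


desired = {"seg_0": {"a", "b"}, "seg_1": {"b", "c"}}
observed = {"seg_0": {"a"}, "seg_1": {"b", "c", "d"}, "seg_2": {"a"}}
plan = compute_transitions(desired, observed)
```

Here the plan is to load `seg_0` on `b` and drop `seg_1` from `d` and `seg_2` from `a`. Keeping both states in a shared store such as ZooKeeper lets any controller recompute the same plan after a failure.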


Features

Pinot shares many features with comparable OLAP datastores, such as Apache Druid. Like Druid, Pinot is a column-oriented database with various compression schemes, such as run-length and fixed-bit-length encoding. Pinot supports pluggable indexing technologies (Sorted Index, Bitmap Index, Inverted Index, Star-Tree Index and Range Index), which are what primarily differentiate Pinot from other OLAP datastores. Pinot supports near real-time ingestion from streams such as Apache Kafka and AWS Kinesis, and batch ingestion from sources such as Hadoop, S3, Azure and GCS. Like most other OLAP datastores and data warehousing solutions, Pinot supports a SQL-like query language for selection, aggregation, filtering, group by, order by and distinct queries on data.
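The encodings and index types named above can be illustrated with simplified, in-memory versions. The sketch assumes the column is first dictionary-encoded (a common companion of fixed-bit packing) and represents bitmaps as plain Python integers; real implementations use compressed bitmap libraries and on-disk formats. All function names and the toy table are hypothetical.

```python
# Simplified, in-memory sketches of column encodings and an inverted index.
# Hypothetical helpers for illustration only; not Pinot's on-disk formats.
from itertools import groupby


def run_length_encode(values):
    """Encode a column as (value, run_length) pairs; compresses best when sorted."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def fixed_bit_pack(values):
    """Dictionary-encode a column, then pack the ids at the minimum bit width.

    Returns (dictionary, bit_width, packed), where `packed` is a single int
    holding every row's dictionary id in consecutive bit_width-sized slots.
    """
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    # Smallest width that can represent ids 0 .. len(dictionary) - 1.
    bit_width = max(1, (len(dictionary) - 1).bit_length())
    packed = 0
    for row_id, value in enumerate(values):
        packed |= ids[value] << (row_id * bit_width)
    return dictionary, bit_width, packed


def fixed_bit_get(dictionary, bit_width, packed, row_id):
    """Decode one row by shifting its slot down and masking off bit_width bits."""
    mask = (1 << bit_width) - 1
    return dictionary[(packed >> (row_id * bit_width)) & mask]


def build_inverted_index(values):
    """Map each distinct value to a bitmap (an int) of the row ids containing it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_in(bitmap):
    """List the row ids whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Toy table with a sorted `country` column.
country = ["DE", "DE", "FR", "US", "US", "US"]
device = ["mobile", "desktop", "mobile", "mobile", "desktop", "mobile"]
clicks = [4, 1, 7, 2, 5, 3]

# SELECT SUM(clicks) FROM t WHERE country = 'US' AND device = 'mobile'
country_idx = build_inverted_index(country)
device_idx = build_inverted_index(device)
matching = country_idx["US"] & device_idx["mobile"]  # bitmap AND of both filters
total = sum(clicks[row] for row in rows_in(matching))
```

Run-length encoding turns the sorted `country` column into `[("DE", 2), ("FR", 1), ("US", 3)]`, and the two-value `device` column packs at one bit per row. The filter ANDs two bitmaps, matching rows 3 and 5, so `total` is 5 without scanning rows that fail either predicate.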


See also

* List of column-oriented DBMSes
* Comparison of OLAP servers


References

