
ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real time. ClickHouse Inc. is headquartered in the San Francisco Bay Area, with its subsidiary, ClickHouse B.V., based in Amsterdam, Netherlands. In September 2021, ClickHouse was incorporated in San Francisco, CA, to house the open-source technology, with an initial $50 million investment from Index Ventures and Benchmark Capital and participation by Yandex N.V. and others. On October 28, 2021, the company received Series B funding totaling $250 million at a valuation of $2 billion from Coatue Management, Altimeter Capital, and other investors. The company continues to develop the open-source project and to engineer cloud technology.


History

ClickHouse's technology was first developed more than 10 years ago at Yandex, Russia's largest technology company. In 2009, Alexey Milovidov and other developers started an experimental project to test the hypothesis that analytical reports could be generated in real time from non-aggregated data that is itself constantly added in real time. The developers spent three years proving this hypothesis, and in 2012 ClickHouse launched in production for the first time to power Yandex.Metrica. Unlike the custom data structures used before, ClickHouse was general enough to work as a database management system. As a true column-oriented DBMS, it allowed systems to generate reports from petabytes of raw data with sub-second latencies. ClickHouse was widely adopted at Yandex, including for the Yandex.Tank load testing tool and at Yandex.Market to monitor site accessibility and KPIs.

In June 2016, the ClickHouse project was released as open-source software under the Apache 2 license to power analytical use cases around the globe. Systems at the time offered a server throughput of a hundred thousand rows per second; ClickHouse outperformed them with a throughput of hundreds of millions of rows per second. Since then, its popularity has grown rapidly, as evidenced by adoption at companies such as Uber, Comcast, eBay, and Cisco. ClickHouse was also implemented at CERN's LHCb experiment to store and process metadata on 10 billion events with over 1000 attributes per event.


Features

The main features of the ClickHouse DBMS are:
* ''True column-oriented DBMS.'' Nothing is stored with the values. For example, constant-length values are supported to avoid storing their length next to the values.
* ''Linear scalability.'' A cluster can be extended by adding servers.
* ''Fault tolerance.'' The system is a cluster of shards, where each shard is a group of replicas. ClickHouse uses asynchronous multi-master replication: data is written to any available replica, then distributed to all the remaining replicas. ZooKeeper is used for coordinating processes, but it is not involved in query processing and execution.
* ''Capability to store and process petabytes of data.''
* ''SQL support.'' ClickHouse supports an extended SQL-like language that includes arrays and nested data structures, approximate and URI functions, and the ability to connect an external key-value store.
* ''High performance.''
** Vector calculations are used. Data is not only stored by columns, but is also processed by vectors (parts of columns). This approach achieves high CPU performance.
** Sampling and approximate calculations are supported.
** Parallel and distributed query processing is available (including JOINs).
* ''Data compression.''
* ''Hard disk drive (HDD) optimization.'' The system can process data that does not fit in random-access memory (RAM).
* ''Clients for database (DB) connectivity.'' Database connection options include the console client, the HTTP API, or one of the wrappers (wrappers are available for Python, PHP, NodeJS, Perl, Ruby and R). ODBC and JDBC drivers are also available for ClickHouse.
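
The column-oriented and vector-processing ideas in the list above can be illustrated with a toy model. The following is a minimal Python sketch, not ClickHouse code: the table, the column names (`status`, `response_ms`), the helper functions and the `VECTOR_SIZE` constant are all hypothetical and chosen only for illustration. Each column lives in its own fixed-width typed array, so no length or other metadata sits next to the values, and an aggregate reads only the column it needs, one chunk ("vector") at a time.

```python
from array import array

# Hypothetical toy table: each column is its own contiguous, fixed-width array,
# so nothing (such as a per-value length) is stored alongside the values.
columns = {
    "status": array("H", [200, 200, 500, 404, 200, 500]),   # unsigned 16-bit
    "response_ms": array("I", [12, 48, 950, 7, 33, 1200]),  # unsigned 32-bit
}

VECTOR_SIZE = 4  # illustrative chunk length; an assumption, not a ClickHouse setting


def vectors(column, size=VECTOR_SIZE):
    """Yield successive slices (vectors) of a single column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def count_where_equal(column, value):
    """Count matching values, reading only the one column the query needs."""
    total = 0
    for chunk in vectors(column):
        total += sum(1 for v in chunk if v == value)
    return total


def column_sum(column):
    """Sum a column chunk by chunk instead of row by row."""
    total = 0
    for chunk in vectors(column):
        total += sum(chunk)
    return total


errors = count_where_equal(columns["status"], 500)
total_ms = column_sum(columns["response_ms"])
print(errors, total_ms)
```

In this sketch, counting error responses never touches `response_ms`, which mirrors why a columnar layout suits queries that read many rows but few columns.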


Limitations

ClickHouse has some features that can be considered disadvantages:
* There is no support for transactions.
* There is no full-fledged UPDATE/DELETE implementation.


Use cases

ClickHouse was designed for OLAP queries. ClickHouse performs well when:
* It works with a small number of tables that contain a large number of columns.
* Queries use a large number of rows extracted from the DB, but only a small subset of columns.
* Queries are relatively rare (usually around 100 requests per second per server).
* Column values are fairly small, usually consisting of numbers and short strings (for example, 60 bytes per URL).
* High throughput is required when processing a single query (up to billions of rows per second per server).
* A query result is mostly filtered or aggregated.
* Data update uses a simple scenario (usually batch-only, without complicated transactions).

For simple queries, latencies of 50 ms are typical.

One of the common cases for ClickHouse is server log analysis. After setting up regular data uploads to ClickHouse (it is recommended to insert data in fairly large batches of more than 1000 rows), it is possible to analyze incidents with instant queries or to monitor a service's metrics, such as error rates and response times. ClickHouse can also be used as an internal data warehouse for in-house analysts. ClickHouse can store data from different systems (such as Hadoop or certain logs), and analysts can build internal dashboards with the data or perform real-time analysis for business purposes.
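
The batching recommendation for log uploads can be sketched on the client side as follows. This is a minimal Python illustration under stated assumptions, not a ClickHouse client: the `RowBatcher` class, the row layout and the `sent_batches` list (a stand-in for a real insert call) are hypothetical. Rows are buffered and handed over only in large blocks, and an error rate is then computed over the uploaded rows, as a log-monitoring query might do.

```python
from typing import Callable, List, Tuple

# Hypothetical log row: (url, HTTP status, response time in seconds)
Row = Tuple[str, int, float]


class RowBatcher:
    """Buffer rows and pass them to `sink` in blocks of `batch_rows` rows."""

    def __init__(self, sink: Callable[[List[Row]], None], batch_rows: int = 2000):
        self.sink = sink
        self.batch_rows = batch_rows
        self.buffer: List[Row] = []

    def add(self, row: Row) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_rows:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; a no-op when the buffer is empty."""
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []


sent_batches: List[List[Row]] = []  # stand-in for a real insert call (assumption)
batcher = RowBatcher(sent_batches.append, batch_rows=2000)

for i in range(5000):
    status = 500 if i % 50 == 0 else 200  # synthetic data: 1 in 50 requests fails
    batcher.add((f"/page/{i % 10}", status, 0.05))
batcher.flush()  # send the final partial batch

batch_sizes = [len(batch) for batch in sent_batches]
all_rows = [row for batch in sent_batches for row in batch]
error_rate = sum(1 for _, status, _ in all_rows if status == 500) / len(all_rows)
print(batch_sizes, error_rate)
```

The block size of 2000 rows is an arbitrary choice that satisfies the "more than 1000 rows" guidance; a real pipeline would tune it and would likely also flush on a timer.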


Benchmark results

According to benchmark tests conducted by its developers, for OLAP queries ClickHouse is more than 100 times faster than Hive (a DBMS based on the Hadoop technology stack) or MySQL (a common RDBMS).


See also

* List of column-oriented DBMSes




External links


ClickHouse official website