Zarr (data Format)
   HOME

TheInfoList



OR:

Zarr is an
open standard An open standard is a standard that is openly accessible and usable by anyone. It is also a common prerequisite that open standards use an open license that provides for extensibility. Typically, anybody can participate in their development due to ...
for storing large multidimensional
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
data. It specifies a protocol and data format, and is designed to be "
cloud In meteorology, a cloud is an aerosol consisting of a visible mass of miniature liquid droplets, frozen crystals, or other particles, suspended in the atmosphere of a planetary body or similar space. Water or various other chemicals may ...
ready" including
random access Random access (also called direct access) is the ability to access an arbitrary element of a sequence in equal time or any datum from a population of addressable elements roughly as easily and efficiently as any other, no matter how many elemen ...
, by dividing data into subsets referred to as chunks. Zarr can be used within many programming languages, including
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (prog ...
,
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
,
JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...
,
C++ C++ (, pronounced "C plus plus" and sometimes abbreviated as CPP or CXX) is a high-level, general-purpose programming language created by Danish computer scientist Bjarne Stroustrup. First released in 1985 as an extension of the C programmin ...
,
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
and
Julia Julia may refer to: People *Julia (given name), including a list of people with the name *Julia (surname), including a list of people with the name *Julia gens, a patrician family of Ancient Rome *Julia (clairvoyant) (fl. 1689), lady's maid of Qu ...
. It has been used by organizations such as
Google Google LLC (, ) is an American multinational corporation and technology company focusing on online advertising, search engine technology, cloud computing, computer software, quantum computing, e-commerce, consumer electronics, and artificial ...
and
Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
to publish large datasets. Early versions of Zarr were first released in 2015 by Alistair Miles. Zarr is designed to support high-throughput distributed I/O on different storage systems, which is a common requirement in
cloud computing Cloud computing is "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand," according to International Organization for ...
. Multiple read operations can efficiently occur to a Zarr array in parallel, or multiple write operations in parallel.


Format description

The main data format in Zarr is multidimensional arrays. For parallelisable access, these arrays are stored and accessed as a grid of so-called "chunks". The actual data format on disk depends on the compressor and storage plugins selected by the user. Zarr's design was influenced by that of
HDF5 Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data. Originally developed at the U.S. National Center for Supercomputing Applications, it is supported by The HDF Group, a non- ...
, and so it includes similar features for
metadata Metadata (or metainformation) is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive ...
and grouping: arrays can be grouped into named hierarchies, and they can also be annotated with key-value metadata stored alongside the array.


Applications

For bioimaging such as
microscopy Microscopy is the technical field of using microscopes to view subjects too small to be seen with the naked eye (objects that are not within the resolution range of the normal eye). There are three well-known branches of microscopy: optical mic ...
, a consortium called the Open Microscopy Environment (OME) created a format called "OME-Zarr", based on Zarr with some discipline-specific extensions. Similarly, Zarr is being used to publish weather and satellite data and energy data, among others.


See also

*
NetCDF NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The project homepage is hosted by the Unidat ...
*
Dask The DASK was the first computer in Denmark. It was commissioned in 1955, designed and constructed by Regnecentralen, and began operation in September 1957. DASK is an acronym for Dansk Aritmetisk Sekvens Kalkulator or ''Danish Arithmetic Sequence ...


References


External links


Official website
Data serialization formats Open formats {{compu-stub