HOME

TheInfoList



OR:

In computer programming contexts, a data cube (or datacube) is a multi-dimensional ("n-D") array of values. Typically, the term data cube is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte
data warehouse In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for Business intelligence, reporting and data analysis and is a core component of business intelligence. Data warehouses are central Re ...
s and
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. ...
of image data. The data cube is used to represent data (sometimes called facts) along some dimensions of interest. For example, in
online analytical processing In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term ''OLAP'' was created as a slight modification of the traditional database term online transaction proces ...
(OLAP) such dimensions could be the subsidiaries a company has, the products the company offers, and time; in this setup, a fact would be a sales event where a particular product has been sold in a particular subsidiary at a particular time. In satellite image timeseries dimensions would be latitude and longitude coordinates and time; a fact (sometimes called measure) would be a pixel at a given space and time as taken by the satellite (following some processing that is not of concern here). Even though it is called a ''cube'' (and the examples provided above happen to be 3-dimensional for brevity), a data cube generally is a multi-dimensional concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. In any case, every dimension divides data into groups of cells whereas each cell in the cube represents a single measure of interest. Sometimes cubes hold only a few values with the rest being ''empty'', i.e. undefined, while sometimes most or all cube coordinates hold a cell value. In the first case such data are called ''sparse'', and in the second case they are called ''dense'', although there is no hard delineation between the two.


History

Multi-dimensional arrays have long been familiar in programming languages. Fortran offers arbitrarily-indexed 1-D arrays and arrays of arrays, which allows the construction of higher-dimensional arrays, up to 15 dimensions. APL supports n-D arrays with a rich set of operations. All these have in common that arrays must fit into the main memory and are available only while the particular program maintaining them (such as image processing software) is running. A series of data exchange formats support storage and transmission of data cube-like data, often tailored towards particular application domains. Examples include MDX for statistical (in particular, business) data,
Hierarchical Data Format Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data. Originally developed at the U.S. National Center for Supercomputing Applications, it is supported by The HDF Group, a non- ...
for general scientific data, and
TIFF Tag Image File Format or Tagged Image File Format, commonly known by the abbreviations TIFF or TIF, is an image file format for storing raster graphics images, popular among graphic artists, the publishing industry, and photographers. TIFF is w ...
for imagery. In 1992,
Peter Baumann Peter Baumann (born 29 January 1953) is a German musician. He formed the core line-up of the pioneering German electronic group Tangerine Dream with Edgar Froese and Christopher Franke in 1971. Baumann composed his first solo album in 1976, wh ...
introduced management of massive data cubes with high-level user functionality combined with an efficient software architecture. Datacube operations include subset extraction, processing, fusion, and in general queries in the spirit of
data manipulation language A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML com ...
s like
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
. Some years after, the data cube concept was applied to describe time-varying business data as data cubes by Jim Gray, et al., and by
Venky Harinarayan Venkatesh "Venky" Harinarayan is an Indian entrepreneur. He is the co-founder of Cambrian Ventures and Kosmix. Harinarayan also co-founded Junglee Corp. and played a significant role at Amazon.com in the late 1990s. Originally from Bombay, Ind ...
,
Anand Rajaraman Anand Rajaraman is a Web and technology entrepreneur. He is the co-founder of Cambrian Ventures and Kosmix. Rajaraman also co-founded former Junglee Corp. and played a significant role at Amazon.com in the late 1990s. Personal life and educati ...
and
Jeff Ullman Jeffrey David Ullman (born November 22, 1942) is an American computer scientist and the Stanford W. Ascherman Professor of Engineering, Emeritus, at Stanford University. His textbooks on compilers (various editions are popularly known as the dr ...
which rank among the top 500 most cited computer science articles over a 25-year period. Around that time, a working group on Multi-Dimensional Databases ("Arbeitskreis Multi-Dimensionale Datenbanken") was established at German
Gesellschaft für Informatik The German Informatics Society (GI) () is a German professional society for computer science, with around 20,000 personal and 250 corporate members. It is the biggest organized representation of its kind in the German-speaking world. History The ...
. Datacube Inc. was an
image processing An image or picture is a visual representation. An image can be two-dimensional, such as a drawing, painting, or photograph, or three-dimensional, such as a carving or sculpture. Images may be displayed through other media, including a pr ...
company selling hardware and
software Software consists of computer programs that instruct the Execution (computing), execution of a computer. Software also includes design documents and specifications. The history of software is closely tied to the development of digital comput ...
applications for the PC market in 1996, however without addressing data cubes as such. The EarthServer initiative has established geo data cube service requirements.


Standardization

In 2018, the
ISO The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries. Me ...
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
database language was extended with data cube functionality as "SQL – Part 15: Multi-dimensional arrays (SQL/MDA)".
Web Coverage Processing Service The Web Coverage Processing Service (WCPS) defines a language for filtering and processing of multi-dimensional raster coverages, such as sensor, simulation, image, and statistics data. The Web Coverage Processing Service is maintained by the Op ...
is a geo data cube analytics language issued by the
Open Geospatial Consortium The Open Geospatial Consortium (OGC) is an international voluntary consensus standards organization that develops and maintains international standards for geospatial content and location-based services, sensor web, Internet of Things, Geographi ...
in 2008. In addition to the common data cube operations, the language knows about the semantics of space and time and supports both regular and irregular grid data cubes, based on the concept of
coverage data A coverage is the digital representation of some spatio-temporal phenomenon. ISO 19123 provides the definition: * '' feature that acts as a function to return values from its range for any direct position within its spatial, temporal or spatiote ...
. An industry standard for querying business data cubes, originally developed by
Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
, is
MultiDimensional eXpressions Multidimensional Expressions (MDX) is a query language for online analytical processing (OLAP) using a database management system. Much like SQL, it is a query language for OLAP cubes. It is also a calculation language, with syntax similar to spre ...
.


Implementation

Many high-level computer languages treat data cubes and other large arrays as single entities distinct from their contents. These languages, of which Fortran, APL, IDL,
NumPy NumPy (pronounced ) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predeces ...
, PDL, and
S-Lang The S-Lang programming library is a software library for Unix, Windows, VMS, OS/2, and Mac OS X. It provides routines for embedding an interpreter for the S-Lang scripting language, and components to facilitate the creation of text-based applica ...
are examples, allow the programmer to manipulate complete
film A film, also known as a movie or motion picture, is a work of visual art that simulates experiences and otherwise communicates ideas, stories, perceptions, emotions, or atmosphere through the use of moving images that are generally, sinc ...
clips and other data en masse with simple expressions derived from
linear algebra Linear algebra is the branch of mathematics concerning linear equations such as :a_1x_1+\cdots +a_nx_n=b, linear maps such as :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrix (mathemat ...
and
vector Vector most often refers to: * Euclidean vector, a quantity with a magnitude and a direction * Disease vector, an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematics a ...
mathematics. Some languages (such as PDL) distinguish between a
list A list is a Set (mathematics), set of discrete items of information collected and set forth in some format for utility, entertainment, or other purposes. A list may be memorialized in any number of ways, including existing only in the mind of t ...
of images and a data cube, while many (such as IDL) do not.
Array DBMS An array database management system or array DBMS provides database services specifically for arrays (also called raster data), that is: homogeneous collections of data items (often called pixels, voxels, etc.), sitting on a regular grid of one, ...
s (Database Management Systems) offer a data model which generically supports definition, management, retrieval, and manipulation of n-dimensional data cubes. This database category has been pioneered by the
rasdaman rasdaman ("raster data manager") is an Array DBMS, that is: a Database Management System which adds capabilities for storage and retrieval of massive multi-dimensional arrays, such as sensor, image, simulation, and statistics data. A frequently ...
system since 1994.


Applications

Multi-dimensional arrays can meaningfully represent spatio-temporal sensor, image, and simulation data, but also statistics data where the semantics of dimensions is not necessarily of spatial or temporal nature. Generally, any kind of axis can be combined with any other into a data cube.


Mathematics

In mathematics, a one-dimensional array corresponds to a vector, a two-dimensional array resembles a
matrix Matrix (: matrices or matrixes) or MATRIX may refer to: Science and mathematics * Matrix (mathematics), a rectangular array of numbers, symbols or expressions * Matrix (logic), part of a formula in prenex normal form * Matrix (biology), the m ...
; more generally, a
tensor In mathematics, a tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects associated with a vector space. Tensors may map between different objects such as vectors, scalars, and even other ...
may be represented as an n-dimensional data cube.


Science and engineering

For a time sequence of color images, the array is generally four-dimensional, with the dimensions representing image X and Y coordinates, time, and
RGB The RGB color model is an additive color model in which the red, green, and blue primary colors of light are added together in various ways to reproduce a broad array of colors. The name of the model comes from the initials of the three ...
(or other
color space A color space is a specific organization of colors. In combination with color profiling supported by various physical devices, it supports reproducible representations of colorwhether such representation entails an analog or a digital represe ...
) color plane. For example, the EarthServer initiative unites data centers from different continents offering 3-D x/y/t satellite image timeseries and 4-D x/y/z/t weather data for retrieval and server-side processing through the
Open Geospatial Consortium The Open Geospatial Consortium (OGC) is an international voluntary consensus standards organization that develops and maintains international standards for geospatial content and location-based services, sensor web, Internet of Things, Geographi ...
br>WCPS
geo data cube query language standard. A data cube is also used in the field of
imaging spectroscopy Imaging is the representation or reproduction of an object's form; especially a visual representation (i.e., the formation of an image). Imaging technology is the application of materials and methods to create, preserve, or duplicate images. ...
, since a spectrally-resolved image is represented as a three-dimensional volume. Earth observation data cubes combine satellite imagery such as
Landsat 8 Landsat 8 is an American Earth observation satellite launched on 11 February 2013. It is the eighth satellite in the Landsat program and the seventh to reach orbit successfully. Originally called the Landsat Data Continuity Mission (LDCM), it i ...
and
Sentinel-2 Sentinel-2 is an Earth observation mission from the Copernicus Programme that acquires optical imagery at high spatial resolution (10 m to 60 m) over land and coastal waters. The mission's Sentinel-2A and Sentinel-2B satellites were joined in or ...
with
Geographic information system A geographic information system (GIS) consists of integrated computer hardware and Geographic information system software, software that store, manage, Spatial analysis, analyze, edit, output, and Cartographic design, visualize Geographic data ...
analytics.


Business intelligence

In
online analytical processing In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term ''OLAP'' was created as a slight modification of the traditional database term online transaction proces ...
(OLAP), data cubes are a common arrangement of business data suitable for analysis from different perspectives through operations like slicing, dicing, pivoting, and aggregation.


See also

*
Array DBMS An array database management system or array DBMS provides database services specifically for arrays (also called raster data), that is: homogeneous collections of data items (often called pixels, voxels, etc.), sitting on a regular grid of one, ...
*
rasdaman rasdaman ("raster data manager") is an Array DBMS, that is: a Database Management System which adds capabilities for storage and retrieval of massive multi-dimensional arrays, such as sensor, image, simulation, and statistics data. A frequently ...
*
OLAP cube An OLAP cube is a multi-dimensional array of data. Online analytical processing (OLAP) is a computer-based technique of analyzing data to look for insights. The term ''cube'' here refers to a multi-dimensional dataset, which is also sometimes cal ...
*
Australian Geoscience Data Cube The Australian Geoscience Data Cube (AGDC) is an approach to storing, processing and analyzing large collections of Earth observation data. The technology is designed to meet challenges of national interest by being agile and flexible with vast amo ...
*
Graph (discrete mathematics) In discrete mathematics, particularly in graph theory, a graph is a structure consisting of a Set (mathematics), set of objects where some pairs of the objects are in some sense "related". The objects are represented by abstractions called ''Ver ...
*
Abstract semantic graph In computer science, an abstract semantic graph (ASG) or term graph is a form of abstract syntax in which an expression of a formal or programming language is represented by a graph whose vertices are the expression's subterms. An ASG is at a hig ...
* Apache Kylin


References

{{DEFAULTSORT:Data Cube Image processing Database theory