MonetDB is an open-source column-oriented relational database management system (RDBMS) originally developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands. It is designed to provide high performance on complex queries against large databases, such as combining tables with hundreds of columns and millions of rows. MonetDB has been applied in high-performance applications for online analytical processing, data mining, geographic information system (GIS), Resource Description Framework (RDF), text retrieval and sequence alignment processing.


History

Data mining projects in the 1990s required improved analytical database support. This resulted in a CWI spin-off called Data Distilleries, which used early MonetDB implementations in its analytical suite. Data Distilleries eventually became a subsidiary of SPSS in 2003, which in turn was acquired by IBM in 2009.

MonetDB in its current form was first created in 2002 by doctoral student Peter Boncz and professor Martin L. Kersten as part of the 1990s' MAGNUM research project at University of Amsterdam. It was initially called simply Monet, after the French impressionist painter Claude Monet. The first version under an open-source software license (a modified version of the Mozilla Public License) was released on September 30, 2004. When MonetDB version 4 was released into the open-source domain, many extensions to the code base were added by the MonetDB/CWI team, including a new SQL front end supporting the SQL:2003 standard.

MonetDB introduced innovations in all layers of the DBMS: a storage model based on vertical fragmentation, and a modern CPU-tuned query execution architecture that often gave MonetDB a speed advantage over the same algorithm running on a typical interpreter-based RDBMS. It was one of the first database systems to tune query optimization for CPU caches. MonetDB includes automatic and self-tuning indexes, run-time query optimization, and a modular software architecture.

By 2008, a follow-on project called X100 (MonetDB/X100) started, which evolved into the VectorWise technology. VectorWise was acquired by Actian Corporation, integrated with the Ingres database and sold as a commercial product.

In 2011 a major effort to renovate the MonetDB codebase was started. As part of it, the code for the MonetDB 4 kernel and its XQuery components was frozen. In MonetDB 5, parts of the SQL layer were pushed into the kernel. The resulting changes created a difference in internal APIs, as the system transitioned from MonetDB Instruction Language (MIL) to MonetDB Assembly Language (MAL). Older, no-longer maintained top-level query interfaces were also removed. First was XQuery, which relied on MonetDB 4 and was never ported to version 5. The experimental Jaql interface support was removed with the October 2014 release. With the July 2015 release, MonetDB gained support for read-only data sharding and persistent indices. In this release the deprecated streaming data module DataCell was also removed from the main codebase in an effort to streamline the code. In addition, the license was changed to the Mozilla Public License, version 2.0.


Architecture

MonetDB architecture is represented in three layers, each with its own set of optimizers. The front end is the top layer, providing a query interface for SQL, with SciQL and SPARQL interfaces under development. Queries are parsed into domain-specific representations, like relational algebra for SQL, and optimized. The generated logical execution plans are then translated into MonetDB Assembly Language (MAL) instructions, which are passed to the next layer. The middle or back-end layer provides a number of cost-based optimizers for the MAL. The bottom layer is the database kernel, which provides access to the data stored in Binary Association Tables (BATs). Each BAT is a table consisting of an object-identifier column and a value column, representing a single column in the database. MonetDB internal data representation also relies on the memory addressing ranges of contemporary CPUs using demand paging of memory mapped files, and thus departs from traditional DBMS designs involving complex management of large data stores in limited memory.
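The BAT storage model can be illustrated with a small sketch. This is not MonetDB code: the `BAT` class, the `decompose` helper and the sample table are hypothetical names chosen for illustration, and plain Python lists stand in for the memory-mapped columns of the kernel. It only shows how a row-oriented table can be split into one (object identifier, value) table per column, and how a query then touches only the columns it needs.

```python
from dataclasses import dataclass, field


@dataclass
class BAT:
    """Toy stand-in for a Binary Association Table: one column of (oid, value) pairs."""
    name: str
    oids: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, oid, value):
        self.oids.append(oid)
        self.values.append(value)

    def select(self, predicate):
        """Return the oids whose value satisfies the predicate."""
        return [oid for oid, value in zip(self.oids, self.values) if predicate(value)]

    def fetch(self, oids):
        """Return the values for the given oids, in the order requested."""
        position = {oid: i for i, oid in enumerate(self.oids)}
        return [self.values[position[oid]] for oid in oids]


def decompose(rows, column_names):
    """Vertically fragment row tuples into one BAT per column."""
    bats = {name: BAT(name) for name in column_names}
    for oid, row in enumerate(rows):
        for name, value in zip(column_names, row):
            bats[name].append(oid, value)
    return bats


rows = [("alice", 34, "NL"), ("bob", 27, "DE"), ("carol", 41, "NL")]
bats = decompose(rows, ["name", "age", "country"])

# Equivalent of: SELECT name FROM t WHERE age > 30
# Only the "age" and "name" columns are read; "country" is never touched.
matching_oids = bats["age"].select(lambda age: age > 30)
names = bats["name"].fetch(matching_oids)
print(names)
```

The object identifiers are what tie the separate columns back together into tuples, which is why every BAT carries them alongside the values.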


Query Recycling

Query recycling is an architecture for reusing the byproducts of the operator-at-a-time paradigm in a column store DBMS. Recycling makes use of the generic idea of storing and reusing the results of expensive computations. Unlike low-level instruction caches, query recycling uses an optimizer to pre-select instructions to cache. The technique is designed to improve query response times and throughput, while working in a self-organizing fashion. The authors from the CWI Database Architectures group, composed of Milena Ivanova, Martin Kersten, Niels Nes and Romulo Goncalves, won the "Best Paper Runner Up" at the ACM SIGMOD 2009 conference for their work on Query Recycling.
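The idea can be sketched in a few lines, under strong simplifying assumptions. The `Recycler` class, the `mark_recyclable` pass and the tiny plan format below are hypothetical and are not taken from MonetDB or from the paper. The sketch only shows the general pattern: plans run operator-at-a-time, every intermediate is materialized, and an optimizer-like pass decides in advance which instructions may store and reuse their results.

```python
class Recycler:
    """Toy recycle pool: keeps materialized intermediates of marked instructions."""

    def __init__(self):
        self.pool = {}
        self.hits = 0

    def run(self, key, compute, recyclable):
        # Reuse a stored intermediate if an identical instruction ran before.
        if recyclable and key in self.pool:
            self.hits += 1
            return self.pool[key]
        result = compute()
        if recyclable:
            self.pool[key] = result
        return result


def mark_recyclable(plan):
    """Stand-in for the optimizer pass that pre-selects instructions to cache.

    Assumption for this sketch: selections are expensive, aggregates are cheap.
    """
    return [(op, args, op == "select") for op, args in plan]


def execute(plan, column, recycler):
    """Run a plan operator-at-a-time, materializing every intermediate.

    The cache key is the chain of instructions so far; a real system would
    also have to identify the input column, which this sketch keeps fixed.
    """
    current = column
    signature = ()
    for op, args, recyclable in mark_recyclable(plan):
        signature += ((op, args),)
        if op == "select":
            low, high = args
            source = current
            current = recycler.run(
                signature,
                lambda: tuple(v for v in source if low <= v < high),
                recyclable,
            )
        elif op == "sum":
            current = sum(current)
        elif op == "count":
            current = len(current)
    return current


column = tuple(range(1000))
recycler = Recycler()
total = execute([("select", (100, 200)), ("sum", ())], column, recycler)
count = execute([("select", (100, 200)), ("count", ())], column, recycler)
print(total, count, recycler.hits)
```

The second query shares its selection with the first, so the selection result is taken from the pool instead of being recomputed, and only the cheap aggregate runs again.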


Database Cracking

MonetDB was one of the first databases to introduce Database Cracking. Database Cracking is an incremental partial indexing and/or sorting of the data. It directly exploits the columnar nature of MonetDB. Cracking is a technique that shifts the cost of index maintenance from updates to query processing. The query pipeline optimizers are used to massage the query plans to crack and to propagate this information. The technique allows for improved access times and self-organized behavior. Database Cracking received the ACM SIGMOD 2011 J.Gray best dissertation award.
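A minimal sketch of the cracking idea follows. It is not MonetDB's implementation: the `CrackedColumn` class and its methods are hypothetical, and it copies pieces instead of swapping values in place for the sake of brevity. Each range query partitions only the piece of the column that its bounds fall into and records the split points, so the column becomes progressively more ordered as a side effect of querying.

```python
import bisect


class CrackedColumn:
    """Toy cracker column: each range query partially sorts the data it touches."""

    def __init__(self, values):
        self.values = list(values)   # cracker copy of the column
        self.pivots = []             # sorted pivot values seen so far
        self.positions = []          # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition so values < pivot precede values >= pivot; return the split index."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]          # already cracked on this pivot
        # Only the piece between the neighbouring cracks needs reorganizing.
        start = self.positions[i - 1] if i > 0 else 0
        end = self.positions[i] if i < len(self.pivots) else len(self.values)
        piece = self.values[start:end]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[start:end] = smaller + larger
        split = start + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high, cracking as a side effect."""
        begin = self._crack(low)
        end = self._crack(high)
        return self.values[begin:end]


column = CrackedColumn([13, 4, 55, 9, 21, 2, 34, 8, 1, 17])
first = column.range_query(5, 20)    # cracks the whole column on 5, then on 20
second = column.range_query(5, 10)   # only re-partitions the piece between 5 and 20
print(sorted(first), sorted(second), column.pivots)
```

The second query does less work than the first because the earlier cracks already narrowed the region it has to inspect, which is the sense in which indexing cost moves into query processing.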


Components

A number of extensions exist for MonetDB that extend the functionality of the database engine. Due to the three-layer architecture, top-level query interfaces can benefit from optimizations done in the backend and kernel layers.


SQL

MonetDB/SQL is a top-level extension, which provides complete support for transactions in compliance with the SQL:2003 standard.


GIS

MonetDB/GIS is an extension to MonetDB/SQL with support for the Simple Features Access standard of the Open Geospatial Consortium (OGC).


SciQL

SciQL is an SQL-based query language for science applications with arrays as first class citizens. SciQL allows MonetDB to effectively function as an array database. SciQL is used in the European Union PlanetData and TELEIOS projects, together with the Data Vault technology, providing transparent access to large scientific data repositories. Data Vaults map the data from the distributed repositories to SciQL arrays, allowing for improved handling of spatio-temporal data in MonetDB. SciQL will be further extended for the Human Brain Project.


Data Vaults

Data Vault is a database-attached external file repository for MonetDB, similar to the SQL/MED standard. The Data Vault technology allows for transparent integration with distributed/remote file repositories. It is designed for scientific data exploration and mining, specifically for remote sensing data. There is support for the GeoTIFF (Earth observation), FITS (astronomy), MiniSEED (seismology) and NetCDF formats. The data is stored in the file repository in the original format, and loaded in the database in a lazy fashion, only when needed. The system can also process the data upon ingestion, if the data format requires it. As a result, even very large file repositories can be efficiently analyzed, as only the required data is processed in the database. The data can be accessed through either the MonetDB SQL or SciQL interfaces. The Data Vault technology was used in the European Union's TELEIOS project, which was aimed at building a virtual observatory for Earth observation data. Data Vaults for FITS files have also been used for processing astronomical survey data for The INT Photometric H-Alpha Survey (IPHAS).
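The lazy loading behaviour can be sketched as follows. This is an illustration only: the `DataVault` class with its `attach` and `scan` methods is hypothetical, and small CSV files in a temporary directory stand in for a repository of FITS, GeoTIFF or similar files. Attaching a file only creates a catalog entry; the file is read the first time a query scans the corresponding table.

```python
import csv
import os
import tempfile


class DataVault:
    """Toy vault: files stay in their original format until a query needs them."""

    def __init__(self):
        self.catalog = {}   # table name -> path of the file in the repository
        self.loaded = {}    # table name -> rows already ingested into the "database"

    def attach(self, name, path):
        """Register a repository file; only the catalog entry is created here."""
        self.catalog[name] = path

    def scan(self, name):
        """Return the rows of a table, ingesting the backing file on first use."""
        if name not in self.loaded:
            with open(self.catalog[name], newline="") as handle:
                self.loaded[name] = [
                    (row["station"], float(row["reading"]))
                    for row in csv.DictReader(handle)
                ]
        return self.loaded[name]


# Build a two-file repository in a temporary directory.
repository = tempfile.mkdtemp()
vault = DataVault()
for name, rows in {"north": [("A", 1.5), ("B", 2.5)], "south": [("C", 4.0)]}.items():
    path = os.path.join(repository, name + ".csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["station", "reading"])
        writer.writerows(rows)
    vault.attach(name, path)

# Only the file behind "north" is read; "south" stays untouched in the repository.
total = sum(reading for _, reading in vault.scan("north"))
print(total, sorted(vault.loaded))
```

Because ingestion is driven by the queries, the cost of a large repository is paid only for the files that are actually analyzed.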


SAM/BAM

MonetDB has a SAM/BAM module for efficient processing of sequence alignment data. Aimed at bioinformatics research, the module has a SAM/BAM data loader and a set of SQL UDFs for working with DNA data. The module uses the popular SAMtools library.


RDF/SPARQL

MonetDB/RDF is a SPARQL-based extension for working with linked data, which adds support for RDF and allows MonetDB to function as a triplestore. It is under development for the Linked Open Data 2 project.


R integration

The MonetDB/R module allows UDFs written in R to be executed in the SQL layer of the system. This is done using the native R support for running embedded in another application, inside the RDBMS in this case. Previously, the MonetDB.R connector allowed users to access MonetDB data sources and process them in an R session. The newer R integration feature of MonetDB does not require data to be transferred between the RDBMS and the R session, reducing overhead and improving performance. The feature is intended to give users access to functions of the R statistical software for in-line analysis of data stored in the RDBMS. It complements the existing support for C UDFs and is intended to be used for in-database processing.


Python integration

Similarly to the embedded R UDFs in MonetDB, the database now has support for UDFs written in Python/NumPy. The implementation uses NumPy arrays (themselves Python wrappers for C arrays); as a result there is limited overhead, providing a functional Python integration with speed matching native SQL functions. The embedded Python functions also support mapped operations, allowing users to execute Python functions in parallel within SQL queries. In practice, the feature gives users access to the Python/NumPy/SciPy libraries, which can provide a large selection of statistical/analytical functions.
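The notion of a mapped operation can be sketched in plain Python. This is not MonetDB's API: `times_two` and `mapped_call` are hypothetical names, plain lists stand in for NumPy arrays, and a thread pool stands in for the database's own parallel execution. The sketch shows the pattern only: the function receives whole column chunks rather than single values, the chunks are processed in parallel, and the partial results are concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor


def times_two(column):
    """Hypothetical UDF body: takes a whole column chunk, returns a chunk of equal length."""
    return [value * 2 for value in column]


def mapped_call(udf, column, workers=4):
    """Split a column into contiguous chunks, run the UDF on each in parallel, concatenate."""
    size = max(1, -(-len(column) // workers))   # ceiling division, at least 1
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(udf, chunks))     # map preserves chunk order
    return [value for part in parts for value in part]


column = list(range(10))
doubled = mapped_call(times_two, column)
print(doubled)
```

This only works for functions whose result for a chunk does not depend on the other chunks, such as element-wise transformations; an aggregate over the whole column would need a separate combining step.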


MonetDBLite

Following the release of the remote driver for R (MonetDB.R) and R UDFs in MonetDB (MonetDB/R), the authors created an embedded version of MonetDB in R called MonetDBLite. It is distributed as an R package, removing the need to manage a database server, which the previous R integrations required. The DBMS runs within the R process itself, eliminating socket communication and serialisation overhead and greatly improving efficiency. The idea behind it is to deliver an SQLite-like package for R, with the performance of an in-memory optimized columnar store.


Former extensions

A number of former extensions have been deprecated and removed from the stable code base over time. Some notable examples include an XQuery extension removed in MonetDB version 5, a JAQL extension, and a streaming data extension called DataCell.


See also

* List of relational database management systems
* Comparison of relational database management systems
* Database management system
* Column-oriented DBMS
* Array DBMS


External links


Official homepage of MonetDB

MonetDB Solutions - MonetDB's professional services company

Database Architectures group at CWI - the original developers of MonetDB

List of scientific projects using MonetDB

MonetDB.R - MonetDB to R Connector