MonetDB is an open-source column-oriented relational database management system (RDBMS) originally developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands.
It is designed to provide high performance on complex queries against large databases, such as combining
tables with hundreds of columns and millions of rows.
MonetDB has been applied in high-performance applications for online analytical processing, data mining, geographic information system (GIS), Resource Description Framework (RDF), text retrieval and sequence alignment processing.
History
Data mining projects in the 1990s required improved analytical database support. This resulted in a CWI spin-off called Data Distilleries, which used early MonetDB implementations in its analytical suite. Data Distilleries eventually became a subsidiary of SPSS in 2003, which in turn was acquired by IBM in 2009.
MonetDB in its current form was first created in 2002 by doctoral student Peter Boncz and professor Martin L. Kersten as part of the 1990s' MAGNUM research project at the University of Amsterdam. It was initially called simply Monet, after the French impressionist painter Claude Monet. The first version under an open-source software license (a modified version of the Mozilla Public License) was released on September 30, 2004. When MonetDB version 4 was released into the open-source domain, many extensions to the code base were added by the MonetDB/CWI team, including a new SQL front end supporting the SQL:2003 standard.
[MonetDB historic background]
MonetDB introduced innovations in all layers of the DBMS: a storage model based on vertical fragmentation and a modern CPU-tuned query execution architecture that often gave MonetDB a speed advantage over the same algorithm running on a typical interpreter-based RDBMS. It was one of the first database systems to tune query optimization for CPU caches. MonetDB includes automatic and self-tuning indexes, run-time query optimization, and a modular software architecture.
By 2008, a follow-on project called X100 (MonetDB/X100) had started, which evolved into the VectorWise technology. VectorWise was acquired by Actian Corporation, integrated with the Ingres database and sold as a commercial product.
In 2011 a major effort to renovate the MonetDB codebase was started. As part of it, the code for the MonetDB 4 kernel and its XQuery components was frozen. In MonetDB 5, parts of the SQL layer were pushed into the kernel. The resulting changes created a difference in internal APIs, as the system transitioned from MonetDB Instruction Language (MIL) to MonetDB Assembly Language (MAL). Older top-level query interfaces that were no longer maintained were also removed. The first was XQuery, which relied on MonetDB 4 and was never ported to version 5. The experimental Jaql interface support was removed with the October 2014 release. With the July 2015 release, MonetDB gained support for read-only data sharding and persistent indices. In this release the deprecated streaming data module DataCell was also removed from the main codebase in an effort to streamline the code. In addition, the license was changed to the Mozilla Public License, version 2.0.
Architecture
MonetDB architecture is represented in three layers, each with its own set of optimizers.
The front end is the top layer, providing a query interface for SQL, with SciQL and SPARQL interfaces under development. Queries are parsed into domain-specific representations, such as relational algebra for SQL, and optimized. The generated logical execution plans are then translated into MonetDB Assembly Language (MAL) instructions, which are passed to the next layer. The middle or back-end layer provides a number of cost-based optimizers for the MAL. The bottom layer is the database kernel, which provides access to the data stored in Binary Association Tables (BATs). Each BAT is a table consisting of an object-identifier column and a value column, representing a single column in the database.
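The BAT layout described above can be illustrated with a minimal Python sketch. It is a toy model, not MonetDB's kernel: the `BAT` class, the `decompose` helper and the sample rows are hypothetical names introduced here, and the real kernel stores columns as contiguous C arrays rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class BAT:
    """Toy Binary Association Table: (oid, value) pairs for one column."""
    oids: List[int] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)

    def append(self, oid: int, value: Any) -> None:
        self.oids.append(oid)
        self.values.append(value)

    def select(self, predicate: Callable[[Any], bool]) -> List[int]:
        """Return the oids whose value satisfies the predicate."""
        return [o for o, v in zip(self.oids, self.values) if predicate(v)]

    def fetch(self, oids: List[int]) -> List[Any]:
        """Return the values for the given oids, in the order requested."""
        lookup = dict(zip(self.oids, self.values))
        return [lookup[o] for o in oids]


def decompose(rows: List[Dict[str, Any]]) -> Dict[str, BAT]:
    """Vertically fragment row dictionaries into one BAT per column."""
    bats: Dict[str, BAT] = {}
    for oid, row in enumerate(rows):
        for name, value in row.items():
            bats.setdefault(name, BAT()).append(oid, value)
    return bats


rows = [
    {"name": "alice", "age": 31},
    {"name": "bob", "age": 25},
    {"name": "carol", "age": 42},
]
bats = decompose(rows)
# SELECT name WHERE age > 30, evaluated one column at a time:
# first filter the "age" BAT, then fetch matching oids from the "name" BAT.
matching = bats["age"].select(lambda age: age > 30)
names = bats["name"].fetch(matching)
print(names)
```

The query touches only the two columns it needs; the object identifiers are what tie the separate columns back together into rows.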
MonetDB's internal data representation also relies on the memory addressing ranges of contemporary CPUs, using demand paging of memory-mapped files, thus departing from traditional DBMS designs that involve complex management of large data stores in limited memory.
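As a rough illustration of the memory-mapping idea, the following sketch stores a column as a flat binary file and reads a single value through a memory map, leaving it to the operating system to page in only the part of the file that is touched. The file name, the 8-byte integer encoding and the helper functions are assumptions made for this example and do not reflect MonetDB's on-disk format.

```python
import mmap
import os
import struct
import tempfile

INT = struct.Struct("<q")  # assumed encoding: one 8-byte little-endian integer per value


def write_column(path, values):
    """Write the values back to back, so value i starts at byte i * INT.size."""
    with open(path, "wb") as f:
        for v in values:
            f.write(INT.pack(v))


def read_value(path, index):
    """Map the file and read one value; no explicit buffer management is needed."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return INT.unpack_from(mapped, index * INT.size)[0]


with tempfile.TemporaryDirectory() as tmp_dir:
    column_path = os.path.join(tmp_dir, "age.col")  # hypothetical file name
    write_column(column_path, range(100_000))
    value = read_value(column_path, 54_321)
print(value)
```

Because the positional offset of a value follows directly from its index, the lookup needs no buffer pool or page table inside the database itself.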
Query Recycling
Query recycling is an architecture for reusing the byproducts of the operator-at-a-time paradigm in a column store DBMS. Recycling makes use of the generic idea of storing and reusing the results of expensive computations. Unlike low-level instruction caches, query recycling uses an optimizer to pre-select instructions to cache. The technique is designed to improve query response times and throughput, while working in a self-organizing fashion. The authors from the CWI Database Architectures group, composed of Milena Ivanova, Martin Kersten, Niels Nes and Romulo Goncalves, won the "Best Paper Runner Up" at the ACM SIGMOD 2009 conference for their work on Query Recycling.
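The recycling idea can be sketched as a small pool of intermediate results keyed by instruction and arguments. This is a simplified illustration under stated assumptions: the `Recycler` class, the operator names and the fixed `recyclable` set (standing in for the optimizer's pre-selection) are hypothetical, and the sketch ignores eviction and invalidation after updates.

```python
class Recycler:
    """Toy recycle pool for operator-at-a-time intermediates."""

    def __init__(self, operators, recyclable):
        self.operators = operators    # operator name -> callable
        self.recyclable = recyclable  # operators pre-selected for caching
        self.pool = {}                # (operator, args) -> stored result
        self.hits = 0

    def execute(self, op, *args):
        key = (op, args)
        if key in self.pool:
            self.hits += 1
            return self.pool[key]
        result = self.operators[op](*args)
        if op in self.recyclable:
            self.pool[key] = result
        return result


column = list(range(1000))
operators = {
    # Results are tuples so they can serve as hashable arguments downstream.
    "select_range": lambda low, high: tuple(v for v in column if low <= v < high),
    "count": lambda values: len(values),
    "total": lambda values: sum(values),
}
recycler = Recycler(operators, recyclable={"select_range"})

# Query 1: count the values in [100, 200).
selected = recycler.execute("select_range", 100, 200)
count = recycler.execute("count", selected)

# Query 2: sum the same range; the selection is served from the pool.
selected_again = recycler.execute("select_range", 100, 200)
total = recycler.execute("total", selected_again)
print(count, total, recycler.hits)
```

Only the expensive selection is kept; the cheap aggregates are recomputed, which mirrors the idea of choosing in advance which instructions are worth caching.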
Database Cracking
MonetDB was one of the first databases to introduce Database Cracking. Database Cracking is an incremental partial indexing and/or sorting of the data that directly exploits the columnar nature of MonetDB. Cracking shifts the cost of index maintenance from updates to query processing. The query pipeline optimizers are used to massage the query plans to crack and to propagate this information. The technique allows for improved access times and self-organized behavior. The work on Database Cracking received the ACM SIGMOD 2011 Jim Gray Doctoral Dissertation Award.
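A minimal sketch of cracking on a single column follows, assuming range queries of the form `low <= value < high`. The `CrackedColumn` class and its cracker index (two parallel lists) are simplifications introduced for illustration; the published technique partitions in place and uses a tree-based index, whereas this version rebuilds each piece with list comprehensions for brevity.

```python
import bisect


class CrackedColumn:
    """Toy cracker column: each range query partitions the data a bit further."""

    def __init__(self, values):
        self.values = list(values)  # copy of the column, reorganized by queries
        self.pivots = []            # sorted pivot values seen so far
        self.positions = []         # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        # Only the piece between the neighboring pivots needs to be partitioned.
        start = self.positions[i - 1] if i > 0 else 0
        end = self.positions[i] if i < len(self.positions) else len(self.values)
        piece = self.values[start:end]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[start:end] = smaller + larger
        split = start + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high, cracking as a side effect."""
        lo = self._crack(low)
        hi = self._crack(high)
        return self.values[lo:hi]


original = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6]
column = CrackedColumn(original)
result = column.range_query(5, 10)
print(sorted(result), column.pivots, column.positions)
```

After the first query the column is split into three pieces (below 5, from 5 up to 10, and 10 or above), so later queries that fall inside one piece only have to reorganize that piece.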
Components
A number of extensions exist for MonetDB that extend the functionality of the database engine. Due to the three-layer architecture, top-level query interfaces can benefit from optimizations done in the backend and kernel layers.
SQL
MonetDB/SQL is a top-level extension, which provides complete support for transactions in compliance with the SQL:2003 standard.
GIS
MonetDB/GIS is an extension to MonetDB/SQL with support for the Simple Features Access standard of the Open Geospatial Consortium (OGC).
SciQL
SciQL is an SQL-based query language for science applications with arrays as first-class citizens. SciQL allows MonetDB to effectively function as an array database. SciQL is used in the European Union PlanetData and TELEIOS projects, together with the Data Vault technology, providing transparent access to large scientific data repositories. Data Vaults map the data from the distributed repositories to SciQL arrays, allowing for improved handling of spatio-temporal data in MonetDB. SciQL will be further extended for the Human Brain Project.
Data Vaults
Data Vault is a database-attached external file repository for MonetDB, similar to the SQL/MED standard. The Data Vault technology allows for transparent integration with distributed/remote file repositories. It is designed for scientific data exploration and mining, specifically for remote sensing data. There is support for the GeoTIFF (Earth observation), FITS (astronomy), MiniSEED (seismology) and NetCDF formats.
The data is stored in the file repository in the original format, and loaded in the database in a lazy fashion, only when needed. The system can also process the data upon ingestion, if the data format requires it.
As a result, even very large file repositories can be efficiently analyzed, as only the required data is processed in the database. The data can be accessed through either the MonetDB SQL or SciQL interfaces. The Data Vault technology was used in the European Union's TELEIOS project, which was aimed at building a virtual observatory for Earth observation data. Data Vaults for FITS files have also been used for processing astronomical survey data for the INT Photometric H-Alpha Survey (IPHAS).
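The lazy-loading behavior of a Data Vault can be sketched as a catalog that records where each file lives and reads it only when a query first touches it. Everything in this sketch is hypothetical: the `DataVault` class, the table names and the use of small CSV files as stand-ins for scientific formats such as FITS or NetCDF.

```python
import csv
import os
import tempfile


class DataVault:
    """Toy vault: files are attached by path and loaded on first access only."""

    def __init__(self):
        self.catalog = {}  # table name -> file path (metadata only)
        self.loaded = {}   # table name -> rows, filled lazily

    def attach(self, name, path):
        """Register a file without reading its contents."""
        self.catalog[name] = path

    def rows(self, name):
        """Load the file the first time it is queried, then reuse the result."""
        if name not in self.loaded:
            with open(self.catalog[name], newline="") as f:
                self.loaded[name] = list(csv.DictReader(f))
        return self.loaded[name]


with tempfile.TemporaryDirectory() as repo:
    vault = DataVault()
    for name in ("night1", "night2"):
        path = os.path.join(repo, name + ".csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["star", "magnitude"])
            writer.writerow(["a", "12.5"])
            writer.writerow(["b", "9.1"])
        vault.attach(name, path)

    # Only "night1" is queried, so only "night1" is ever read from disk.
    bright = [r["star"] for r in vault.rows("night1") if float(r["magnitude"]) < 10]
    loaded_tables = sorted(vault.loaded)
print(bright, loaded_tables)
```

Both files are known to the catalog, but the second one stays untouched in the repository until a query asks for it.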
SAM/BAM
MonetDB has a SAM/BAM module for efficient processing of sequence alignment data. Aimed at bioinformatics research, the module has a SAM/BAM data loader and a set of SQL UDFs for working with DNA data. The module uses the popular SAMtools library.
RDF/SPARQL
MonetDB/RDF is a SPARQL-based extension for working with linked data, which adds support for RDF and allows MonetDB to function as a triplestore. It is under development for the Linked Open Data 2 project.
R integration
The MonetDB/R module allows UDFs written in R to be executed in the SQL layer of the system. This is done using the native R support for running embedded in another application, in this case inside the RDBMS. Previously, the MonetDB.R connector allowed MonetDB data sources to be used and processed in an R session. The newer R integration feature of MonetDB does not require data to be transferred between the RDBMS and the R session, reducing overhead and improving performance. The feature is intended to give users access to functions of the R statistical software for in-line analysis of data stored in the RDBMS. It complements the existing support for C UDFs and is intended to be used for in-database processing.
Python integration
Similarly to the embedded R UDFs in MonetDB, the database now supports UDFs written in Python/NumPy. The implementation uses NumPy arrays (themselves Python wrappers for C arrays), so there is limited overhead, providing a functional Python integration with speed matching native SQL functions. The embedded Python functions also support mapped operations, allowing users to execute Python functions in parallel within SQL queries. In practice, the feature gives users access to the Python/NumPy/SciPy libraries, which provide a large selection of statistical and analytical functions.
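The two ideas in this section, calling a UDF once per column rather than once per row and running it in parallel over parts of a column, can be sketched in plain Python. This is an illustration only: `times_two` and `call_mapped` are hypothetical names, plain lists stand in for the NumPy arrays the real integration passes, and a thread pool stands in for the database's own parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor


def times_two(column):
    """Hypothetical UDF: receives a whole column chunk, not a single row."""
    return [v * 2 for v in column]


def call_mapped(udf, column, workers=4):
    """Split the column into contiguous chunks and apply udf to each in parallel."""
    size = max(1, -(-len(column) // workers))  # ceiling division, at least 1
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(udf, chunks)  # map yields results in chunk order
        return [v for part in parts for v in part]


result = call_mapped(times_two, list(range(10)))
print(result)
```

A function is only safe to run this way if its result for one chunk does not depend on the values in another chunk, which holds for element-wise operations such as the one shown.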
MonetDB embedded
Following the release of an embedded driver for R and R UDFs in MonetDB (MonetDB/R), the authors created an embedded version of MonetDB in R called MonetDBLite. Embedded versions for Python and Java followed. They are distributed as embeddable packages, removing the need to manage a database server, which was required for the previous API integrations. The DBMS runs within the process itself, eliminating socket communication and serialisation overhead and greatly improving efficiency. The idea behind it is to provide an easily embeddable, SQLite-like package with the performance of an in-memory optimized columnar store.
Former extensions
A number of former extensions have been deprecated and removed from the stable code base over time. Some notable examples include an XQuery extension removed in MonetDB version 5, a JAQL extension, and a streaming data extension called ''Data Cell''.
MonetDB Foundation
The MonetDB Foundation is the independent non-profit organisation behind MonetDB. The foundation holds the intellectual property (IP) of MonetDB and is dedicated to advancing the development and long-term maintenance of MonetDB. The foundation is funded by charitable donations.
See also
* List of relational database management systems
* Comparison of relational database management systems
* Database management system
* Column-oriented DBMS
* Array DBMS
External links
Official homepage of MonetDB
MonetDB Solutions - MonetDB's professional services company
Database Architectures group at CWI - the original developers of MonetDB
List of scientific projects using MonetDB
MonetDB.R - MonetDB to R Connector