
A dataspace is an abstraction in data management that aims to overcome some of the problems encountered in a data integration system. A dataspace is defined as a set of "participants", or data sources, and the relations between them: for example, that dataset A is a duplicate of dataset B.
It can contain all data sources of an organization regardless of their format, physical location, or data model.
The dataspace then provides a unified interface to query data regardless of format, sometimes in a "best-effort" fashion, and ways to further integrate the data when necessary.
It differs markedly from a traditional relational database, which requires that all data be in the same format.
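The definition of a dataspace as a set of participants and the relations between them can be illustrated with a minimal Python sketch. The class and field names (`Participant`, `Relation`, `Dataspace`) and the example sources are hypothetical, chosen only for illustration; they are not taken from any particular dataspace system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """A data source in the dataspace (hypothetical structure)."""
    name: str
    data_format: str  # e.g. "csv", "relational", "xml"
    location: str     # where the source physically lives


@dataclass(frozen=True)
class Relation:
    """A known relationship between two participants."""
    source: str
    target: str
    kind: str  # e.g. "duplicate_of", "view_of"


@dataclass
class Dataspace:
    """A set of participants plus the relations between them."""
    participants: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_participant(self, participant):
        # Sources are accepted whatever their format or location.
        self.participants[participant.name] = participant

    def relate(self, source, target, kind):
        # Both ends must already be participants of the dataspace.
        if source not in self.participants or target not in self.participants:
            raise KeyError("both participants must be registered first")
        self.relations.append(Relation(source, target, kind))

    def related_to(self, name):
        """Return (kind, target) pairs for relations starting at `name`."""
        return [(r.kind, r.target) for r in self.relations if r.source == name]


space = Dataspace()
space.add_participant(Participant("A", "csv", "file-server"))
space.add_participant(Participant("B", "relational", "warehouse"))
space.relate("A", "B", "duplicate_of")
print(space.related_to("A"))
```

In this sketch the two sources keep their own formats and locations; the dataspace only records that they exist and how they relate.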
The aim of the concept is to reduce the effort required to set up a data integration system by relying on existing matching and mapping generation techniques, and to improve the system in a "pay-as-you-go" fashion as it is used. Labor-intensive aspects of data integration are postponed until they are absolutely needed.
Traditionally, data integration and data exchange systems have aimed to offer many of the purported services of dataspace systems. Dataspaces can be viewed as a next step in the evolution of data integration architectures, but they are distinct from current data integration systems, which require semantic integration before any services can be provided. Hence, although there is not a single schema to which all the data conforms and the data resides in a multitude of host systems, the data integration system knows the precise relationships between the terms used in each schema. As a result, significant up-front effort is required to set up a data integration system.
Dataspaces shift the emphasis to a data co-existence approach providing base functionality over all data sources, regardless of how integrated they are. For example, a DataSpace Support Platform (DSSP) can provide keyword search over all of its data sources, similar to that provided by existing desktop search systems. When more sophisticated operations are required, such as relational-style queries, data mining, or monitoring over certain sources, then additional effort can be applied to more closely integrate those sources in an incremental fashion. Similarly, in terms of traditional database guarantees, initially a dataspace system can only provide weaker guarantees of consistency and durability. As stronger guarantees are desired, more effort can be put into making agreements among the various owners of data sources, and opening up certain interfaces (e.g., for commit protocols).
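The idea of base functionality over loosely integrated sources can be sketched as a best-effort keyword search in Python. This is only an illustration under stated assumptions: the `SEARCHERS` registry, the `keyword_search` function, and the sample sources are all hypothetical and do not describe how any real DSSP is implemented.

```python
import csv
import io
import json


def search_csv(text, keyword):
    """Yield CSV rows (as dicts) in which any value contains the keyword."""
    for row in csv.DictReader(io.StringIO(text)):
        if any(keyword.lower() in str(v).lower() for v in row.values()):
            yield row


def search_json(text, keyword):
    """Yield JSON records (a list of objects) that mention the keyword."""
    for record in json.loads(text):
        if any(keyword.lower() in str(v).lower() for v in record.values()):
            yield record


def search_text(text, keyword):
    """Yield plain-text lines that mention the keyword."""
    for line in text.splitlines():
        if keyword.lower() in line.lower():
            yield {"line": line}


# Hypothetical registry: one searcher per format integrated so far.
SEARCHERS = {"csv": search_csv, "json": search_json, "text": search_text}


def keyword_search(sources, keyword):
    """Best-effort search: unsupported or unreadable sources are skipped."""
    hits, skipped = [], []
    for name, (fmt, payload) in sources.items():
        searcher = SEARCHERS.get(fmt)
        if searcher is None:
            skipped.append(name)  # no integration effort spent on this format yet
            continue
        try:
            hits.extend((name, hit) for hit in searcher(payload, keyword))
        except (ValueError, csv.Error):
            skipped.append(name)  # malformed content: report it, keep going
    return hits, skipped


sources = {
    "customers": ("csv", "id,name\n1,Ada Lovelace\n2,Alan Turing\n"),
    "orders": ("json", '[{"order": 7, "customer": "Ada Lovelace"}]'),
    "notes": ("text", "Call Ada about the invoice\nLunch at noon"),
    "scans": ("tiff", ""),
}
hits, skipped = keyword_search(sources, "ada")
for name, hit in hits:
    print(name, hit)
print("skipped:", skipped)
```

Registering a searcher for a further format later, by adding an entry to `SEARCHERS`, corresponds loosely to the incremental, pay-as-you-go integration effort described above.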
History
According to a cyclic model of technology development, new technologies progress by first going through a phase of design competition, in which the technology is explored and experiments are carried out, until the industry settles upon a dominant design and iterates less.
Edward describes dataspaces as having already undergone a "first wave" of adoption, composed of exploratory and proof-of-concept projects, and as having begun a "second wave" in which they are being adapted for more general and less niche use cases.
Since February 2020, the European Commission has been working on the development of shared dataspaces for various industries, called "Common European Data Spaces".
Dataspaces are planned for the agriculture, energy, finance, health, media, manufacturing, mobility, and tourism industries as well as for the European Green Deal, languages, public administration, research and innovation, and skills.
The first concrete steps taken were a number of research and innovation initiatives funded as part of the European Public-Private Partnership on Big Data Value (Big Data Value PPP).
See also
* Data integration
* Data mapping
* Information integration
* Linked data
* Semantic integration
* Semantic query
Further reading
* Partha Pratim Talukdar, Marie Jacob, Muhammad Salman Mehmood, Koby Crammer, Zachary G. Ives, Fernando Pereira, Sudipto Guha: Learning to create data-integrating queries. PVLDB 1(1): 785-796 (2008)
* Michael J. Franklin, Alon Y. Halevy, David Maier: A first tutorial on dataspaces. PVLDB 1(2): 1516-1517 (2008)
* Jens-Peter Dittrich, Marcos Antonio Vaz Salles: iDM: A Unified and Versatile Data Model for Personal Dataspace Management. VLDB 2006: 367-378