Ontology-based Data Integration

Ontology-based data integration involves the use of one or more ontologies to effectively combine data or information from multiple heterogeneous sources. It is one of several data integration approaches and may be classified as Global-As-View (GAV). The effectiveness of ontology-based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process.
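In a Global-As-View setting, each concept of the global schema is defined as a view (a query) over the underlying sources. The following is a minimal Python sketch of that idea, not a description of any particular system. The two sources, their field names, and the helper names `diagnosis_view`, `GLOBAL_VIEWS`, and `query` are all hypothetical and were chosen only for illustration.

```python
# Two hypothetical sources that store the same kind of fact
# under different field names.
clinic_a = [{"pid": 1, "dx": "asthma"}, {"pid": 2, "dx": "diabetes"}]
clinic_b = [{"patient": 7, "diagnosis": "asthma"}]


def diagnosis_view():
    """Yield (patient, condition) pairs for the global concept 'Diagnosis'.

    This function is the GAV mapping: the global concept is expressed
    as a query over the sources.
    """
    for row in clinic_a:
        yield ("A:%d" % row["pid"], row["dx"])
    for row in clinic_b:
        yield ("B:%d" % row["patient"], row["diagnosis"])


# Global schema: one view per global concept (only one concept here).
GLOBAL_VIEWS = {"Diagnosis": diagnosis_view}


def query(concept, condition):
    """Answer a query posed on the global schema by unfolding the view."""
    return [patient for patient, cond in GLOBAL_VIEWS[concept]() if cond == condition]


print(query("Diagnosis", "asthma"))  # ['A:1', 'B:7']
```

Because the mapping lives in the view definition, adding a new source in this sketch means editing `diagnosis_view`, while queries against the global concept stay unchanged.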


Background

Data from multiple sources are characterized by multiple types of heterogeneity. The following hierarchy is often used:
* Syntactic heterogeneity: results from differences in the representation format of data.
* Schematic or structural heterogeneity: the native model or structure used to store data differs between data sources, leading to structural heterogeneity. Schematic heterogeneity, which appears particularly in structured databases, is also an aspect of structural heterogeneity.
* Semantic heterogeneity: differences in the interpretation of the "meaning" of data are the source of semantic heterogeneity.
* System heterogeneity: the use of different operating systems and hardware platforms leads to system heterogeneity.

Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address the issue of semantic heterogeneity in data sources. In domains like bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies has made it possible for the data integration community to leverage them for semantic integration of data and information.
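As a small illustration of how shared ontology terms can bridge semantic heterogeneity, the sketch below annotates the labels used by two sources with common term identifiers. The labels, the `EX:` identifiers, and the function name `same_meaning` are invented for this example and do not come from any real ontology.

```python
# Hypothetical annotations: each source label points to an ontology term.
# The "EX:" identifiers are made up for this sketch.
SOURCE_A_TERMS = {"heart attack": "EX:0001", "high blood pressure": "EX:0002"}
SOURCE_B_TERMS = {"myocardial infarction": "EX:0001", "hypertension": "EX:0002"}


def same_meaning(label_a, label_b):
    """Return True when both labels are annotated with the same ontology term."""
    term_a = SOURCE_A_TERMS.get(label_a)
    term_b = SOURCE_B_TERMS.get(label_b)
    return term_a is not None and term_a == term_b


print(same_meaning("heart attack", "myocardial infarction"))  # True
print(same_meaning("heart attack", "hypertension"))           # False
```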


The role of ontologies

Ontologies enable the unambiguous identification of entities in heterogeneous information systems and the assertion of applicable named relationships that connect these entities. Specifically, ontologies play the following roles:
;Content explication: The ontology enables accurate interpretation of data from multiple sources through the explicit definition of terms and relationships in the ontology.
;Query model: In some systems, such as SIMS, the query is formulated using the ontology as a global query schema.
;Verification: The ontology verifies the mappings used to integrate data from multiple sources. These mappings may be either user-specified or generated by a system.
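The query-model and verification roles can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how SIMS or any other system is implemented: the ontology is reduced to a dictionary of classes and their properties, and the names `ONTOLOGY`, `MAPPINGS`, `verify`, and `columns_for`, along with all column names, are hypothetical.

```python
# Hypothetical ontology: each class name maps to the properties it defines.
ONTOLOGY = {
    "Patient": {"identifier", "birthDate"},
    "Diagnosis": {"patient", "condition"},
}

# Hypothetical mappings from source columns to (class, property) pairs.
MAPPINGS = {
    "hospital.pat_id": ("Patient", "identifier"),
    "hospital.dob": ("Patient", "birthDate"),
    "lab.illness": ("Diagnosis", "disease"),  # deliberately invalid property
}


def verify(mappings, ontology):
    """Verification: list source columns whose target is not in the ontology."""
    invalid = []
    for column, (cls, prop) in mappings.items():
        if prop not in ontology.get(cls, set()):
            invalid.append(column)
    return invalid


def columns_for(cls, prop, mappings):
    """Query model: translate an ontology-level term into source columns."""
    return [col for col, target in mappings.items() if target == (cls, prop)]


print(verify(MAPPINGS, ONTOLOGY))                      # ['lab.illness']
print(columns_for("Patient", "identifier", MAPPINGS))  # ['hospital.pat_id']
```

Here the ontology acts as the single point of reference: a user asks for `Patient.identifier` without knowing source column names, and a mapping that targets an undefined property is flagged before it is used.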


Approaches using ontologies for data integration

There are three main architectures that are implemented in ontology-based data integration applications:
;Single ontology approach: A single ontology is used as a global reference model in the system. This is the simplest approach as it can be simulated by other approaches. SIMS is a prominent example of this approach. The Structured Knowledge Source Integration component of Research Cyc is another prominent example (see "Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries"). The Gellish Taxonomic Dictionary-Ontology follows this approach as well.
;Multiple ontologies: Multiple ontologies, each modeling an individual data source, are used in combination for integration. Although this approach is more flexible than the single ontology approach, it requires the creation of mappings between the ontologies. Ontology mapping is a challenging problem and is the focus of a large number of research efforts in computer science. The OBSERVER system is an example of this approach.
;Hybrid approaches: The hybrid approach involves the use of multiple ontologies that subscribe to a common, top-level vocabulary. The top-level vocabulary defines the basic terms of the domain. The presence of this common vocabulary makes it easier to use multiple ontologies for integration.
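The hybrid approach can be sketched as follows. This is a simplified illustration, assuming that each local concept is described as a set of terms from the shared vocabulary and that two concepts are treated as comparable when their descriptions are identical; real systems use richer descriptions and reasoning. All names (`SHARED_VOCABULARY`, `LOCAL_ONTOLOGIES`, `check_vocabulary`, `comparable_concepts`) and the example concepts are hypothetical.

```python
# Hypothetical shared top-level vocabulary of basic domain terms.
SHARED_VOCABULARY = {"person", "receives_care", "provides_care"}

# Each source keeps its own ontology, but every local concept is described
# using only terms from the shared vocabulary.
LOCAL_ONTOLOGIES = {
    "hospital": {
        "Inpatient": {"person", "receives_care"},
        "Clinician": {"person", "provides_care"},
    },
    "insurer": {
        "Claimant": {"person", "receives_care"},
    },
}


def check_vocabulary(local_ontologies, shared):
    """Return (source, concept) pairs that use terms outside the shared vocabulary."""
    return [
        (src, name)
        for src, concepts in local_ontologies.items()
        for name, terms in concepts.items()
        if not terms <= shared
    ]


def comparable_concepts(local_ontologies):
    """Pair concepts from different sources whose shared-term descriptions match."""
    items = [
        (src, name, frozenset(terms))
        for src, concepts in local_ontologies.items()
        for name, terms in concepts.items()
    ]
    return [
        ((s1, n1), (s2, n2))
        for i, (s1, n1, t1) in enumerate(items)
        for (s2, n2, t2) in items[i + 1:]
        if s1 != s2 and t1 == t2
    ]


print(check_vocabulary(LOCAL_ONTOLOGIES, SHARED_VOCABULARY))  # []
print(comparable_concepts(LOCAL_ONTOLOGIES))
# [(('hospital', 'Inpatient'), ('insurer', 'Claimant'))]
```

In this sketch no pairwise mapping between the hospital and insurer ontologies is written by hand; the correspondence falls out of the shared terms, which is the benefit the hybrid approach aims for.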


See also

* Data mapping
* Enterprise application integration
* Enterprise information integration
* Ontology mapping
* Schema matching


Further reading

* Chicco, D.; Masseroli, M. (2016). "Ontology-based prediction and prioritization of gene functional annotations". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 13 (2): 248–260. doi:10.1109/TCBB.2015.2459694. PMID 27045825. S2CID 2795344.



External links


* OBSERVER home page
* Cyc Semantic Knowledge Source Integration (SKSI)