Ontology alignment, or ontology matching, is the process of determining correspondences between concepts in ontologies. A set of correspondences is also called an alignment. The phrase takes on a slightly different meaning in computer science, cognitive science, or philosophy.
Computer science
For computer scientists, concepts are expressed as labels for data. Historically, the need for ontology alignment arose out of the need to integrate heterogeneous databases, ones developed independently and thus each having its own data vocabulary. In the Semantic Web context, which involves many actors providing their own ontologies, ontology matching has taken a critical place in helping heterogeneous resources to interoperate. Ontology alignment tools find classes of data that are semantically equivalent, for example, "truck" and "lorry". The classes are not necessarily logically identical. According to Euzenat and Shvaiko (2007) [Jérôme Euzenat and Pavel Shvaiko. 2013. Ontology matching, Springer-Verlag, ISBN 978-3-642-38720-3], there are three major dimensions for similarity: syntactic, external, and semantic. Coincidentally, they roughly correspond to the dimensions identified by cognitive scientists below. A number of tools and frameworks have been developed for aligning ontologies, some with inspiration from cognitive science and some independently.
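As a rough illustration of two of these dimensions, the sketch below scores a pair of class labels syntactically (string overlap) and externally (lookup in an outside resource). It is a minimal example under stated assumptions, not the method of any particular alignment tool: the `SYNONYMS` table and all function names are hypothetical, and a real system would consult a thesaurus or lexical database instead of a hard-coded set.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for an external resource such as a thesaurus.
SYNONYMS = {frozenset({"truck", "lorry"}), frozenset({"car", "automobile"})}


def syntactic_similarity(a: str, b: str) -> float:
    """String-level similarity of two labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def external_similarity(a: str, b: str) -> float:
    """1.0 if the labels are identical or listed as synonyms, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return 0.0


def label_similarity(a: str, b: str) -> float:
    """Combine both dimensions by taking the stronger signal (an assumption)."""
    return max(syntactic_similarity(a, b), external_similarity(a, b))
```

With these definitions, "truck" and "lorry" score low syntactically because they share almost no characters, yet the external lookup still reports them as equivalent. The semantic dimension, which depends on the logical structure of the ontologies, is not covered by this sketch.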
Ontology alignment tools have generally been developed to operate on database schemas, XML schemas [D. Aumueller, H. Do, S. Massmann, E. Rahm. 2005. Schema and ontology matching with COMA++. Proc. of the 2005 International Conference on Management of Data, pp. 906-908], taxonomies, formal languages, entity-relationship models, dictionaries, and other label frameworks. They are usually converted to a graph representation before being matched.
Since the emergence of the Semantic Web, such graphs can be represented in the Resource Description Framework line of languages by triples of the form <subject, predicate, object>, as illustrated in the Notation 3 syntax.
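The conversion of a schema into a graph of triples can be sketched as follows. This is an illustrative example only: the toy schema, the `ex:` prefix, and the function names are assumptions, and the output lines are merely in the style of Notation 3 statements with prefix declarations omitted. The predicates `rdf:type`, `rdfs:domain`, and `rdfs:subClassOf` are standard RDF and RDF Schema terms.

```python
# Hypothetical toy schema: class name -> attribute names.
schema = {"ex:Vehicle": ["ex:plate", "ex:owner"], "ex:Truck": ["ex:payload"]}
# Hypothetical subclass links between the classes above.
subclass_of = {"ex:Truck": "ex:Vehicle"}


def schema_to_triples(schema, subclass_of):
    """Flatten a schema into (subject, predicate, object) triples."""
    triples = []
    for cls, attributes in schema.items():
        triples.append((cls, "rdf:type", "rdfs:Class"))
        for attribute in attributes:
            # Each attribute becomes a property whose domain is the class.
            triples.append((attribute, "rdfs:domain", cls))
    for child, parent in subclass_of.items():
        triples.append((child, "rdfs:subClassOf", parent))
    return triples


def to_statement(triple):
    """Render one triple as a 'subject predicate object .' line."""
    subject, predicate, obj = triple
    return f"{subject} {predicate} {obj} ."


triples = schema_to_triples(schema, subclass_of)
statements = [to_statement(t) for t in triples]
```

Once both inputs are expressed as triples over a shared vocabulary, a matcher can compare nodes and edges uniformly regardless of whether the source was a database schema, an XML schema, or a taxonomy.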
In this context, aligning ontologies is sometimes referred to as "ontology matching".
The problem of ontology alignment has recently been tackled by automatically computing a matching first and then a mapping based on that matching. Systems such as DSSim, X-SOM, and COMA++ have obtained very high precision and recall. The Ontology Alignment Evaluation Initiative aims to evaluate, compare, and improve the different approaches.
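Precision and recall for an alignment are typically computed against a reference alignment. The sketch below uses the standard set-based definitions; it is not claimed to reproduce the exact evaluation procedure of any campaign or tool, and the example correspondences are invented for illustration.

```python
def precision_recall(found, reference):
    """Return (precision, recall) of a found alignment against a reference.

    Both arguments are collections of (source_term, target_term) pairs.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference alignment and matcher output.
reference = {("truck", "lorry"), ("car", "automobile"), ("person", "human")}
found = {("truck", "lorry"), ("car", "automobile"),
         ("bus", "van"), ("person", "agent")}

precision, recall = precision_recall(found, reference)
```

Here two of the four proposed correspondences are correct, giving a precision of 0.5, and two of the three reference correspondences are recovered, giving a recall of 2/3.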
Formal definition
Given two ontologies $i = \langle C_i, R_i, I_i, T_i, V_i \rangle$ and $j = \langle C_j, R_j, I_j, T_j, V_j \rangle$, where $C$ is the set of classes, $R$ is the set of relations, $I$ is the set of individuals, $T$ is the set of data types, and $V$ is the set of values, we can define different types of (inter-ontology) relationships. Such relationships are collectively called alignments and can be categorized along different dimensions:
* similarity vs logic: this is the difference between matchings (predicating about the similarity of ontology terms), and mappings (logical axioms, typically expressing logical equivalence or inclusion among ontology terms)
* atomic vs complex: whether the alignments considered are one-to-one, or can involve more terms in a query-like formulation (e.g., LAV/GAV mapping)
* homogeneous vs heterogeneous: whether the alignments predicate on terms of the same type (e.g., classes are related only to classes, individuals to individuals, etc.) or heterogeneity in the relationship is allowed
* type of alignment: the semantics associated with an alignment. It can be subsumption, equivalence, disjointness, part-of, or any user-specified relationship.
Subsumption, atomic, homogeneous alignments are the building blocks for obtaining richer alignments, and have well-defined semantics in every Description Logic.
Ontology matching and mapping can now be introduced more formally.
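The dimensions listed above can be made concrete with a small data structure. The following is a sketch under stated assumptions, not a standard alignment format: all class and field names are hypothetical. A correspondence here relates exactly one source term to one target term, so it is atomic by construction, and the optional similarity value distinguishes a matching from a purely logical mapping.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntityKind(Enum):
    CLASS = "class"
    RELATION = "relation"
    INDIVIDUAL = "individual"


class AlignmentType(Enum):
    EQUIVALENCE = "equivalence"
    SUBSUMPTION = "subsumption"
    DISJOINTNESS = "disjointness"
    PART_OF = "part-of"


@dataclass(frozen=True)
class Entity:
    name: str
    kind: EntityKind


@dataclass(frozen=True)
class Correspondence:
    source: Entity
    target: Entity
    alignment_type: AlignmentType
    # Present for matchings (similarity-based), absent for logical mappings.
    similarity: Optional[float] = None

    def is_homogeneous(self) -> bool:
        """True when both terms are of the same kind (e.g. class to class)."""
        return self.source.kind == self.target.kind

    def is_matching(self) -> bool:
        """True when the correspondence carries a similarity degree."""
        return self.similarity is not None


# Example: an atomic, homogeneous matching between two classes.
truck = Entity("Truck", EntityKind.CLASS)
lorry = Entity("Lorry", EntityKind.CLASS)
example = Correspondence(truck, lorry, AlignmentType.EQUIVALENCE, similarity=0.9)
```

A complex alignment would need a richer source or target than a single `Entity`, for instance a query expression, which this sketch deliberately leaves out.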
An atomic homogeneous matching is an alignment that carries a similarity degree