DiShIn

DiShIn (Disjunctive Shared Information) is a method for calculating the shared information content of two concepts that complements the value of their most informative common ancestor (MICA) with their disjunctive common ancestors, which are found by exploring the multiple inheritance of an ontology. Using the shared information content of two terms in an ontology is a popular technique to measure their semantic similarity. DiShIn redefines the shared information content between two concepts as the average information content of all their disjunctive common ancestors. A common ancestor is disjunctive if the difference between the numbers of distinct paths from the two concepts to it differs from that of every more informative common ancestor. In other words, a disjunctive ancestor is the most informative ancestor representing a given set of parallel interpretations. DiShIn improves on GraSM in computational efficiency and in its handling of parallel interpretations.
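The definition above can be restated compactly. The notation here is introduced only for illustration and is not taken from the source: CA(c1, c2) is the set of common ancestors of c1 and c2, np(c, a) is the number of distinct paths from concept c to ancestor a, PD is the path difference, DCA is the set of disjunctive common ancestors, and IC is any information content measure.

```latex
\begin{align}
PD(c_1,c_2,a) &= \lvert np(c_1,a) - np(c_2,a) \rvert \\
DCA(c_1,c_2) &= \{\, a \in CA(c_1,c_2) : \forall a' \in CA(c_1,c_2),\ IC(a') > IC(a) \Rightarrow PD(c_1,c_2,a') \neq PD(c_1,c_2,a) \,\} \\
IC_{\mathrm{DiShIn}}(c_1,c_2) &= \frac{1}{\lvert DCA(c_1,c_2) \rvert} \sum_{a \in DCA(c_1,c_2)} IC(a)
\end{align}
```

Under this reading, two ancestors with equal IC and the same path difference would both be kept, so a practical implementation needs an extra tie-breaking rule to keep only one per path difference.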


Example

For example, palladium, platinum, silver and gold are considered to be precious metals, and silver, gold and copper are considered to be coinage metals. Thus, we have the following hierarchy, where each arrow points from a concept to its parents:

  precious -> metal
  coinage -> metal
  palladium -> precious
  platinum -> precious
  silver -> precious, coinage
  gold -> precious, coinage
  copper -> coinage

When calculating the semantic similarity between platinum and gold, DiShIn starts by counting the distinct paths from each concept to every common ancestor and computing the path difference:

  gold -> coinage -> metal
  gold -> precious -> metal
  platinum -> precious -> metal
  gold -> precious
  platinum -> precious

For metal there are two paths from gold and one from platinum, so the path difference is one. For precious there is one path from each concept, so the path difference is zero. Since their path differences are distinct, both common ancestors, metal and precious, are considered disjunctive common ancestors.

When calculating the semantic similarity between platinum and palladium, DiShIn again computes the path difference for all their common ancestors:

  palladium -> precious -> metal
  platinum -> precious -> metal
  palladium -> precious
  platinum -> precious

For both metal and precious there is only one path from each concept, so both common ancestors have a path difference of zero. Thus, only precious (the more informative of the two) is considered a disjunctive common ancestor.

Node-based semantic similarity measures are proportional to the average information content of the disjunctive common ancestors: metal and precious in the case of platinum and gold, and only precious in the case of platinum and palladium. Since metal is more general, and therefore less informative, than precious, including it in the average lowers the shared information content. This means that for DiShIn palladium and platinum are more similar than platinum and gold.

When calculating the semantic similarity between silver and gold, DiShIn computes the path difference for all their common ancestors:

  gold -> coinage -> metal
  gold -> precious -> metal
  silver -> coinage -> metal
  silver -> precious -> metal
  gold -> precious
  silver -> precious
  gold -> coinage
  silver -> coinage

As in the case of platinum and palladium, all common ancestors here have a path difference of zero, since silver and gold share the same relationships and therefore have parallel interpretations. Thus, only the most informative common ancestor, precious or coinage (whichever has the higher information content), is considered a disjunctive common ancestor. This means that for DiShIn the similarity between silver and gold is greater than or equal to the similarity between any other pair of leaf concepts. Thus, DiShIn does not penalize parallel interpretations as GraSM did.
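The walk-through above can be reproduced with a short Python sketch. It is an illustrative reimplementation under stated assumptions, not the reference DiShIn code. The PARENTS table encodes the example hierarchy. The intrinsic_ic function (log of the total number of concepts divided by the number of descendants, counting the concept itself) is a hypothetical information content measure chosen only so the example runs; DiShIn can use any IC. Breaking IC ties alphabetically is also an assumption.

```python
import math

# Example hierarchy from the text, as child -> list of parents (is-a relations).
PARENTS = {
    "metal": [],
    "precious": ["metal"],
    "coinage": ["metal"],
    "palladium": ["precious"],
    "platinum": ["precious"],
    "gold": ["precious", "coinage"],
    "silver": ["precious", "coinage"],
    "copper": ["coinage"],
}


def path_counts(concept):
    """Map each ancestor of `concept` (including itself) to its number of distinct paths."""
    counts = {concept: 1}
    for parent in PARENTS[concept]:
        for ancestor, n in path_counts(parent).items():
            counts[ancestor] = counts.get(ancestor, 0) + n
    return counts


def intrinsic_ic(concept):
    """Hypothetical intrinsic IC: log(total concepts / descendants, self included)."""
    descendants = [c for c in PARENTS if concept in path_counts(c)]
    return math.log(len(PARENTS) / len(descendants))


def disjunctive_ancestors(c1, c2, ic=intrinsic_ic):
    """Keep the most informative common ancestor for each distinct path difference."""
    p1, p2 = path_counts(c1), path_counts(c2)
    common = set(p1) & set(p2)
    # Visit ancestors from most to least informative; IC ties broken by name (assumption).
    ordered = sorted(common, key=lambda a: (-ic(a), a))
    seen_differences = set()
    dca = []
    for ancestor in ordered:
        difference = abs(p1[ancestor] - p2[ancestor])
        if difference not in seen_differences:
            seen_differences.add(difference)
            dca.append(ancestor)
    return dca


def dishin_shared_ic(c1, c2, ic=intrinsic_ic):
    """Shared information content: average IC of the disjunctive common ancestors."""
    dca = disjunctive_ancestors(c1, c2, ic)
    return sum(ic(a) for a in dca) / len(dca)


for pair in [("platinum", "gold"), ("platinum", "palladium"), ("silver", "gold")]:
    print(pair, disjunctive_ancestors(*pair), round(dishin_shared_ic(*pair), 3))
```

With this IC, the disjunctive common ancestors are ['precious', 'metal'] for platinum and gold, ['precious'] for platinum and palladium, and ['coinage'] for silver and gold (coinage has fewer descendants than precious, so it is the more informative of the two here). The recursive path counting is exponential in the worst case; a real ontology would call for memoization or a traversal in topological order.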

