Automatic taxonomy construction (ATC) is the use of software programs to generate taxonomical classifications from a body of texts called a corpus. ATC is a branch of natural language processing, which in turn is a branch of artificial intelligence. A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types. Among other things, a taxonomy can be used to organize and index knowledge (stored as documents, articles, videos, etc.), for example as a library classification system or a search engine taxonomy, so that users can more easily find the information they are searching for. Many taxonomies are hierarchies (and thus have an intrinsic tree structure), but not all are.

Manually developing and maintaining a taxonomy is a labor-intensive task requiring significant time and resources, including familiarity with or expertise in the taxonomy's domain (scope, subject, or field), which drives up costs and limits the scope of such projects. Also, domain modelers have their own points of view, which inevitably, even if unintentionally, work their way into the taxonomy. ATC uses artificial intelligence techniques to generate a taxonomy for a domain automatically and quickly, avoiding these problems and removing these limitations.
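To make the difference between a tree-shaped taxonomy and one that is not a tree concrete, here is a minimal sketch. It assumes a hypothetical representation in which each concept maps to a list of its parent concepts; the names `PARENTS` and `is_tree` are illustrative and not taken from any ATC system. A concept filed under two parents (here ''dog'' under both ''mammal'' and ''pet'') turns the hierarchy into a polyhierarchy rather than a tree.

```python
# Hypothetical example taxonomy, stored as concept -> list of parent concepts.
# A concept with an empty parent list is a root.
PARENTS: dict[str, list[str]] = {
    "animal": [],
    "mammal": ["animal"],
    "dog": ["mammal"],
    "cat": ["mammal"],
}


def is_tree(parents: dict[str, list[str]]) -> bool:
    """Return True if the taxonomy is a single-rooted tree.

    That requires exactly one root, at most one parent per concept,
    and an upward path from every concept to that root with no cycles.
    """
    roots = [c for c, ps in parents.items() if not ps]
    if len(roots) != 1:
        return False
    if any(len(ps) > 1 for ps in parents.values()):
        return False
    for concept in parents:
        seen = set()
        node = concept
        # Walk up the single-parent chain; revisiting a node means a cycle.
        while parents.get(node):
            if node in seen:
                return False
            seen.add(node)
            node = parents[node][0]
        if node != roots[0]:  # chain ended at an unknown parent, not the root
            return False
    return True


print(is_tree(PARENTS))  # True
# Filing "dog" under both "mammal" and "pet" makes a polyhierarchy, not a tree.
print(is_tree({**PARENTS, "pet": ["animal"], "dog": ["mammal", "pet"]}))  # False
```

Storing a list of parents rather than a single parent is a deliberate choice in this sketch: it lets the same structure hold both strict hierarchies and taxonomies that are not trees.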


Approaches

There are several approaches to ATC. One approach is to use rules to detect patterns in the corpus and use those patterns to infer relations such as hyponymy. Other approaches use machine learning techniques such as Bayesian inference and artificial neural networks.
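The rule-based approach can be illustrated with a minimal sketch. It assumes just two hand-written lexico-syntactic rules (patterns of this kind are often called Hearst patterns): "X such as Y, Z and W" and "Y is a X". The regular expressions and the function name `extract_hyponym_pairs` are hypothetical; practical systems use many more patterns, plus tokenization, part-of-speech tagging, and lemmatization.

```python
import re

# Illustrative rules only (assumed for this sketch).
# Rule 1: "<hypernym>[,] such as <hyponym>[, <hyponym>]* [and|or <hyponym>]"
SUCH_AS = re.compile(r"(\w+),? such as (\w+(?:(?:,? (?:and|or) |, )\w+)*)")
# Rule 2: "<hyponym> is a|an <hypernym>"
IS_A = re.compile(r"(\w+) is an? (\w+)")
# Separators between the items of a "such as" list.
LIST_SEP = re.compile(r",? (?:and|or) |, ")


def extract_hyponym_pairs(text: str) -> set[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by the two rules."""
    text = text.lower()
    pairs = set()
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1)
        # Each listed item after "such as" is taken as a hyponym.
        for hyponym in LIST_SEP.split(m.group(2)):
            pairs.add((hyponym, hypernym))
    for m in IS_A.finditer(text):
        pairs.add((m.group(1), m.group(2)))
    return pairs


corpus = "Mammals such as dogs, cats, and horses are common pets. Fido is a dog."
print(sorted(extract_hyponym_pairs(corpus)))
# [('cats', 'mammals'), ('dogs', 'mammals'), ('fido', 'dog'), ('horses', 'mammals')]
```

Note that the output contains both ''dogs'' (as a hyponym) and ''dog'' (as a hypernym); without lemmatization the two are not linked, which is one reason pattern rules are usually combined with further linguistic processing.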


Keyword extraction

One approach to building a taxonomy is to automatically gather the keywords from a domain using keyword extraction, then analyze the relationships between them (see Hyponymy, below), and then arrange them as a taxonomy based on those relationships.
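The text does not prescribe a particular keyword extraction method. As one common and simple choice, the sketch below scores terms with TF-IDF (term frequency times inverse document frequency), so that words frequent in one document but rare across the corpus rank highest. The stopword list and the names `tokenize` and `top_keywords` are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# A tiny stopword list, assumed for this sketch only.
STOPWORDS = {"a", "an", "and", "are", "as", "is", "of", "the", "such", "to", "in"}


def tokenize(doc: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]


def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


docs = [
    "Dogs and cats are common mammals. Dogs bark.",
    "Stocks and bonds are common assets. Stocks rise.",
]
print(top_keywords(docs, k=3))  # [['dogs', 'bark', 'cats'], ['stocks', 'assets', 'bonds']]
```

Because ''common'' occurs in every document, its inverse document frequency is zero and it drops to the bottom of both rankings. The extracted keywords would then be the candidates whose relationships are analyzed in the next step.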


Hyponymy and "is-a" relations

In ATC programs, one of the most important tasks is the discovery of hypernym and hyponym relations among words. One way to do this from a body of text is to search for certain phrases, such as "is a" and "such as". In linguistics, is-a relations are called hyponymy. Words that describe categories are called hypernyms, and words that are examples of categories are hyponyms. For example, ''dog'' is a hypernym and ''Fido'' is one of its hyponyms. A word can be both a hyponym and a hypernym: ''dog'' is a hyponym of ''mammal'' and also a hypernym of ''Fido''. Taxonomies are often represented as ''is-a'' hierarchies in which each level is more specific than (in mathematical language, "a subset of") the level above it. For example, a basic biology taxonomy would have concepts such as ''mammal'', which is a subset of ''animal'', and ''dog'' and ''cat'', which are subsets of ''mammal''. This kind of taxonomy is called an is-a model because the specific objects are considered instances of a concept. For example, ''Fido'' is an instance of the concept ''dog'', and ''Fluffy'' is an instance of ''cat''.
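The biology example can be written down as an is-a hierarchy in a few lines. This minimal sketch assumes a strict tree with one parent per term and no cycles (the hypothetical `PARENT` mapping); following the links upward yields all hypernyms of a term, and collecting every term whose hypernyms include a given word yields its hyponyms. It shows that ''dog'' is at once a hyponym of ''mammal'' and a hypernym of ''Fido''.

```python
# Hypothetical is-a links from the example above: term -> its direct hypernym.
PARENT = {
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "Fido": "dog",
    "Fluffy": "cat",
}


def hypernyms(term: str) -> list[str]:
    """Return all hypernyms of term, nearest first, by following is-a links upward."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def hyponyms(term: str) -> list[str]:
    """Return every term that has `term` somewhere among its hypernyms."""
    return sorted(t for t in PARENT if term in hypernyms(t))


print(hypernyms("Fido"))   # ['dog', 'mammal', 'animal']
print(hyponyms("mammal"))  # ['Fido', 'Fluffy', 'cat', 'dog']
print(hypernyms("dog"), hyponyms("dog"))  # ['mammal', 'animal'] ['Fido']
```

The transitive walk reflects the subset reading of is-a: if ''Fido'' is a ''dog'' and every ''dog'' is a ''mammal'', then ''Fido'' is also a ''mammal''.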


Applications

ATC can be used to build taxonomies for search engines in order to improve search results. ATC systems are a key component of ontology learning (also known as automatic ontology construction), and have been used to automatically generate large ontologies for domains such as insurance and finance. They have also been used to enhance existing large networks such as WordNet to make them more complete and consistent.


ATC software


Other names

Other names for automatic taxonomy construction include:
* Automated outline building
* Automated outline construction
* Automated outline creation
* Automated outline extraction
* Automated outline generation
* Automated outline induction
* Automated outline learning
* Automated outlining
* Automated taxonomy building
* Automated taxonomy construction
* Automated taxonomy creation
* Automated taxonomy extraction
* Automated taxonomy generation
* Automated taxonomy induction
* Automated taxonomy learning
* Automatic outline building
* Automatic outline construction
* Automatic outline creation
* Automatic outline extraction
* Automatic outline generation
* Automatic outline induction
* Automatic outline learning
* Automatic taxonomy building
* Automatic taxonomy creation
* Automatic taxonomy extraction
* Automatic taxonomy generation
* Automatic taxonomy induction
* Automatic taxonomy learning
* Outline automation
* Outline building
* Outline construction
* Outline creation
* Outline extraction
* Outline generation
* Outline induction
* Outline learning
* Semantic taxonomy building
* Semantic taxonomy construction
* Semantic taxonomy creation
* Semantic taxonomy extraction
* Semantic taxonomy generation
* Semantic taxonomy induction
* Semantic taxonomy learning
* Taxonomy automation
* Taxonomy building
* Taxonomy construction
* Taxonomy creation
* Taxonomy extraction
* Taxonomy generation
* Taxonomy induction
* Taxonomy learning


See also

* Document classification
* Information extraction


Further reading

* Automatic Taxonomy Construction from Keywords (2012)
* Domain taxonomy learning from text: The subsumption method versus hierarchical clustering, from ''Data & Knowledge Engineering'', Volume 83, January 2013, pages 54–69
* Learning taxonomic relations from a set of text documents
* Learning Taxonomic Relations from Heterogeneous Sources of Evidence
* A Metric-based Framework for Automatic Taxonomy Induction
* A New Method for Evaluating Automatically Learned Terminological Taxonomies
* Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia
* Structured Learning for Taxonomy Induction with Belief Propagation
* Taxonomy Learning Using Word Sense Induction


External links

* Taxonomy 101: The Basics and Getting Started with Taxonomies – shows where ATC fits into the general activity of managing taxonomies for a business enterprise in need of knowledge management.